Streaming Data Tutorial

Once you set up a new replication job, Hydra automatically populates data from the ingestion sink to the replication sink.

To tell Hydra how to configure a replication job, you write a JSON file in Hydra's domain-specific language (DSL).


Find the data you want to replicate

Go to `{{host}}:{{port}}/schemas`. Find the topic name for the data you want to replicate (e.g., `exp.plans.Plan`). If you go to `/schemas/{topicName}`, the `hydra.key` listed there becomes the primary key of the table that gets created.
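
The lookup above can also be scripted. The following is only a sketch: it assumes that `/schemas/{topicName}` returns the schema as JSON and that `hydra.key` is a top-level string attribute in it, neither of which is confirmed here. The helper names (`schema_url`, `fetch_schema`, `primary_key_columns`) are hypothetical.

```python
import json
import urllib.request
from typing import List


def schema_url(base_url: str, topic: str) -> str:
    """Build the schema URL for a topic, e.g. <base>/schemas/exp.plans.Plan."""
    return f"{base_url.rstrip('/')}/schemas/{topic}"


def fetch_schema(base_url: str, topic: str) -> dict:
    """Fetch the schema. Assumption: the endpoint answers with a JSON document."""
    with urllib.request.urlopen(schema_url(base_url, topic)) as resp:
        return json.load(resp)


def primary_key_columns(schema: dict) -> List[str]:
    """Split `hydra.key` into column names.

    Assumption: `hydra.key` is a top-level string; compound keys are
    comma-separated, matching the `primaryKeys` format used in the DSL payload.
    """
    key = schema.get("hydra.key", "")
    return [part.strip() for part in key.split(",") if part.strip()]
```

For example, `primary_key_columns(fetch_schema("http://localhost:8088", "exp.plans.Plan"))` would return the key columns for that topic, where the host and port are placeholders for your own `{{host}}:{{port}}`.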


POST a test DSL Payload to Hydra

Using Postman, send a test POST to http://{{host}}:{{port}}/dsl (JSON body, no headers needed) in the following format:

{
	"replicate": {
		"name": "TEST - Replication of Plan data",
		"topics": ["exp.plans.Plan"],
		"startingOffsets": "earliest",
		"applicationId": "roleiq_exp.plans.PlanTest0",
		"primaryKeys": {
			"exp.plans.Plan": "id"
		},
		"connection": {
			"url": "jdbc:postgresql://localhost:5432/postgres?user=USER&password=PASS"
		}
	}
}


Notes:

  • This replicates the data into the `postgres` database rather than the `roleiq` database.
  • You can post multiple topics at once by listing them in the `topics` array.
  • `applicationId` has to be unique each time you POST.
  • The primary key has to match the `hydra.key` from the schema. If you need a compound primary key, join the columns with commas, e.g., `"primaryKeys": {"exp.plans.Team": "planId,teamId"}`
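
If you prefer a script to Postman, the payload and the notes above can be sketched in Python as follows. This is an illustration, not Hydra's own tooling: the function names and the random suffix used to keep `applicationId` unique are assumptions, and only the JSON field names come from the example payload.

```python
import json
import urllib.request
import uuid
from typing import Dict, List


def build_replication_payload(topic_keys: Dict[str, List[str]], jdbc_url: str,
                              name: str, app_id_prefix: str) -> dict:
    """Build a `replicate` payload for one or more topics.

    topic_keys maps each topic to its primary-key columns; compound keys
    are joined with commas, as described in the notes.
    """
    return {
        "replicate": {
            "name": name,
            "topics": list(topic_keys),
            "startingOffsets": "earliest",
            # Assumption: a random suffix is one simple way to keep the id unique.
            "applicationId": f"{app_id_prefix}_{uuid.uuid4().hex[:8]}",
            "primaryKeys": {t: ",".join(cols) for t, cols in topic_keys.items()},
            "connection": {"url": jdbc_url},
        }
    }


def post_dsl(base_url: str, payload: dict) -> int:
    """POST the payload to <base_url>/dsl and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/dsl",
        data=json.dumps(payload).encode("utf-8"),
        # Set explicitly: urllib would otherwise label the body as form-encoded.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A call such as `post_dsl("http://localhost:8088", build_replication_payload({"exp.plans.Plan": ["id"]}, jdbc_url, "TEST - Replication of Plan data", "roleiq_exp.plans.PlanTest"))` would then send the test job, with the base URL standing in for your own host and port.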


Check the streaming UI for your test job. If you click on it, it should say that it's running. You can view its logs at `/jobs/{jobId}/logs`.

After you're done testing, stop the job by clicking the `x` next to the job on the Hydra Streams Jobs dashboard, then send a DELETE to `https://{{host}}:{{port}}/jobs/{jobId}`.
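
The DELETE call can be sent from any HTTP client. A minimal Python sketch, in which the helper name `delete_job_request` is hypothetical and only the `/jobs/{jobId}` path comes from the step above:

```python
import urllib.request


def delete_job_request(base_url: str, job_id: str) -> urllib.request.Request:
    """Prepare a DELETE request for <base_url>/jobs/<job_id>."""
    url = f"{base_url.rstrip('/')}/jobs/{job_id}"
    return urllib.request.Request(url, method="DELETE")
```

Passing the result to `urllib.request.urlopen(...)` sends the request; stop the job in the dashboard first, as described above.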

Create a DSL

When you are certain that everything is working well, you can create a new job by adding an entry for the new topic to `dsl/replications.js`:

{
	"topic": "exp.assessment.v2.snapshot.SkillAssessmentCompleted",
	"startingOffset": "earliest",
	"primaryKeys": ["runUuid"]
},

To verify the streaming job is successful

  • Visit the streaming UI; your job should be in the `running` state.
  • Visit the `status` endpoint for your job.

To reset a streaming job

  • Visit the streaming UI and get the jobId.
  • Stop the streaming job.
  • Use Postman or another utility to POST to `/jobs/<job-id>/reset`.
  • Start the streaming job from within the streaming UI.
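
The reset POST from the steps above could look like the following in Python. This is a sketch under assumptions: the endpoint is taken to accept an empty request body, which is not confirmed here, and `reset_job_request` is a hypothetical helper name. Stopping and starting the job still happen in the streaming UI.

```python
import urllib.request


def reset_job_request(base_url: str, job_id: str) -> urllib.request.Request:
    """Prepare a POST request for <base_url>/jobs/<job_id>/reset.

    Assumption: the reset endpoint needs no request body, so an empty one is sent.
    """
    url = f"{base_url.rstrip('/')}/jobs/{job_id}/reset"
    return urllib.request.Request(url, data=b"", method="POST")
```

Send it with `urllib.request.urlopen(reset_job_request(base_url, job_id))` only after the job has been stopped.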

Troubleshooting

If your build fails, check the build log and see if it says "Request failed with status code 5xx". This means there might be a problem with Hydra. Give it a few minutes, and if it doesn't resolve, contact the Hydra team.

If there is a 4xx error, there is likely a syntax error in your dsl-payload JSON file.
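
To rule out a plain JSON syntax error before re-running the build, you can parse the payload file locally. The sketch below uses Python's standard `json` module. It only checks JSON syntax, not whether Hydra accepts the fields, and `check_json_syntax` is a hypothetical helper name.

```python
import json
from typing import Optional


def check_json_syntax(text: str) -> Optional[str]:
    """Return None if text is valid JSON, else a short description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

For example, `check_json_syntax(open(path).read())` returns `None` for a well-formed file and points at the offending line otherwise, where `path` is the location of your payload file. A trailing comma after the last entry is a common cause.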

If your build passes, go to your target database and make sure the data is available!

If the streaming job has stopped replicating/working.