Metadata Management Overview
To get started with the Hydra platform, first create a data topic with topic metadata. Metadata covers details that apply to the entire topic or stream, e.g. topic name, id, owner, etc. Producers register their metadata via an endpoint; the platform stores the metadata, then uses it to create the Kafka topic and register a corresponding Avro schema for the topic.
To update metadata fields or the schema, a new payload must be submitted that includes all required fields.
Topic Metadata Configuration
A topic is created exactly once: the first time metadata is registered for it. The platform defaults to 3 partitions and a replication factor of 3; if this does not meet your use case, please reach out to the data platform team for further instructions.
Field | Purpose | Defined By | Required? |
---|---|---|---|
id | Stream GUID | System Generated | N/A |
createdDate | Date/time topic was created | System Generated | N/A |
subject | Stream name | Data Producer | Mandatory |
streamType | Notification, CurrentState, or History | Data Producer | Mandatory |
derived | false means this topic is from the source of truth | Data Producer | Mandatory |
dataClassification | Public, InternalUseOnly, ConfidentialPII or ConfidentialFinancial | Data Producer | Mandatory |
contact | Preferred method for contacting the data owner e.g. Slack, email, etc. | Data Producer | Mandatory |
schema | Entity's payload schema definition | Data Producer | Mandatory |
additionalDocumentation | Location of additional data documentation | Data Producer | Optional |
notes | Additional notes | Data Producer | Optional |
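As a rough illustration of the table above, a producer could run a local pre-flight check before submitting metadata. This is only a sketch: the function name `find_metadata_problems` is hypothetical, treating `derived` as a boolean is an assumption based on the payload example, and the authoritative validation is whatever the platform performs on its side.

```python
# Mandatory fields and allowed values, copied from the table above.
MANDATORY_FIELDS = ("subject", "streamType", "derived",
                    "dataClassification", "contact", "schema")
STREAM_TYPES = {"Notification", "CurrentState", "History"}
DATA_CLASSIFICATIONS = {"Public", "InternalUseOnly",
                        "ConfidentialPII", "ConfidentialFinancial"}


def find_metadata_problems(metadata: dict) -> list:
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = [f"missing mandatory field: {name}"
                for name in MANDATORY_FIELDS if name not in metadata]
    if "streamType" in metadata and metadata["streamType"] not in STREAM_TYPES:
        problems.append(f"unknown streamType: {metadata['streamType']}")
    if ("dataClassification" in metadata
            and metadata["dataClassification"] not in DATA_CLASSIFICATIONS):
        problems.append(f"unknown dataClassification: {metadata['dataClassification']}")
    # Assumption: 'derived' is a JSON boolean, as in the payload example.
    if "derived" in metadata and not isinstance(metadata["derived"], bool):
        problems.append("derived must be true or false")
    return problems
```

The system-generated fields (id, createdDate) are deliberately not checked, since producers do not supply them.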
Topic Schema
The schema is included as a normal JSON object, and is subject to Avro's evolution scheme. Any invalid evolution will return a Bad Request to the client.
Some basic ground rules for schema evolutions are as follows:
- Field types cannot change after creation. This is a breaking change that will require a new version of the schema.
- To add a new field, it must be made optional using Avro's union type with a default value.
- To remove a field, it must be made optional and nullable using the union type.
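The ground rules above can be sketched as a simple comparison between two versions of a schema. This is one possible reading of the rules, not the platform's actual check (which follows Avro's own compatibility rules): the helper names are hypothetical, "optional" is assumed to mean a union type that includes "null" plus a default, and a field is assumed removable only if the previous version already declared it optional.

```python
def is_optional(field: dict) -> bool:
    """Assumption: optional means a union type containing "null" and a default value."""
    field_type = field.get("type")
    return isinstance(field_type, list) and "null" in field_type and "default" in field


def find_evolution_problems(old_schema: dict, new_schema: dict) -> list:
    """Compare two record schemas and list violations of the ground rules."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    new_fields = {f["name"]: f for f in new_schema["fields"]}
    problems = []
    for name, new_field in new_fields.items():
        old_field = old_fields.get(name)
        if old_field is None:
            # Rule: added fields must be optional with a default.
            if not is_optional(new_field):
                problems.append(f"new field '{name}' must be an optional union with a default")
        elif old_field["type"] != new_field["type"]:
            # Rule: field types cannot change after creation.
            problems.append(f"field '{name}' changed type")
    for name, old_field in old_fields.items():
        # Rule: only fields that were already optional may be removed.
        if name not in new_fields and not is_optional(old_field):
            problems.append(f"removed field '{name}' was not optional")
    return problems
```

An empty result only means this sketch found nothing; the server may still reject the schema with a Bad Request.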
Note on Required Fields:
There are some fields that Hydra requires within a schema to ensure better traceability of messages. Examples of these fields include:
- eventName - a name for the event that occurred
- eventTime - a timestamp for when the event occurred
Please refer to our documentation on Avro schemas for more information.
Endpoint
Metadata payloads should be submitted to http://{{hydra-host-url}}:{{port}}/topics, and include a Content-Type header of application/json.
Example curl request:
curl -X POST http://{{hydra-host-url}}:{{port}}/topics -H "Content-Type: application/json" -d '{...}'
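For producers who prefer to submit from code, the same request can be sketched with Python's standard library. The function name `build_topic_request` is hypothetical, and the host and port are placeholders that you supply for your own Hydra deployment; the sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_topic_request(host: str, port: int, metadata: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the /topics endpoint with a JSON body."""
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/topics",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(request)`. An error status such as Bad Request surfaces as `urllib.error.HTTPError`, so callers will probably want to catch that exception and inspect its `code`.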
Payload Example
Here is an example of a metadata payload:
{
  "subject": "exp.dataplatform.TestSubject",
  "streamType": "Notification",
  "derived": false,
  "dataClassification": "Public",
  "contact": "bob@myemail.com",
  "additionalDocumentation": "This is a test stream of data",
  "notes": "additional notes",
  "schema": {
    "namespace": "exp.dataplatform",
    "name": "TestSubject",
    "type": "record",
    "version": 1,
    /* hydra key can be a single field or a comma-separated list of fields */
    "hydra.key": "testField",
    "fields": [
      { "name": "testField", "type": "string" },
      { "name": "testField2", "type": ["null", "int"], "default": null },
      /* example of some fields required for ENTITY data streams. Strongly recommended to include for ALL streams. */
      { "name": "eventName", "type": "string" },
      { "name": "eventType", "type": "string" }
      /* end examples */
    ]
  }
}
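The comment on hydra.key in the payload says the key may be a single field or a comma-separated list of fields. A minimal sketch of how a producer might split that value and sanity-check it locally follows. Both function names are hypothetical, and two things here are assumptions rather than documented behaviour: that whitespace around the commas is ignored, and that every key field should be declared in the schema's fields.

```python
def key_fields(schema: dict) -> list:
    """Split the hydra.key value into individual field names."""
    raw = schema.get("hydra.key", "")
    # Assumption: whitespace around commas is not significant.
    return [part.strip() for part in raw.split(",") if part.strip()]


def missing_key_fields(schema: dict) -> list:
    """Return key field names that are not declared in the schema's fields."""
    declared = {field["name"] for field in schema["fields"]}
    return [name for name in key_fields(schema) if name not in declared]
```

Note that the /* ... */ comments in the payload example are annotations for the reader; they are not valid JSON and would need to be removed before the payload is parsed or submitted.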