Metadata Management Overview

To get started with the Hydra platform, first create a data topic by registering its topic metadata. Metadata covers details that apply to the entire topic or stream, e.g. topic name, ID, and owner. Producers register their metadata through an endpoint. The platform stores the metadata, uses it to create the Kafka topic, and registers a corresponding Avro schema for the topic.

To update metadata fields or the schema, submit a new payload that includes all required fields, not only the fields being changed.

Topic Metadata Configuration

A topic is created only once, the first time metadata is registered for it. The platform applies default values for the number of partitions and the replication factor (3 each). If these defaults do not meet your use case, please contact the data platform team for further instructions.
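The create-once behavior can be pictured with a minimal in-memory model. This is a hypothetical illustration, not Hydra's implementation. The `InMemoryTopicRegistry` and `TopicRecord` names, and keying topics by `subject`, are assumptions. Only the defaults of 3 partitions and a replication factor of 3 come from this document.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Defaults stated in this document: 3 partitions and a replication factor of 3.
DEFAULT_PARTITIONS = 3
DEFAULT_REPLICATION_FACTOR = 3


@dataclass
class TopicRecord:
    """Hypothetical record of a registered topic (not Hydra's data model)."""
    subject: str
    partitions: int
    replication_factor: int
    metadata: Dict[str, Any]
    schema_versions: List[Dict[str, Any]] = field(default_factory=list)


class InMemoryTopicRegistry:
    """Hypothetical, simplified model of create-once topic registration."""

    def __init__(self) -> None:
        self.topics: Dict[str, TopicRecord] = {}

    def register(self, payload: Dict[str, Any]) -> TopicRecord:
        subject = payload["subject"]  # assumption: topics are keyed by subject
        record = self.topics.get(subject)
        if record is None:
            # First registration for this subject: create the topic with the defaults.
            record = TopicRecord(
                subject=subject,
                partitions=DEFAULT_PARTITIONS,
                replication_factor=DEFAULT_REPLICATION_FACTOR,
                metadata=dict(payload),
            )
            self.topics[subject] = record
        else:
            # Later registrations replace the stored metadata; the topic is never recreated.
            record.metadata = dict(payload)
        # Record the schema only when it differs from the latest registered version.
        schema = payload["schema"]
        if not record.schema_versions or record.schema_versions[-1] != schema:
            record.schema_versions.append(schema)
        return record
```

In this model, re-registering the same subject updates the stored metadata but leaves the partition count and replication factor untouched. This mirrors the rule that a topic is created only once.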

Field                    Defined By        Required?  Purpose
id                       System generated  N/A        Stream GUID
createdDate              System generated  N/A        Date/time the topic was created
subject                  Data producer     Mandatory  Stream name
streamType               Data producer     Mandatory  Notification, CurrentState, or History
derived                  Data producer     Mandatory  false means the topic comes from the source of truth
dataClassification       Data producer     Mandatory  Public, InternalUseOnly, ConfidentialPII, or ConfidentialFinancial
contact                  Data producer     Mandatory  Preferred contact method for the data owner, e.g. Slack or email
schema                   Data producer     Mandatory  Entity's payload schema definition
additionalDocumentation  Data producer     Optional   Location of additional data documentation
notes                    Data producer     Optional   Additional notes

Topic Schema

The schema is included as a regular JSON object and is subject to Avro's schema evolution rules. Any invalid evolution is rejected with a 400 Bad Request response to the client.

Some basic ground rules for schema evolution:

  • A field's type cannot change after creation. Changing it is a breaking change that requires a new version of the schema.
  • A new field must be optional: declare it with Avro's union type and give it a default value.
  • A field can be removed only if it is optional and nullable, i.e. declared with a union type that includes null.
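These ground rules can be expressed as a small pre-flight check. This is a minimal sketch based on two assumptions. First, "optional" means a union containing `null` that also has a `default`. Second, making an existing field nullable does not count as a type change. `check_evolution` is a hypothetical helper, not part of Hydra, and it does not replace Avro's own compatibility checks or the platform's validation.

```python
from typing import Any, Dict, List


def _is_optional(f: Dict[str, Any]) -> bool:
    # Assumption: "optional" means a union (JSON array) containing "null", plus a default.
    return isinstance(f["type"], list) and "null" in f["type"] and "default" in f


def _base_types(field_type: Any) -> List[Any]:
    # Strip "null" so that making a field nullable is not counted as a type change.
    types = field_type if isinstance(field_type, list) else [field_type]
    return [t for t in types if t != "null"]


def check_evolution(old_schema: Dict[str, Any], new_schema: Dict[str, Any]) -> List[str]:
    """Return violations of the ground rules; an empty list means none were found."""
    old_fields = {f["name"]: f for f in old_schema.get("fields", [])}
    new_fields = {f["name"]: f for f in new_schema.get("fields", [])}
    problems: List[str] = []

    for name, old_f in old_fields.items():
        if name not in new_fields:
            # Removal: only optional, nullable fields may be dropped.
            if not _is_optional(old_f):
                problems.append(f"cannot remove non-optional field '{name}'")
        elif _base_types(new_fields[name]["type"]) != _base_types(old_f["type"]):
            # A field's type may not change.
            problems.append(f"type of field '{name}' changed")

    for name, new_f in new_fields.items():
        if name not in old_fields and not _is_optional(new_f):
            # Addition: new fields must be an optional union with a default.
            problems.append(f"new field '{name}' must be an optional union with a default")

    return problems
```

Running this before submitting a payload surfaces problems that would otherwise come back as a Bad Request. It compares top-level fields only and does not inspect nested records.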

Note on Required Fields:

Hydra requires certain fields within a schema to improve the traceability of messages. Examples of these fields include:

  • eventName - a name for the event that occurred
  • eventTime - a timestamp for when the event occurred
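A producer can check for the required fields before registering a schema. The sketch below is hypothetical. `missing_required_fields` is not a Hydra API, and it checks only the two example fields named in this section. Declaring `eventTime` as a `long` with Avro's `timestamp-millis` logical type is also an assumption, not a documented Hydra requirement.

```python
from typing import Any, Dict, List

# Only the two fields named in this document; Hydra's complete list may be longer.
REQUIRED_FIELDS = ("eventName", "eventTime")


def missing_required_fields(schema: Dict[str, Any]) -> List[str]:
    """Return the required field names missing from the schema's top-level fields."""
    present = {f["name"] for f in schema.get("fields", [])}
    return [name for name in REQUIRED_FIELDS if name not in present]


# Example field declarations. Typing eventTime as timestamp-millis is an assumption.
example_fields = [
    {"name": "eventName", "type": "string"},
    {"name": "eventTime", "type": {"type": "long", "logicalType": "timestamp-millis"}},
]
```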


Please refer to our documentation on Avro schemas for more information.


Endpoint

Submit metadata payloads as a POST request to http://{{hydra-host-url}}:{{port}}/topics with a Content-Type header of application/json.

Example curl request:


curl -X POST http://{{hydra-host-url}}:{{port}}/topics -H "Content-Type: application/json" -d '{...}'
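Producers that script registration in Python can build the same request with the standard library. This is a sketch. `build_topic_request` is a hypothetical helper, and the host name and port below are placeholders for your own {{hydra-host-url}} and {{port}}.

```python
import json
import urllib.request
from typing import Any, Dict


def build_topic_request(host: str, port: int, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build, but do not send, a POST of a metadata payload to the /topics endpoint."""
    url = f"http://{host}:{port}/topics"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and port; substitute your Hydra host URL and port.
request = build_topic_request("hydra.example.internal", 8080, {"subject": "exp.dataplatform.TestSubject"})
```

Sending the request with `urllib.request.urlopen(request)` raises `urllib.error.HTTPError` for error statuses, such as the 400 Bad Request returned for an invalid schema evolution. The exception's `code` attribute holds the status.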


Payload Example

Here is an example of a metadata payload:

{
  "subject": "exp.dataplatform.TestSubject",
  "streamType": "Notification",
  "derived": false,
  "dataClassification": "Public",
  "contact": "bob@myemail.com",
  "additionalDocumentation": "This is a test stream of data",
  "notes": "additional notes",
  "schema": {
    "namespace": "exp.dataplatform",
    "name": "TestSubject",
    "type": "record",
    "version": 1,
    "hydra.key": "testField",
    "fields": [
      {
        "name": "testField",
        "type": "string"
      },
      {
        "name": "testField2",
        "type": ["null", "int"],
        "default": null
      },
      {
        "name": "eventName",
        "type": "string"
      },
      {
        "name": "eventType",
        "type": "string"
      }
    ]
  }
}

Notes on the example:

  • hydra.key can be a single field or a comma-separated list of fields.
  • eventName and eventType are examples of fields required for ENTITY data streams. Including them is strongly recommended for all streams.
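Before submitting a payload like this one, including the full payload required for an update, a producer might validate it against the field table. This is a minimal client-side sketch, not the platform's validation. `validate_metadata` is hypothetical and checks only the mandatory fields and enumerated values listed under Topic Metadata Configuration. The system-generated `id` and `createdDate` fields are deliberately not expected.

```python
from typing import Any, Dict, List

# Taken from the Topic Metadata Configuration table in this document.
MANDATORY_FIELDS = ("subject", "streamType", "derived", "dataClassification", "contact", "schema")
STREAM_TYPES = {"Notification", "CurrentState", "History"}
DATA_CLASSIFICATIONS = {"Public", "InternalUseOnly", "ConfidentialPII", "ConfidentialFinancial"}


def validate_metadata(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a metadata payload; empty means none were found."""
    errors = [f"missing mandatory field '{name}'" for name in MANDATORY_FIELDS if name not in payload]
    if "streamType" in payload and payload["streamType"] not in STREAM_TYPES:
        errors.append(f"unknown streamType '{payload['streamType']}'")
    if "dataClassification" in payload and payload["dataClassification"] not in DATA_CLASSIFICATIONS:
        errors.append(f"unknown dataClassification '{payload['dataClassification']}'")
    if "derived" in payload and not isinstance(payload["derived"], bool):
        errors.append("derived must be a boolean")
    if "schema" in payload and not isinstance(payload["schema"], dict):
        errors.append("schema must be a JSON object")
    return errors
```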