Data Ingestion Overview

This page describes the use of ingestors in Hydra.

What's a Hydra Ingestor?

An ingestor is a component responsible for mediating the process of translating a request into a record that can be published to a data store.

Ingestors are implemented as Akka actors and are managed by the Hydra platform.

The Hydra Request

A request is composed of two parts:

  1. The payload (JSON)
  2. Metadata that describes how the platform should handle the request.  
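
As a rough illustration of that two-part shape, the sketch below models a request in Python. It is not Hydra's actual request type: the `HydraRequest` class name and the `target` and `ack` metadata keys are assumptions made up for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class HydraRequest:
    """Illustrative stand-in for a request: a JSON payload plus handling metadata."""

    payload: str  # raw JSON text
    metadata: Dict[str, str] = field(default_factory=dict)

    def payload_as_dict(self) -> Dict[str, Any]:
        # Parse the JSON payload; raises json.JSONDecodeError if it is malformed.
        return json.loads(self.payload)


request = HydraRequest(
    payload='{"id": 1, "name": "test"}',
    # Hypothetical metadata keys, chosen only for this example.
    metadata={"target": "kafka", "ack": "NoAck"},
)
```

Keeping the metadata separate from the payload means the platform can decide how to route a request without parsing the payload itself.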


The Ingestion Protocol

The Hydra ingestion protocol consists of three distinct phases, executed in sequence and each handled asynchronously:

Publish

During this phase, metadata about the request is sent to all registered ingestors. After inspecting the metadata, each ingestor must reply with one of two messages:

Join

Indicates that this ingestor is able to process the incoming request.

Ignore

Indicates that this ingestor is not able to process the incoming request.
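
The Publish phase can be sketched as a simple fan-out in which each ingestor inspects the metadata and answers Join or Ignore. This is a synchronous Python approximation, not Hydra's actor-based implementation: the `Ingestor` class, its `handles_target` field, and the `target` metadata key are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class PublishReply(Enum):
    JOIN = "Join"
    IGNORE = "Ignore"


@dataclass
class Ingestor:
    name: str
    handles_target: str  # hypothetical: the metadata "target" value this ingestor handles

    def on_publish(self, metadata: Dict[str, str]) -> PublishReply:
        # Join only when the request metadata names a target this ingestor handles.
        if metadata.get("target") == self.handles_target:
            return PublishReply.JOIN
        return PublishReply.IGNORE


def publish(metadata: Dict[str, str], ingestors: List[Ingestor]) -> List[Ingestor]:
    """Send the metadata to every registered ingestor and keep those that reply Join."""
    return [i for i in ingestors if i.on_publish(metadata) is PublishReply.JOIN]


registered = [Ingestor("kafka-ingestor", "kafka"), Ingestor("jdbc-ingestor", "jdbc")]
joined = publish({"target": "kafka"}, registered)
```

Only the ingestors collected in `joined` would take part in the remaining phases.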

Validate

The second step in the ingestion protocol validates the message payload. The Validate message is sent only to the ingestors that replied with Join to the Publish request.

The semantics of message validation are up to each ingestor. In the case of Kafka, for instance, the incoming JSON payload is validated against its Avro schema.


The expected reply for Validate can be one of two messages:

ValidRequest

Request is valid and well-formed; ingestion can proceed.

InvalidRequest

Request is not valid and cannot proceed.
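
To make the two replies concrete, here is a minimal validation sketch in Python. It checks required fields and types against a small dictionary that stands in for a real Avro schema; no Avro library is used, and the `SCHEMA` contents and the `reason` field on `InvalidRequest` are assumptions for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class ValidRequest:
    pass


@dataclass(frozen=True)
class InvalidRequest:
    reason: str  # hypothetical field: why validation failed


# Hypothetical schema: field name -> expected Python type (a stand-in for Avro).
SCHEMA: Dict[str, type] = {"id": int, "name": str}


def validate(payload: str, schema: Dict[str, type] = SCHEMA) -> Union[ValidRequest, InvalidRequest]:
    try:
        record = json.loads(payload)
    except json.JSONDecodeError as exc:
        return InvalidRequest(f"malformed JSON: {exc.msg}")
    if not isinstance(record, dict):
        return InvalidRequest("payload must be a JSON object")
    for name, expected in schema.items():
        if name not in record:
            return InvalidRequest(f"missing field: {name}")
        if not isinstance(record[name], expected):
            return InvalidRequest(f"wrong type for field: {name}")
    return ValidRequest()
```

For example, `validate('{"id": 1}')` returns an `InvalidRequest` whose reason names the missing `name` field, so ingestion would stop before the Ingest phase.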

Ingest

The last step of the protocol involves sending the record to the transport layer so that it can be published.


The expected reply for Ingest can be one of three messages:

RecordAccepted

The request was accepted by the transport. Used exclusively with the NoAck ack strategy.

RecordProduced

Record was successfully produced.  This can happen when:

  1. The record was saved to the Akka journal, if using the Persisted ack strategy. 
  2. The record was saved to the underlying data store, if using the Replicated ack strategy.

RecordNotProduced

An error prevented the record from being produced to the underlying data store.
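
The relationship between ack strategies and Ingest replies can be summarized in a small Python sketch. The enum and function names are hypothetical, and mapping every error to RecordNotProduced regardless of strategy is an assumption of this example rather than documented Hydra behavior.

```python
from enum import Enum
from typing import Optional


class AckStrategy(Enum):
    NO_ACK = "NoAck"
    PERSISTED = "Persisted"
    REPLICATED = "Replicated"


class IngestReply(Enum):
    RECORD_ACCEPTED = "RecordAccepted"
    RECORD_PRODUCED = "RecordProduced"
    RECORD_NOT_PRODUCED = "RecordNotProduced"


def reply_for(strategy: AckStrategy, error: Optional[Exception] = None) -> IngestReply:
    """Pick the Ingest reply for a given ack strategy and outcome."""
    if error is not None:
        # Assumption: any failure is reported as RecordNotProduced.
        return IngestReply.RECORD_NOT_PRODUCED
    if strategy is AckStrategy.NO_ACK:
        # NoAck: the transport accepted the request; no confirmation of a write.
        return IngestReply.RECORD_ACCEPTED
    # Persisted (saved to the journal) or Replicated (saved to the data store).
    return IngestReply.RECORD_PRODUCED
```

A caller using the Replicated strategy would therefore wait for RecordProduced before treating the record as stored, while a NoAck caller only learns that the transport took the request.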