Kinesis [ARCHIVED]

Prerequisites

  • For Airbyte Open Source users using the Postgres source connector, upgrade your Airbyte platform to version v0.40.0-alpha or newer and upgrade your Kinesis connector to version 0.1.4 or newer.

Sync overview

Output schema

Incoming Airbyte data is structured as JSON and is distributed across different stream shards according to the partition key. The connector maps incoming data from each namespace and stream to a unique Kinesis stream. Each Kinesis record sent to the stream consists of the following JSON fields:

  • _airbyte_ab_id: a random UUID, used as the partition key so that data is sent to different shards.
  • _airbyte_emitted_at: a timestamp representing when the event was received from the data source.
  • _airbyte_data: a JSON text/object representing the data that was received from the data source.
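
As an illustration of the record layout above, here is a minimal Python sketch that wraps one source row in the three fields. It is not the connector's own code: the helper name `build_record`, the epoch-millisecond timestamp format, and the choice to store `_airbyte_data` as JSON text rather than a nested object are all assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone


def build_record(data: dict) -> dict:
    """Wrap a source row in the three Airbyte fields (hypothetical helper)."""
    return {
        # Random UUID; also used as the Kinesis partition key.
        "_airbyte_ab_id": str(uuid.uuid4()),
        # Assumption: timestamp expressed as epoch milliseconds (UTC).
        "_airbyte_emitted_at": int(datetime.now(timezone.utc).timestamp() * 1000),
        # Assumption: the source row is serialized as JSON text.
        "_airbyte_data": json.dumps(data),
    }


record = build_record({"id": 1, "name": "example"})
partition_key = record["_airbyte_ab_id"]
payload = json.dumps(record).encode("utf-8")  # bytes that would be sent to the stream
```

Because the partition key is a fresh UUID for every record, records spread roughly evenly across shards rather than clustering on one of them.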

Features

Feature                        | Support | Notes
Full Refresh Sync              |         |
Incremental - Append Sync      |         | Incoming messages are streamed/appended to a Kinesis stream as they are received.
Incremental - Append + Deduped |         |
Namespaces                     |         | Namespaces will be used to determine the Kinesis stream name.

Performance considerations

Kinesis is designed to handle large amounts of real-time data by scaling streams with shards, but you should still be aware of the Kinesis Quotas and Limits. The connector buffer size should also be tuned to your data size and frequency.
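
To make the effect of the buffer size concrete, the following Python sketch groups records into batches so that each batch could be sent in a single request. It is an illustration only, not the connector's implementation: the function name `batched` is hypothetical, and treating the buffer size as a record count is an assumption.

```python
from typing import Iterable, Iterator, List


def batched(records: Iterable[bytes], buffer_size: int) -> Iterator[List[bytes]]:
    """Yield lists of at most `buffer_size` records (hypothetical helper)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    batch: List[bytes] = []
    for record in records:
        batch.append(record)
        if len(batch) >= buffer_size:
            # Buffer is full: hand it off as one request's worth of data.
            yield batch
            batch = []
    if batch:
        # Flush whatever is left at the end of the stream.
        yield batch


batches = list(batched([b"a", b"b", b"c", b"d", b"e"], buffer_size=2))
```

A larger buffer means fewer requests but more data held in memory and a longer delay before records reach the stream. Any per-request limits from the Kinesis quotas still cap how large a useful buffer can be.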

Getting started

Requirements

  • The connector is compatible with the latest Kinesis service version at the time of this writing.
  • Configuration
    • Endpoint: AWS Kinesis endpoint to connect to. If not provided, the default endpoint is used.
    • Region: AWS Kinesis region to connect to. If not provided, the default region is used.
    • shardCount: The number of shards with which the stream should be created. The number of shards affects the throughput of your stream.
    • accessKey: Access key credential for authenticating with the service.
    • privateKey: Private key credential for authenticating with the service.
    • bufferSize: Buffer size used to increase throughput by batching data into a single request.
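
The configuration fields above might be assembled as in the Python sketch below. All values are placeholders, the lowercase `endpoint` and `region` key names are assumptions, and the `validate` helper with its checks is hypothetical. The connector's actual specification decides which fields are required and what ranges are valid.

```python
import json

# Placeholder values for illustration only; do not commit real credentials.
config = {
    "endpoint": "kinesis.us-east-1.amazonaws.com",
    "region": "us-east-1",
    "shardCount": 2,
    "accessKey": "<your-access-key>",
    "privateKey": "<your-private-key>",
    "bufferSize": 100,
}


def validate(cfg: dict) -> list:
    """Return a list of problems found in the config (hypothetical checks)."""
    problems = []
    for key in ("shardCount", "bufferSize"):
        value = cfg.get(key)
        if not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be a positive integer")
    for key in ("accessKey", "privateKey"):
        if not cfg.get(key):
            problems.append(f"{key} must not be empty")
    return problems


problems = validate(config)
config_json = json.dumps(config, indent=2)  # JSON form of the same settings
```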

Setup guide

CHANGELOG

Version | Date       | Pull Request | Subject
0.1.5   | 2022-09-22 | 16952        | Add required config fields