Glossary of Terms

Airbyte CDK

The Airbyte CDK (Connector Development Kit) allows you to create connectors for Sources or Destinations. If your source or destination doesn't exist, you can use the CDK to make the building process a lot easier. It generates all the tests and files you need and all you need to do is write the connector-specific code for your source or destination. We created one in Python which you can check out here and the Faros AI team created a Javascript/Typescript one that you can check out here.

DAG

DAG stands for Directed Acyclic Graph. It's a term originally coined by math graph theorists that describes a tree-like process that cannot contain loops. For example, in the following diagram, you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is great for representing workflows and is what tools like Airflow use to orchestrate the execution of software based on different cases or states.

ETL/ELT

Stands for Extract, Transform, and Load and Extract, Load, and Transform, respectively.
Extract: Retrieve data from a source, which can be an application, database, anything really.
Load: Move data to your destination.
Transform: Clean up the data. This is referred to as normalization in Airbyte and involves deduplication, changing data types, formats, and more.

Full Refresh Sync

A Full Refresh Sync will attempt to retrieve all data from the source every time a sync is run. Then there are two choices, Overwrite and Append. Overwrite deletes the data in the destination before running the sync and Append doesn't.

Incremental Sync

An Incremental Sync will only retrieve new data from the source when a sync occurs. The first sync will always attempt to retrieve all the data. If the destination supports it, you can have your data deduplicated. Simply put, this just means that if you sync an updated version of a record you've already synced, it will remove the old record.

Raw Tables

Airbyte spits out tables with the prefix _airbyte_raw_. This is your replicated data, but the prefix indicates that it's not normalized. If you select basic normalization, Airbyte will create renamed versions without the prefix.

Advanced Terms

AirbyteCatalog

This is only relevant for individuals who want to create a connector.
This refers to how you define the data that you can retrieve from a Source. For example, if you want to retrieve information from an API, the data that you can receive needs to be defined clearly so that Airbyte can have a clear expectation of what endpoints are supported and what the objects that the streams return look like. This is represented as a sort of schema that Airbyte can interpret. Learn more here.

Airbyte Specification

This is only relevant for individuals who want to create a connector.
This refers to the functions that a Source or Destination must implement to successfully retrieve data and load it, respectively. Implementing these functions using the Airbyte Specification makes a Source or Destination work correctly. Learn more here.

Temporal

This is only relevant for individuals who want to learn about or contribute to our underlying platform.
Temporal is a development kit that lets you create workflows, parallelize them, and handle failures/retries gracefully. We use it to reliably schedule each step of the ELT process, and a Temporal service is always deployed with each Airbyte installation.
Last modified 1mo ago