Skip to main content

Core Concepts

AvailableCloud AvailableSelf-Managed Community (OSS)AvailableSelf-Managed Enterprise

Airbyte enables you to build data pipelines and replicate data from a source to a destination. You can configure how frequently the data is synced, what data is replicated, and how the data is written to in the destination.

This page describes the concepts you need to know to use Airbyte.

Source

A source is an API, file, database, or data warehouse that you want to ingest data from.

Destination

A destination is a data warehouse, data lake, database, or an analytics tool where you want to load your ingested data.

Connector

An Airbyte component which pulls data from a source or pushes data to a destination.

Connection

A connection is an automated data pipeline that replicates data from a source to a destination. Setting up a connection enables configuration of the following parameters:

ConceptDescription
Catalog SelectionWhat data should be replicated from the source to the destination?
Sync ModeHow should the streams be replicated (read and written)?
Sync ScheduleWhen should a data sync be triggered?
Destination Namespace and Stream PrefixWhere should the replicated data be written?
Schema PropagationHow should Airbyte handle schema drift in sources?

Stream

A stream is a group of related records.

Examples of streams:

  • A table in a relational database
  • A resource or API endpoint for a REST API
  • The records from a directory containing many files in a filesystem

Field

A field is an attribute of a record in a stream.

Examples of fields:

  • A column in the table in a relational database
  • A field in an API response

Sync Schedule

There are three options for scheduling a sync to run:

  • Scheduled (ie. every 24 hours, every 2 hours)
  • CRON schedule
  • Manual (i.e: clicking the "Sync Now" button in the UI or through the API)

For more details, see our Sync Schedules documentation.

Destination Namespace

A namespace defines where the data will be written to your destination. You can use the namespace to group streams in a source or destination. In a relational database system, this is typically known as a schema.

For more details, see our Namespace documentation.

Sync Mode

A sync mode governs how Airbyte reads from a source and writes to a destination. Airbyte provides different sync modes depending on what you want to accomplish.

Read more about each sync mode and how they differ.

Typing and Deduping

Typing and deduping ensures the data emitted from sources is written into the correct type-cast relational columns and only contains unique records. Typing and deduping is only relevant for the following relational database & warehouse destinations:

  • Snowflake
  • BigQuery
info

Typing and Deduping is the default method of transforming datasets within data warehouse and database destinations after they've been replicated. We are retaining documentation about normalization to support legacy destinations.

For more details, see our Typing & Deduping documentation.

Basic Normalization

Basic Normalization transforms data after a sync to denest columns into their own tables. Note that normalization is only available for the following relational database & warehouse destinations:

  • Redshift
  • Postgres
  • Oracle
  • MySQL
  • MSSQL

For more details, see our Basic Normalization documentation.

Custom Transformations

Airbyte integrates natively with dbt to allow you to use dbt for post-sync transformations. This is useful if you would like to trigger dbt models after a sync successfully completes.

For more details, see our dbt integration documentation.

Workspace

A workspace is a grouping of sources, destinations, connections, and other configurations. It lets you collaborate with team members and share resources across your team under a shared billing account.

Organization

Organizations let you collaborate with team members and share workspaces across your team.

Glossary of Terms

You can find a extended list of Airbyte specific terms, data engineering concepts or many other data related terms.