The Airbyte Snowflake destination allows you to sync data to Snowflake.
Each stream will be output into its own table in Snowflake. Each table will contain 3 columns:

- `_airbyte_ab_id`: a uuid assigned by Airbyte to each event that is processed. The column type in Snowflake is `VARCHAR`.
- `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The column type in Snowflake is `TIMESTAMP WITH TIME ZONE`.
- `_airbyte_data`: a json blob representing the event data. The column type in Snowflake is `VARIANT`.
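For reference, the raw table for a stream looks roughly like the sketch below. This is illustrative only: Airbyte creates the table itself, and the name `_airbyte_raw_users` assumes a hypothetical stream called `users`.

```sql
-- Illustrative sketch, not the connector's exact DDL.
-- "_airbyte_raw_users" assumes a hypothetical stream named "users".
create table if not exists _airbyte_raw_users (
  _airbyte_ab_id varchar,                        -- uuid assigned by Airbyte to each event
  _airbyte_emitted_at timestamp with time zone,  -- when the event was pulled from the source
  _airbyte_data variant                          -- the event payload as a json blob
);
```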
The Snowflake destination supports the following sync modes:

- Full Refresh Sync
- Incremental - Append Sync
- Incremental - Deduped History
We recommend creating an Airbyte-specific warehouse, database, schema, user, and role for writing data into Snowflake, so that you can track costs specifically related to Airbyte (including the cost of running this warehouse) and control permissions at a granular level. Since the Airbyte user creates, drops, and alters tables, `OWNERSHIP` permissions are required in Snowflake. If you are not following the recommended script below, please limit the `OWNERSHIP` permissions to only the necessary database and schema for the Airbyte user.
We provide the following script to create these resources. Before running, you must change the password to something secure. You may change the names of the other resources if you desire.
```sql
-- set variables (these need to be uppercase)
set airbyte_role = 'AIRBYTE_ROLE';
set airbyte_username = 'AIRBYTE_USER';
set airbyte_warehouse = 'AIRBYTE_WAREHOUSE';
set airbyte_database = 'AIRBYTE_DATABASE';
set airbyte_schema = 'AIRBYTE_SCHEMA';

-- set user password
set airbyte_password = 'password';

begin;

-- create Airbyte role
use role securityadmin;
create role if not exists identifier($airbyte_role);
grant role identifier($airbyte_role) to role SYSADMIN;

-- create Airbyte user
create user if not exists identifier($airbyte_username)
password = $airbyte_password
default_role = $airbyte_role
default_warehouse = $airbyte_warehouse;

grant role identifier($airbyte_role) to user identifier($airbyte_username);

-- change role to sysadmin for warehouse / database steps
use role sysadmin;

-- create Airbyte warehouse
create warehouse if not exists identifier($airbyte_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;

-- create Airbyte database
create database if not exists identifier($airbyte_database);

-- grant Airbyte warehouse access
grant USAGE
on warehouse identifier($airbyte_warehouse)
to role identifier($airbyte_role);

-- grant Airbyte database access
grant OWNERSHIP
on database identifier($airbyte_database)
to role identifier($airbyte_role);

commit;

begin;

USE DATABASE identifier($airbyte_database);

-- create schema for Airbyte data
CREATE SCHEMA IF NOT EXISTS identifier($airbyte_schema);

commit;

begin;

-- grant Airbyte schema access
grant OWNERSHIP
on schema identifier($airbyte_schema)
to role identifier($airbyte_role);

commit;
```
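After the script completes, you can optionally sanity-check the setup before configuring the connector. The statements below are a minimal sketch, assuming you kept the default names from the script above:

```sql
-- Minimal sanity check, assuming the default names from the script above.
use role AIRBYTE_ROLE;
use warehouse AIRBYTE_WAREHOUSE;
use database AIRBYTE_DATABASE;

-- The role should own the database and schema created above.
show grants to role AIRBYTE_ROLE;

-- The role should be able to create and drop tables in the schema.
create table AIRBYTE_SCHEMA._airbyte_connection_test (id integer);
drop table AIRBYTE_SCHEMA._airbyte_connection_test;
```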
You should now have all the requirements needed to configure Snowflake as a destination in the UI: the host for your Snowflake account, plus the role, warehouse, database, schema, username, and password created by the script above.

Note Snowflake's naming conventions for identifiers. Unquoted identifiers must:

- Start with a letter (A-Z, a-z) or an underscore ("_").
- Contain only letters, underscores, decimal digits (0-9), and dollar signs ("$").

When an identifier is unquoted, it is stored and resolved in uppercase.

If an identifier is enclosed in double quotes, it is case-sensitive. Delimited identifiers (i.e. identifiers enclosed in double quotes) can start with and contain any valid characters, including:

- Special characters (., ', !, @, #, $, %, ^, &, *, etc.)
- Extended ASCII and non-ASCII characters

When an identifier is double-quoted, it is stored and resolved exactly as entered, including case.
Regardless of whether an identifier is unquoted or double-quoted, the maximum number of characters allowed is 255 (including blank spaces).
Identifiers can also be specified using string literals, session variables or bind variables. For details, see SQL Variables.
If an object is created using a double-quoted identifier, when referenced in a query or any other SQL statement, the identifier must be specified exactly as created, including the double quotes. Failure to include the quotes might result in an Object does not exist error (or similar type of error).
Also, note that the entire identifier must be enclosed in quotes when referenced in a query/SQL statement. This is particularly important if periods (.) are used in identifiers because periods are also used in fully-qualified object names to separate each object.
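For example, the following illustrates the difference (the table names are arbitrary):

```sql
-- Unquoted: stored as MY_TABLE and resolved case-insensitively.
create table my_table (id integer);
select count(*) from MY_TABLE;     -- resolves to the same table

-- Double-quoted: stored exactly as entered; the quotes are required
-- on every reference.
create table "my.table" (id integer);
select count(*) from "my.table";   -- works
-- select count(*) from my.table;  -- fails: the period is parsed as a
--                                    schema separator, not part of the name
```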
Therefore, the Airbyte Snowflake destination will create tables and schemas using unquoted identifiers when possible, and will fall back to quoted identifiers if the names contain special characters.
By default, Airbyte uses batches of `INSERT` commands to add data to a temporary table before copying it over to the final table in Snowflake. This is too slow for larger, multi-GB replications; for those we recommend configuring cloud storage staging to allow batched writes and bulk loading.
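Conceptually, the difference looks like the sketch below. This is simplified, and the table and stage names are placeholders, not the connector's actual statements:

```sql
-- Default path: rows arrive via batched INSERTs (slow for multi-GB syncs).
-- "_airbyte_tmp_users" is a placeholder temporary table name.
insert into _airbyte_tmp_users (_airbyte_ab_id, _airbyte_emitted_at, _airbyte_data)
select 'uuid-1', current_timestamp(), parse_json('{"id": 1}');

-- Staged path: files already written to cloud storage are bulk-loaded at once.
copy into _airbyte_tmp_users
from @gcs_airbyte_stage
file_format = (type = csv);
```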
For AWS S3, you will need to create a bucket and provide credentials to access the bucket. We recommend creating a bucket that is only used for Airbyte to stage data to Snowflake. Airbyte needs read/write access to interact with this bucket.
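Under the hood, S3 staging amounts to a Snowflake bulk load from the bucket, roughly like the sketch below; the bucket path, credentials, and file format are placeholders, not the connector's literal statement:

```sql
-- Rough sketch of an S3 bulk load; all names and options are placeholders.
copy into _airbyte_tmp_users
from 's3://your-airbyte-staging-bucket/some/path/'
credentials = (aws_key_id = '<AWS_ACCESS_KEY_ID>' aws_secret_key = '<AWS_SECRET_ACCESS_KEY>')
file_format = (type = csv compression = gzip);
```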
First you will need to create a GCS bucket.
Then you will need to run the script below:
You must run the script as the account admin for Snowflake.
You should replace:

- `AIRBYTE_ROLE` with the role you used for Airbyte's Snowflake configuration.
- `YOURBUCKETNAME` with your bucket name.

The stage name can be modified to any valid name; the integration name `gcs_airbyte_integration` must be used as-is.
```sql
create storage INTEGRATION gcs_airbyte_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://YOURBUCKETNAME');

create stage gcs_airbyte_stage
  url = 'gcs://YOURBUCKETNAME'
  storage_integration = gcs_airbyte_integration;

GRANT USAGE ON integration gcs_airbyte_integration TO ROLE AIRBYTE_ROLE;
GRANT USAGE ON stage gcs_airbyte_stage TO ROLE AIRBYTE_ROLE;

DESC STORAGE INTEGRATION gcs_airbyte_integration;
```
The final query should show a `STORAGE_GCP_SERVICE_ACCOUNT` property with an email as the property value.
Finally, grant that service account email read/write permissions on your bucket.
Fixed AWS S3 staging: COPY was writing records from different tables into the same raw table
Updated query timeout from 30 minutes to 3 hours
Partial Success in BufferedStreamConsumer
Tag traffic with