DX
Integrations

Snowflake Connector Guide for DataXPipe

Configure Snowflake connections in the DataXPipe Catalog, register credentials securely, and run transform checks against Snowflake warehouses.

DataXPipe Team
  • snowflake
  • connectors
  • integrations

Snowflake is a common source and target in DataXPipe pipelines. The Catalog’s Snowflake connector executes check SQL and supports transform metadata registration against Snowflake accounts. This guide covers connection setup, authentication patterns, and production tuning.

Prerequisites

Install the optional Snowflake client in your API environment:

pip install snowflake-connector-python

The connector raises a helpful error if the library is missing at runtime. Generated check scripts inherit the same dependency when executed on Airflow workers.

Registering a Snowflake connection

POST connection metadata to the Catalog:

curl -X POST https://api.dataxpipe.com/api/v1/connections/ `
  -H "X-API-KEY: dxp_your_key" `
  -H "Content-Type: application/json" `
  -d '{
    "id": "sf-prod",
    "type": "snowflake",
    "config": {
      "account": "xyz123.us-east-1",
      "user": "DATAXPIPE_SVC",
      "password_secret": "vault/snowflake/dataxpipe-password",
      "warehouse": "COMPUTE_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC",
      "role": "DATAXPIPE_ROLE"
    }
  }'

Use _secret suffix keys to reference Vault or env-backed secrets instead of plaintext passwords.

Authentication options

MethodConfigNotes
Passworduser, password or password_secretSimple; rotate regularly
Key pairprivate_key_secretPreferred for service accounts
OAuthExternal token providerEnterprise SSO integrations

For service accounts, create a dedicated Snowflake role with minimal grants:

CREATE ROLE DATAXPIPE_ROLE;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE DATAXPIPE_ROLE;
GRANT USAGE ON DATABASE ANALYSIS TO ROLE DATAXPIPE_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYSIS.PUBLIC TO ROLE DATAXPIPE_ROLE;

Grant only what checks and transforms require. Avoid ACCOUNTADMIN for pipeline service accounts.

Referencing Snowflake in pipeline specs

Sources and targets use connection_ref to point at registered connections:

sources:
  - id: raw_orders
    type: snowflake
    connection_ref: sf-prod
    object: RAW.ECOMMERCE.ORDERS
    load_mode: incremental
    watermark_column: UPDATED_AT

targets:
  - id: clean_orders
    type: snowflake
    connection_ref: sf-prod
    object: ANALYTICS.ORDERS.CLEAN_ORDERS
    write_mode: merge
    primary_key: [ORDER_ID]

The object field uses Snowflake’s three-part naming: DATABASE.SCHEMA.TABLE.

Executing checks against Snowflake

Use the execute endpoint to run check SQL:

curl -X POST https://api.dataxpipe.com/api/v1/checks/execute `
  -H "X-API-KEY: dxp_your_key" `
  -H "Content-Type: application/json" `
  -d '{
    "connection_id": "sf-prod",
    "sql": "SELECT COUNT(*) AS violations FROM ANALYTICS.ORDERS.CLEAN_ORDERS WHERE ORDER_ID IS NULL",
    "check_id": "chk_not_null_order_id",
    "run_id": "run-abc123",
    "failure_sample_sql": "SELECT * FROM ANALYTICS.ORDERS.CLEAN_ORDERS WHERE ORDER_ID IS NULL LIMIT 5"
  }'

The failure_sample_sql parameter captures example violating rows for debugging—critical for Snowflake tables too large to scan manually.

Performance tuning

Snowflake bills by compute time. Keep checks efficient:

  • Use appropriate warehouse size — X-Small for metadata checks; scale up only for heavy aggregations
  • Filter to recent partitionsWHERE order_date >= CURRENT_DATE - 7 on large fact tables
  • Set statement timeoutDATAXPIPE_TIMEOUT_SECONDS=60 prevents runaway queries
  • Auto-suspend warehouse — Configure Snowflake warehouse auto-suspend to 60 seconds

Configure retry behavior for transient errors via DATAXPIPE_RETRY_ATTEMPTS and DATAXPIPE_BACKOFF_BASE.

Common errors

ErrorCauseResolution
250001 auth failureWrong account identifierUse full account locator with region
002043 object not foundSchema/database mismatchVerify object three-part name
Warehouse suspendedCold start delayPre-warm or increase auto-suspend threshold
TimeoutFull table scan in checkAdd partition filter

See Airflow integration guide for deploying Snowflake transform DAGs.