DX
Deployment

Running the Catalog on Postgres in Production

Migrate the DataXPipe Catalog from SQLite to managed Postgres, apply Alembic migrations, tune connection pools, and configure backups for production workloads.

DataXPipe Team
  • postgres
  • deployment
  • database

The DataXPipe scaffold defaults to SQLite for local development. Production deployments require Postgres for concurrent writes, HA failover, and SaaS multi-tenancy. This guide walks through migration, schema management, and operational practices.

Why Postgres for the Catalog

SQLite works for single-developer POCs but breaks down when:

  • Multiple Airflow workers POST run events concurrently
  • SaaS organizations share the same API with row-level isolation
  • You need point-in-time recovery and managed backups
  • Alembic migrations must run atomically during deploys

Managed Postgres (DigitalOcean, RDS, Cloud SQL) provides connection pooling, SSL, and automated backups out of the box.

Configuration

Set the database URL before starting the API:

$env:DATAXPIPE_DB = "postgresql://user:pass@host:25060/dataxpipe?sslmode=require"
$env:DATAXPIPE_USE_ALEMBIC = "true"
uvicorn app.main:app --port 8000

Enable Alembic so the app skips implicit runtime schema creation—migrations are the source of truth.

Running migrations

Apply schema changes during deployment:

python scripts/run_migrations.py
# or directly:
alembic upgrade head

Migration history lives in alembic/versions/. Key migrations include:

VersionContents
0001_initialPipelines, runs, checks, connections, lineage
0002_saasOrganizations, API keys, Stripe billing columns

Always run migrations before routing traffic to a new app version.

Connection pool tuning

The Catalog uses a Postgres connection pool controlled by:

VariableDefaultRecommendation
DATAXPIPE_PG_POOL_MIN12–5 for steady-state
DATAXPIPE_PG_POOL_MAX10Match managed DB connection limit ÷ replica count

Oversized pools exhaust managed Postgres connection limits. Undersized pools cause latency spikes during concurrent check result writes.

For high-traffic SaaS, place PgBouncer between the app and managed Postgres to multiplex connections.

Local development with Docker

Test Postgres integration locally before production cutover:

docker run --name dxp-postgres `
  -e POSTGRES_PASSWORD=postgres `
  -e POSTGRES_USER=postgres `
  -e POSTGRES_DB=dpdb `
  -p 5432:5432 -d postgres:15

$env:POSTGRES_DSN = "postgresql://postgres:postgres@localhost:5432/dpdb"
$env:DATAXPIPE_DB = "postgresql://postgres:postgres@localhost:5432/dpdb"
pytest -q tests/test_postgres_integration.py

Registering Postgres warehouse connections

The Catalog executes checks against registered warehouse connections—not just the metadata database:

curl -X POST http://localhost:8000/api/v1/connections/ `
  -H "X-API-KEY: platform-key" `
  -H "Content-Type: application/json" `
  -d '{"id":"pg-prod","type":"postgres","config":{"dsn":"postgresql://..."}}'

Use _secret suffix keys for passwords stored in Vault:

{
  "id": "pg-prod",
  "type": "postgres",
  "config": {
    "host": "db.example.com",
    "port": 5432,
    "user": "dbuser",
    "password_secret": "vault/data/db-password",
    "dbname": "warehouse"
  }
}

Backups and recovery

Schedule regular backups with scripts/backup_db.py. Upload artifacts to object storage (DigitalOcean Spaces, S3):

python scripts/backup_db.py --output backups/ --upload spaces://my-bucket/dataxpipe/

Test restore quarterly. Catalog metadata loss means losing pipeline registration history, run audit trails, and lineage graphs.

Readiness checks

The /ready endpoint verifies Postgres connectivity. Configure Kubernetes or App Platform health checks against /ready, not just /health:

curl https://api.dataxpipe.com/ready

A failing readiness probe during deploy rolls back before clients hit a broken database connection.

CI integration

The repository includes .github/workflows/postgres-integration.yml which runs integration tests against a Postgres 15 service container on every push to main. Mirror this pattern in your fork to catch migration regressions early.

Migration from SQLite

To migrate existing local data:

  1. Export pipeline metadata from SQLite via API (GET /pipelines/)
  2. Stand up Postgres and run alembic upgrade head
  3. Re-register pipelines via POST /pipelines/ with exported JSON
  4. Update DATAXPIPE_DB and redeploy

Run history from the SQLite era can be archived separately; most teams treat the Postgres cutover as a clean slate.