Running the Catalog on Postgres in Production

The DataXPipe scaffold defaults to SQLite for local development. Production deployments require Postgres for concurrent writes, HA failover, and SaaS multi-tenancy. This guide walks through migration, schema management, and operational practices.

Why Postgres for the Catalog

SQLite works for single-developer POCs but breaks down when:

Multiple Airflow workers POST run events concurrently
SaaS organizations share the same API with row-level isolation
You need point-in-time recovery and managed backups
Alembic migrations must run atomically during deploys

Managed Postgres (DigitalOcean, RDS, Cloud SQL) provides connection pooling, SSL, and automated backups out of the box.

Configuration

Set the database URL before starting the API:

$env:DATAXPIPE_DB = "postgresql://user:pass@host:25060/dataxpipe?sslmode=require"
$env:DATAXPIPE_USE_ALEMBIC = "true"
uvicorn app.main:app --port 8000

Enable Alembic so the app skips implicit runtime schema creation—migrations are the source of truth.

Running migrations

Apply schema changes during deployment:

python scripts/run_migrations.py
# or directly:
alembic upgrade head

Migration history lives in alembic/versions/. Key migrations include:

Version	Contents
`0001_initial`	Pipelines, runs, checks, connections, lineage
`0002_saas`	Organizations, API keys, Stripe billing columns

Always run migrations before routing traffic to a new app version.

Connection pool tuning

The Catalog uses a Postgres connection pool controlled by:

Variable	Default	Recommendation
`DATAXPIPE_PG_POOL_MIN`	1	2–5 for steady-state
`DATAXPIPE_PG_POOL_MAX`	10	Match managed DB connection limit ÷ replica count

Oversized pools exhaust managed Postgres connection limits. Undersized pools cause latency spikes during concurrent check result writes.

For high-traffic SaaS, place PgBouncer between the app and managed Postgres to multiplex connections.

Local development with Docker

Test Postgres integration locally before production cutover:

docker run --name dxp-postgres `
  -e POSTGRES_PASSWORD=postgres `
  -e POSTGRES_USER=postgres `
  -e POSTGRES_DB=dpdb `
  -p 5432:5432 -d postgres:15

$env:POSTGRES_DSN = "postgresql://postgres:postgres@localhost:5432/dpdb"
$env:DATAXPIPE_DB = "postgresql://postgres:postgres@localhost:5432/dpdb"
pytest -q tests/test_postgres_integration.py

Registering Postgres warehouse connections

The Catalog executes checks against registered warehouse connections—not just the metadata database:

curl -X POST http://localhost:8000/api/v1/connections/ `
  -H "X-API-KEY: platform-key" `
  -H "Content-Type: application/json" `
  -d '{"id":"pg-prod","type":"postgres","config":{"dsn":"postgresql://..."}}'

Use _secret suffix keys for passwords stored in Vault:

{
  "id": "pg-prod",
  "type": "postgres",
  "config": {
    "host": "db.example.com",
    "port": 5432,
    "user": "dbuser",
    "password_secret": "vault/data/db-password",
    "dbname": "warehouse"
  }
}

Backups and recovery

Schedule regular backups with scripts/backup_db.py. Upload artifacts to object storage (DigitalOcean Spaces, S3):

python scripts/backup_db.py --output backups/ --upload spaces://my-bucket/dataxpipe/

Test restore quarterly. Catalog metadata loss means losing pipeline registration history, run audit trails, and lineage graphs.

Readiness checks

The /ready endpoint verifies Postgres connectivity. Configure Kubernetes or App Platform health checks against /ready, not just /health:

curl https://api.dataxpipe.com/ready

A failing readiness probe during deploy rolls back before clients hit a broken database connection.

CI integration

The repository includes .github/workflows/postgres-integration.yml which runs integration tests against a Postgres 15 service container on every push to main. Mirror this pattern in your fork to catch migration regressions early.

Migration from SQLite

To migrate existing local data:

Export pipeline metadata from SQLite via API (GET /pipelines/)
Stand up Postgres and run alembic upgrade head
Re-register pipelines via POST /pipelines/ with exported JSON
Update DATAXPIPE_DB and redeploy

Run history from the SQLite era can be archived separately; most teams treat the Postgres cutover as a clean slate.