Running the Catalog on Postgres in Production
Migrate the DataXPipe Catalog from SQLite to managed Postgres, apply Alembic migrations, tune connection pools, and configure backups for production workloads.
- postgres
- deployment
- database
The DataXPipe scaffold defaults to SQLite for local development. Production deployments require Postgres for concurrent writes, HA failover, and SaaS multi-tenancy. This guide walks through migration, schema management, and operational practices.
Why Postgres for the Catalog
SQLite works for single-developer POCs but breaks down when:
- Multiple Airflow workers POST run events concurrently
- SaaS organizations share the same API with row-level isolation
- You need point-in-time recovery and managed backups
- Alembic migrations must run atomically during deploys
Managed Postgres (DigitalOcean, RDS, Cloud SQL) provides connection pooling, SSL, and automated backups out of the box.
Configuration
Set the database URL before starting the API:
$env:DATAXPIPE_DB = "postgresql://user:pass@host:25060/dataxpipe?sslmode=require"
$env:DATAXPIPE_USE_ALEMBIC = "true"
uvicorn app.main:app --port 8000
Enable Alembic so the app skips implicit runtime schema creation—migrations are the source of truth.
Running migrations
Apply schema changes during deployment:
python scripts/run_migrations.py
# or directly:
alembic upgrade head
Migration history lives in alembic/versions/. Key migrations include:
| Version | Contents |
|---|---|
0001_initial | Pipelines, runs, checks, connections, lineage |
0002_saas | Organizations, API keys, Stripe billing columns |
Always run migrations before routing traffic to a new app version.
Connection pool tuning
The Catalog uses a Postgres connection pool controlled by:
| Variable | Default | Recommendation |
|---|---|---|
DATAXPIPE_PG_POOL_MIN | 1 | 2–5 for steady-state |
DATAXPIPE_PG_POOL_MAX | 10 | Match managed DB connection limit ÷ replica count |
Oversized pools exhaust managed Postgres connection limits. Undersized pools cause latency spikes during concurrent check result writes.
For high-traffic SaaS, place PgBouncer between the app and managed Postgres to multiplex connections.
Local development with Docker
Test Postgres integration locally before production cutover:
docker run --name dxp-postgres `
-e POSTGRES_PASSWORD=postgres `
-e POSTGRES_USER=postgres `
-e POSTGRES_DB=dpdb `
-p 5432:5432 -d postgres:15
$env:POSTGRES_DSN = "postgresql://postgres:postgres@localhost:5432/dpdb"
$env:DATAXPIPE_DB = "postgresql://postgres:postgres@localhost:5432/dpdb"
pytest -q tests/test_postgres_integration.py
Registering Postgres warehouse connections
The Catalog executes checks against registered warehouse connections—not just the metadata database:
curl -X POST http://localhost:8000/api/v1/connections/ `
-H "X-API-KEY: platform-key" `
-H "Content-Type: application/json" `
-d '{"id":"pg-prod","type":"postgres","config":{"dsn":"postgresql://..."}}'
Use _secret suffix keys for passwords stored in Vault:
{
"id": "pg-prod",
"type": "postgres",
"config": {
"host": "db.example.com",
"port": 5432,
"user": "dbuser",
"password_secret": "vault/data/db-password",
"dbname": "warehouse"
}
}
Backups and recovery
Schedule regular backups with scripts/backup_db.py. Upload artifacts to object storage (DigitalOcean Spaces, S3):
python scripts/backup_db.py --output backups/ --upload spaces://my-bucket/dataxpipe/
Test restore quarterly. Catalog metadata loss means losing pipeline registration history, run audit trails, and lineage graphs.
Readiness checks
The /ready endpoint verifies Postgres connectivity. Configure Kubernetes or App Platform health checks against /ready, not just /health:
curl https://api.dataxpipe.com/ready
A failing readiness probe during deploy rolls back before clients hit a broken database connection.
CI integration
The repository includes .github/workflows/postgres-integration.yml which runs integration tests against a Postgres 15 service container on every push to main. Mirror this pattern in your fork to catch migration regressions early.
Migration from SQLite
To migrate existing local data:
- Export pipeline metadata from SQLite via API (
GET /pipelines/) - Stand up Postgres and run
alembic upgrade head - Re-register pipelines via
POST /pipelines/with exported JSON - Update
DATAXPIPE_DBand redeploy
Run history from the SQLite era can be archived separately; most teams treat the Postgres cutover as a clean slate.