Why multi-tenant?
The decision between single-tenant and multi-tenant architecture is one of the most consequential choices you'll make early in a SaaS product's life. Single-tenant is simpler to reason about: every customer gets their own database, their own deployment, their own everything. But it doesn't scale operationally. When you have 200 customers, you have 200 databases to migrate, 200 instances to monitor, and 200 things that can break independently at 3am.
Multi-tenancy consolidates that operational complexity. One database, one deployment, one migration path. The tradeoff is that your application code needs to be tenant-aware from the very first line. There's no retrofitting this -- I've tried, and the scars are still visible in a codebase I maintain to this day.
For this walkthrough, I'm going to focus on what I call the "shared schema, isolated rows" approach. It's the sweet spot for most SaaS products that aren't in heavily regulated industries. If you're building for healthcare or finance, you might need schema-per-tenant or database-per-tenant, but that's a different article.
Isolation strategies
There are three primary approaches to tenant isolation in Postgres, each with distinct tradeoffs:
- Database-per-tenant: Maximum isolation, maximum operational burden. Each tenant gets a separate Postgres database. Works well for enterprise tiers where customers demand it.
- Schema-per-tenant: Good isolation with moderate overhead. Each tenant gets their own schema within a shared database. Migrations become per-schema operations.
- Shared schema with row-level security: Minimum overhead, relies on Postgres RLS to enforce isolation at the database level. This is what we'll build.
These approaches aren't mutually exclusive. Many production systems use a hybrid: shared schema for most tenants, with database-per-tenant available as a premium enterprise feature. Start shared, promote later.
Schema design
The foundation of any multi-tenant system is a tenant_id column on every table that holds tenant-specific data. This sounds obvious, but the implications are far-reaching. Every index needs to be tenant-aware. Every foreign key constraint operates within the context of a tenant. Every query needs to filter by tenant, whether explicitly or through RLS.
Here's the core schema I start with:
CREATE TABLE tenants (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  slug TEXT NOT NULL UNIQUE,
  plan TEXT NOT NULL DEFAULT 'free',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  settings JSONB NOT NULL DEFAULT '{}'::jsonb
);

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  email TEXT NOT NULL,
  role TEXT NOT NULL DEFAULT 'member',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (tenant_id, email)
);

-- Every tenant-scoped table follows this pattern
CREATE TABLE projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  name TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'active',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_users_tenant ON users (tenant_id);
CREATE INDEX idx_projects_tenant ON projects (tenant_id);
The UNIQUE(tenant_id, email) constraint is critical -- it means two different tenants can each have a user with the same email address (common in B2B SaaS where one person belongs to multiple organizations), but within a single tenant, email addresses remain unique.
Row-level security
Postgres RLS is the real hero here. Instead of relying on your application code to always include WHERE tenant_id = ... in every query -- a pattern that's one missed filter away from a data breach -- you push that enforcement down to the database itself.
-- Enable RLS on tenant-scoped tables
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

-- Policy: rows visible only to matching tenant
CREATE POLICY tenant_isolation ON users
  USING (tenant_id = current_setting('app.current_tenant')::uuid);
CREATE POLICY tenant_isolation ON projects
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- The app sets this at the start of each request:
-- SET LOCAL app.current_tenant = 'tenant-uuid-here';
The SET LOCAL is important -- it scopes the setting to the current transaction, so it's automatically cleaned up. No risk of tenant context leaking between requests even if you're using a connection pool.
RLS policies are bypassed by superusers and table owners by default. Your application should connect as a non-superuser role. Create a dedicated app_user role with only the permissions it needs, and ensure RLS is enforced for that role.
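Something like this works as a one-off hardening script -- the app_user name, the password, and the grants are placeholders to adapt to your schema:

import { Client } from 'pg';

// One-off hardening script, run as the admin/migration user -- not as app_user.
export async function createAppRole(adminUrl: string): Promise<void> {
  const admin = new Client({ connectionString: adminUrl });
  await admin.connect();
  try {
    // A login role with no superuser or DDL privileges.
    await admin.query(
      `CREATE ROLE app_user LOGIN PASSWORD 'change-me' NOSUPERUSER NOCREATEDB NOCREATEROLE`
    );
    await admin.query(`GRANT USAGE ON SCHEMA public TO app_user`);
    await admin.query(`GRANT SELECT ON tenants TO app_user`);
    await admin.query(`GRANT SELECT, INSERT, UPDATE, DELETE ON users, projects TO app_user`);
    // FORCE makes RLS apply even to the table owner; ordinary roles like
    // app_user are already subject to it once RLS is enabled.
    await admin.query(`ALTER TABLE users FORCE ROW LEVEL SECURITY`);
    await admin.query(`ALTER TABLE projects FORCE ROW LEVEL SECURITY`);
  } finally {
    await admin.end();
  }
}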
Connection pooling
Connection pooling in a multi-tenant system needs extra thought. You're attaching tenant context (app.current_tenant) to a pooled connection on every request, so you have to be certain that context never leaks to the next request that picks up the same connection.
With PgBouncer in transaction mode, a plain session-level SET would be a hazard because server connections are shared across clients, but the transaction-scoped SET LOCAL approach we use is discarded at commit, which is exactly what we want. With node-postgres directly, you need a pool wrapper:
import { Pool, PoolClient } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,
  idleTimeoutMillis: 30000,
});

export async function withTenant<T>(
  tenantId: string,
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // SET doesn't accept bind parameters, so use set_config();
    // the third argument (true) makes it transaction-scoped, like SET LOCAL.
    await client.query(
      `SELECT set_config('app.current_tenant', $1, true)`,
      [tenantId]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
Every database operation goes through withTenant. There's no way to accidentally query without a tenant context, because the function signature demands it. This is the kind of API design that prevents bugs through structure rather than discipline.
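A call site then looks something like this (the projects query and the './db' import path are illustrative, not part of the helper above):

import { withTenant } from './db'; // assumed location of the pool wrapper above

// Hypothetical handler: list the current tenant's projects.
// RLS filters the rows; withTenant provides the tenant context it needs.
export async function listProjects(tenantId: string) {
  return withTenant(tenantId, async (client) => {
    const { rows } = await client.query(
      'SELECT id, name, status FROM projects ORDER BY created_at DESC'
    );
    return rows;
  });
}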
Tenant resolution
Before you can set app.current_tenant, you need to figure out which tenant the request belongs to. There are several strategies, and you might use more than one:
- Subdomain: acme.yourapp.com -- clean, professional, but requires wildcard DNS and TLS.
- Path prefix: yourapp.com/acme/... -- simpler infrastructure, but muddies your URL space.
- JWT claim: The tenant ID is embedded in the authentication token. Most reliable for API-first products.
- Header: X-Tenant-ID -- useful for internal services and service-to-service communication.
Here's a Fastify middleware that resolves tenant from either a subdomain or a JWT:
import { FastifyRequest } from 'fastify';

export async function resolveTenant(request: FastifyRequest) {
  // Strategy 1: subdomain
  const host = request.headers.host ?? '';
  const subdomain = host.split('.')[0];
  if (subdomain && subdomain !== 'www' && subdomain !== 'api') {
    const tenant = await findTenantBySlug(subdomain);
    if (tenant) {
      request.tenantId = tenant.id;
      return;
    }
  }

  // Strategy 2: JWT claim
  const token = request.user; // from auth plugin
  if (token?.tenantId) {
    request.tenantId = token.tenantId;
    return;
  }

  throw new Error('Tenant could not be resolved');
}
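Two details the resolver leaves implicit: tenantId and user aren't built-in FastifyRequest properties, and the resolver has to run before your route handlers. Here's one way to wire that up -- the import path and the shape of user are assumptions that depend on your auth plugin:

import fastify from 'fastify';
import { resolveTenant } from './resolve-tenant'; // assumed path

// Module augmentation so request.tenantId / request.user type-check;
// the actual shape of `user` depends on the auth plugin you use.
declare module 'fastify' {
  interface FastifyRequest {
    tenantId?: string;
    user?: { tenantId?: string };
  }
}

const app = fastify();

// Resolve the tenant before any route handler runs.
app.addHook('preHandler', async (request) => {
  await resolveTenant(request);
});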
Migrations at scale
Database migrations in a multi-tenant shared-schema system are simpler than per-schema approaches because you only run each migration once. But they carry higher risk: a bad migration affects every tenant simultaneously.
My approach:
- All migrations must be backward-compatible. The old version of the application must continue to work against the new schema. This means no column renames, no NOT NULL additions without defaults, and no column drops without a deprecation period.
- Use advisory locks to prevent concurrent migrations. Two deployments running migrations simultaneously is a recipe for a corrupted schema (see the sketch after this list).
- Log everything. Every migration run gets a row in a migration_log table with the migration name, timestamp, duration, and result.
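Here's a minimal sketch of that advisory-lock step, assuming a TypeScript runner -- migrateWithLock, the runPendingMigrations callback, and the migration_log columns are illustrative, not a particular migration library's API:

import { Client } from 'pg';

// Matches --lock-id 12345 in the deploy pipeline; any agreed-upon constant works.
const MIGRATION_LOCK_ID = 12345;

export async function migrateWithLock(
  databaseUrl: string,
  runPendingMigrations: (client: Client) => Promise<void>
): Promise<void> {
  const client = new Client({ connectionString: databaseUrl });
  await client.connect();
  try {
    // Session-level advisory lock: only one deploy can run migrations at a time.
    const { rows } = await client.query(
      'SELECT pg_try_advisory_lock($1) AS locked',
      [MIGRATION_LOCK_ID]
    );
    if (!rows[0].locked) {
      throw new Error('Another migration run holds the lock; aborting');
    }

    const started = Date.now();
    await runPendingMigrations(client);

    // Assumed migration_log shape: name, duration, result, plus a timestamp default.
    await client.query(
      'INSERT INTO migration_log (name, duration_ms, result) VALUES ($1, $2, $3)',
      ['deploy-batch', Date.now() - started, 'success']
    );
    await client.query('SELECT pg_advisory_unlock($1)', [MIGRATION_LOCK_ID]);
  } finally {
    await client.end();
  }
}

Session-level advisory locks are released automatically when the connection closes, so a crashed run can't wedge future deploys.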
Deployment pipeline
The deployment pipeline for a multi-tenant SaaS needs to be more cautious than a typical application. A bug in deployment doesn't just affect one customer -- it affects all of them.
Here's the pipeline we settled on after several painful iterations:
steps:
  - name: Run migrations (with advisory lock)
    run: |
      node scripts/migrate.js \
        --lock-id 12345 \
        --timeout 300 \
        --dry-run-first

  - name: Deploy canary (5% traffic)
    run: |
      kubectl set image deployment/api-canary \
        api=registry.io/api:${{ github.sha }}
      kubectl rollout status deployment/api-canary

  - name: Validate canary health
    run: |
      # Check error rate, latency p99, tenant isolation
      node scripts/canary-check.js \
        --duration 300 \
        --error-threshold 0.01 \
        --p99-threshold 500

  - name: Promote to full rollout
    run: |
      kubectl set image deployment/api \
        api=registry.io/api:${{ github.sha }}
      kubectl rollout status deployment/api
The canary step is non-negotiable. We route 5% of traffic to the new version for five minutes, watch the error rate and latency, and -- critically -- run automated checks that verify tenant isolation is still intact. If anything trips, the pipeline stops and alerts the on-call engineer.
Lessons learned
After running this architecture in production for two years across three different products, here's what I wish I'd known from the start:
- Test tenant isolation continuously. Not just in CI -- in production. We run a synthetic tenant that makes requests every minute and verifies it can only see its own data. It has caught two RLS misconfigurations before they became incidents (a sketch of that probe follows this list).
- Plan for the "noisy neighbor" problem early. One tenant running a massive export shouldn't slow down everyone else. Rate limiting per tenant is table stakes. Connection limits per tenant via PgBouncer are more effective.
- Tenant-aware logging is not optional. Every log line, every metric, every trace must carry the tenant ID. When something breaks, "which tenant is affected?" is always the first question.
- Don't share caches naively. A Redis key of users:list is a data leak waiting to happen. Use tenant:{id}:users:list or use separate Redis databases per tenant.
- Build the admin panel on day one. You will need to impersonate tenants, inspect their data, and debug their issues. A tenant-aware admin panel saves more time than any other internal tool.
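For that isolation probe, the core check can be tiny. A sketch, assuming a dedicated synthetic tenant ID in the environment and the withTenant helper from earlier:

import { withTenant } from './db'; // assumed path of the pool wrapper

const SYNTHETIC_TENANT_ID = process.env.SYNTHETIC_TENANT_ID!;

export async function checkTenantIsolation(): Promise<void> {
  const rows = await withTenant(SYNTHETIC_TENANT_ID, async (client) => {
    // Under intact RLS this can only ever return the synthetic tenant's own ID.
    const result = await client.query<{ tenant_id: string }>(
      'SELECT DISTINCT tenant_id FROM projects'
    );
    return result.rows;
  });

  const foreign = rows.filter((r) => r.tenant_id !== SYNTHETIC_TENANT_ID);
  if (foreign.length > 0) {
    // Page the on-call engineer: cross-tenant rows are visible.
    throw new Error(`Tenant isolation breach: saw ${foreign.length} foreign tenant id(s)`);
  }
}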
Multi-tenancy is an architectural commitment, not a feature you bolt on. The earlier you embrace it, the cleaner your system will be. Start with shared-schema and RLS, keep your migration discipline tight, and invest in observability from the beginning.
If you're building something similar and have questions about the specifics, reach out. I'm always happy to talk architecture over a virtual coffee.