Why multi-tenant?
The decision between single-tenant and multi-tenant architecture is one of the most consequential choices you'll make early in a SaaS product's life. Single-tenant is simpler to reason about: every customer gets their own database, their own deployment, their own everything. But it doesn't scale operationally. When you have 200 customers, you have 200 databases to migrate, 200 instances to monitor, and 200 things that can break independently at 3am.
Multi-tenancy consolidates that operational complexity. One database, one deployment, one migration path. The tradeoff is that your application code needs to be tenant-aware from the very first line. There's no retrofitting this -- I've tried, and the scars are still visible in a codebase I maintain to this day.
For this walkthrough, I'm going to focus on what I call the "shared schema, isolated rows" approach. It's the sweet spot for most SaaS products that aren't in heavily regulated industries. If you're building for healthcare or finance, you might need schema-per-tenant or database-per-tenant, but that's a different article.
Isolation strategies
There are three primary approaches to tenant isolation in Postgres, each with distinct tradeoffs:
- Database-per-tenant: Maximum isolation, maximum operational burden. Each tenant gets a separate Postgres database. Works well for enterprise tiers where customers demand it.
- Schema-per-tenant: Good isolation with moderate overhead. Each tenant gets their own schema within a shared database. Migrations become per-schema operations.
- Shared schema with row-level security: Minimum overhead, relies on Postgres RLS to enforce isolation at the database level. This is what we'll build.
These approaches aren't mutually exclusive. Many production systems use a hybrid: shared schema for most tenants, with database-per-tenant available as a premium enterprise feature. Start shared, promote later.
Schema design
The foundation of any multi-tenant system is a tenant_id column on every table that holds tenant-specific data. This sounds obvious, but the implications are far-reaching. Every index needs to be tenant-aware. Every foreign key constraint operates within the context of a tenant. Every query needs to filter by tenant, whether explicitly or through RLS.
Here's the core schema I start with:
CREATE TABLE tenants (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  slug TEXT NOT NULL UNIQUE,
  plan TEXT NOT NULL DEFAULT 'free',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  settings JSONB NOT NULL DEFAULT '{}'::jsonb
);

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  email TEXT NOT NULL,
  role TEXT NOT NULL DEFAULT 'member',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (tenant_id, email)
);

-- Every tenant-scoped table follows this pattern
CREATE TABLE projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  name TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'active',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_users_tenant ON users (tenant_id);
CREATE INDEX idx_projects_tenant ON projects (tenant_id);
The UNIQUE(tenant_id, email) constraint is critical -- it means two different tenants can each have a user with the same email address (common in B2B SaaS where one person belongs to multiple organizations), but within a single tenant, email addresses remain unique.
Row-level security
Postgres RLS is the real hero here. Instead of relying on your application code to always include WHERE tenant_id = ... in every query -- a pattern that's one missed filter away from a data breach -- you push that enforcement down to the database itself.
-- Enable RLS on tenant-scoped tables
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

-- Policy: rows visible only to matching tenant
CREATE POLICY tenant_isolation ON users
  USING (tenant_id = current_setting('app.current_tenant')::uuid);
CREATE POLICY tenant_isolation ON projects
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- The app sets this at the start of each request:
-- SET LOCAL app.current_tenant = 'tenant-uuid-here';
The SET LOCAL is important -- it scopes the setting to the current transaction, so it's automatically cleaned up. No risk of tenant context leaking between requests even if you're using a connection pool.
RLS policies are bypassed by superusers and table owners by default. Your application should connect as a non-superuser role. Create a dedicated app_user role with only the permissions it needs, and ensure RLS is enforced for that role.
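Something like this works as a one-off hardening script -- the app_user name, the password, and the grants are placeholders to adapt to your schema:

import { Client } from 'pg';

// One-off hardening script, run as the admin/migration user -- not as app_user.
export async function createAppRole(adminUrl: string): Promise<void> {
  const admin = new Client({ connectionString: adminUrl });
  await admin.connect();
  try {
    // A login role with no superuser or DDL privileges.
    await admin.query(
      `CREATE ROLE app_user LOGIN PASSWORD 'change-me' NOSUPERUSER NOCREATEDB NOCREATEROLE`
    );
    await admin.query(`GRANT USAGE ON SCHEMA public TO app_user`);
    await admin.query(`GRANT SELECT ON tenants TO app_user`);
    await admin.query(`GRANT SELECT, INSERT, UPDATE, DELETE ON users, projects TO app_user`);
    // FORCE makes RLS apply even to the table owner; ordinary roles like
    // app_user are already subject to it once RLS is enabled.
    await admin.query(`ALTER TABLE users FORCE ROW LEVEL SECURITY`);
    await admin.query(`ALTER TABLE projects FORCE ROW LEVEL SECURITY`);
  } finally {
    await admin.end();
  }
}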
Connection pooling
Connection pooling in a multi-tenant system needs extra thought. You're attaching tenant context (app.current_tenant) to a pooled connection on every request, so you have to be certain that context never leaks to the next request that picks up the same connection.
With PgBouncer in transaction mode, a plain session-level SET would be a hazard because server connections are shared across clients, but the transaction-scoped SET LOCAL approach we use is discarded at commit, which is exactly what we want. With node-postgres directly, you need a pool wrapper:
import { Pool, PoolClient } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,
  idleTimeoutMillis: 30000,
});

export async function withTenant<T>(
  tenantId: string,
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // SET doesn't accept bind parameters, so use set_config();
    // the third argument (true) makes it transaction-scoped, like SET LOCAL.
    await client.query(
      `SELECT set_config('app.current_tenant', $1, true)`,
      [tenantId]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
Every database operation goes through withTenant. There's no way to accidentally query without a tenant context, because the function signature demands it. This is the kind of API design that prevents bugs through structure rather than discipline.
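A call site then looks something like this (the projects query and the './db' import path are illustrative, not part of the helper above):

import { withTenant } from './db'; // assumed location of the pool wrapper above

// Hypothetical handler: list the current tenant's projects.
// RLS filters the rows; withTenant provides the tenant context it needs.
export async function listProjects(tenantId: string) {
  return withTenant(tenantId, async (client) => {
    const { rows } = await client.query(
      'SELECT id, name, status FROM projects ORDER BY created_at DESC'
    );
    return rows;
  });
}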
Tenant resolution
Before you can set app.current_tenant, you need to figure out which tenant the request belongs to. There are several strategies, and you might use more than one:
- Subdomain: acme.yourapp.com -- clean, professional, but requires wildcard DNS and TLS.
- Path prefix: yourapp.com/acme/... -- simpler infrastructure, but muddies your URL space.
- JWT claim: The tenant ID is embedded in the authentication token. Most reliable for API-first products.
- Header: X-Tenant-ID -- useful for internal services and service-to-service communication.
Here's a Fastify middleware that resolves tenant from either a subdomain or a JWT:
import { FastifyRequest } from 'fastify';

export async function resolveTenant(request: FastifyRequest) {
  // Strategy 1: subdomain
  const host = request.headers.host ?? '';
  const subdomain = host.split('.')[0];
  if (subdomain && subdomain !== 'www' && subdomain !== 'api') {
    const tenant = await findTenantBySlug(subdomain);
    if (tenant) {
      request.tenantId = tenant.id;
      return;
    }
  }

  // Strategy 2: JWT claim
  const token = request.user; // from auth plugin
  if (token?.tenantId) {
    request.tenantId = token.tenantId;
    return;
  }

  throw new Error('Tenant could not be resolved');
}
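Two details the resolver leaves implicit: tenantId and user aren't built-in FastifyRequest properties, and the resolver has to run before your route handlers. Here's one way to wire that up -- the import path and the shape of user are assumptions that depend on your auth plugin:

import fastify from 'fastify';
import { resolveTenant } from './resolve-tenant'; // assumed path

// Module augmentation so request.tenantId / request.user type-check;
// the actual shape of `user` depends on the auth plugin you use.
declare module 'fastify' {
  interface FastifyRequest {
    tenantId?: string;
    user?: { tenantId?: string };
  }
}

const app = fastify();

// Resolve the tenant before any route handler runs.
app.addHook('preHandler', async (request) => {
  await resolveTenant(request);
});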
Migrations at scale
Database migrations in a multi-tenant shared-schema system are simpler than per-schema approaches because you only run each migration once. But they carry higher risk: a bad migration affects every tenant simultaneously.
My approach:
- All migrations must be backward-compatible. The old version of the application must continue to work against the new schema. This means no column renames, no NOT NULL additions without defaults, and no column drops without a deprecation period.
- Use advisory locks to prevent concurrent migrations. Two deployments running migrations simultaneously is a recipe for a corrupted schema (see the sketch after this list).
- Log everything. Every migration run gets a row in a migration_log table with the migration name, timestamp, duration, and result.
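Here's a minimal sketch of that advisory-lock step, assuming a TypeScript runner -- migrateWithLock, the runPendingMigrations callback, and the migration_log columns are illustrative, not a particular migration library's API:

import { Client } from 'pg';

// Matches --lock-id 12345 in the deploy pipeline; any agreed-upon constant works.
const MIGRATION_LOCK_ID = 12345;

export async function migrateWithLock(
  databaseUrl: string,
  runPendingMigrations: (client: Client) => Promise<void>
): Promise<void> {
  const client = new Client({ connectionString: databaseUrl });
  await client.connect();
  try {
    // Session-level advisory lock: only one deploy can run migrations at a time.
    const { rows } = await client.query(
      'SELECT pg_try_advisory_lock($1) AS locked',
      [MIGRATION_LOCK_ID]
    );
    if (!rows[0].locked) {
      throw new Error('Another migration run holds the lock; aborting');
    }

    const started = Date.now();
    await runPendingMigrations(client);

    // Assumed migration_log shape: name, duration, result, plus a timestamp default.
    await client.query(
      'INSERT INTO migration_log (name, duration_ms, result) VALUES ($1, $2, $3)',
      ['deploy-batch', Date.now() - started, 'success']
    );
    await client.query('SELECT pg_advisory_unlock($1)', [MIGRATION_LOCK_ID]);
  } finally {
    await client.end();
  }
}

Session-level advisory locks are released automatically when the connection closes, so a crashed run can't wedge future deploys.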
Deployment pipeline
The deployment pipeline for a multi-tenant SaaS needs to be more cautious than a typical application. A bug in deployment doesn't just affect one customer -- it affects all of them.
Here's the pipeline we settled on after several painful iterations:
steps:
  - name: Run migrations (with advisory lock)
    run: |
      node scripts/migrate.js \
        --lock-id 12345 \
        --timeout 300 \
        --dry-run-first

  - name: Deploy canary (5% traffic)
    run: |
      kubectl set image deployment/api-canary \
        api=registry.io/api:${{ github.sha }}
      kubectl rollout status deployment/api-canary

  - name: Validate canary health
    run: |
      # Check error rate, latency p99, tenant isolation
      node scripts/canary-check.js \
        --duration 300 \
        --error-threshold 0.01 \
        --p99-threshold 500

  - name: Promote to full rollout
    run: |
      kubectl set image deployment/api \
        api=registry.io/api:${{ github.sha }}
      kubectl rollout status deployment/api
The canary step is non-negotiable. We route 5% of traffic to the new version for five minutes, watch the error rate and latency, and -- critically -- run automated checks that verify tenant isolation is still intact. If anything trips, the pipeline stops and alerts the on-call engineer.
Lessons learned
After running this architecture in production for two years across three different products, here's what I wish I'd known from the start:
- Test tenant isolation continuously. Not just in CI -- in production. We run a synthetic tenant that makes requests every minute and verifies it can only see its own data. It has caught two RLS misconfigurations before they became incidents (a sketch of that probe follows this list).
- Plan for the "noisy neighbor" problem early. One tenant running a massive export shouldn't slow down everyone else. Rate limiting per tenant is table stakes. Connection limits per tenant via PgBouncer are more effective.
- Tenant-aware logging is not optional. Every log line, every metric, every trace must carry the tenant ID. When something breaks, "which tenant is affected?" is always the first question.
- Don't share caches naively. A Redis key of users:list is a data leak waiting to happen. Use tenant:{id}:users:list or use separate Redis databases per tenant.
- Build the admin panel on day one. You will need to impersonate tenants, inspect their data, and debug their issues. A tenant-aware admin panel saves more time than any other internal tool.
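For that isolation probe, the core check can be tiny. A sketch, assuming a dedicated synthetic tenant ID in the environment and the withTenant helper from earlier:

import { withTenant } from './db'; // assumed path of the pool wrapper

const SYNTHETIC_TENANT_ID = process.env.SYNTHETIC_TENANT_ID!;

export async function checkTenantIsolation(): Promise<void> {
  const rows = await withTenant(SYNTHETIC_TENANT_ID, async (client) => {
    // Under intact RLS this can only ever return the synthetic tenant's own ID.
    const result = await client.query<{ tenant_id: string }>(
      'SELECT DISTINCT tenant_id FROM projects'
    );
    return result.rows;
  });

  const foreign = rows.filter((r) => r.tenant_id !== SYNTHETIC_TENANT_ID);
  if (foreign.length > 0) {
    // Page the on-call engineer: cross-tenant rows are visible.
    throw new Error(`Tenant isolation breach: saw ${foreign.length} foreign tenant id(s)`);
  }
}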
Multi-tenancy is an architectural commitment, not a feature you bolt on. The earlier you embrace it, the cleaner your system will be. Start with shared-schema and RLS, keep your migration discipline tight, and invest in observability from the beginning.
If you're building something similar and have questions about the specifics, reach out. I'm always happy to talk architecture over a virtual coffee.