Multi-Tenant SaaS Architecture: The Database Decision That Will Haunt You
Three months after launching our SaaS product, we had a problem. A good problem: customers were signing up faster than expected. A bad problem: our single-database architecture was becoming a nightmare.
Customer A wanted their data in Europe for GDPR. Customer B needed special performance SLAs. Customer C wanted to audit every database query. And we had built everything assuming one database for everyone.
The rebuild took four months and cost us two customers. Let me save you from making the same mistakes.
What Makes SaaS Architecture Different?
Building SaaS isn’t like building a regular web app. You’re not just handling multiple users anymore. You’re handling multiple organizations, each with their own data, their own configs, their own expectations.
The core challenge: isolation. Each tenant’s data must be completely separate from others. A bug in tenant A’s code shouldn’t corrupt tenant B’s data. Tenant C shouldn’t be able to query tenant D’s records, no matter how badly your SQL is written.
You have three main approaches:
Database per tenant: Each tenant gets their own database. Complete isolation, but operational complexity.
Schema per tenant: All tenants share one database, but each has their own schema. Middle ground between isolation and simplicity.
Shared schema: Everyone shares the same tables, differentiated by a tenant ID column. Simple to build, complex to maintain securely.
Each has tradeoffs. Let me walk through what I’ve learned building all three in production.
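Before diving in, it can help to see the three options side by side as code. Here's a toy sketch (the function and field names are mine, purely illustrative) of where the same tenant's data lives under each model:

```javascript
// Toy illustration: where does tenant "acme"'s data live under each model?
function storageTargetFor(tenant, strategy) {
  switch (strategy) {
    case 'database-per-tenant':
      // A dedicated database; the connection string differs per tenant
      return { database: `tenant_${tenant}`, schema: 'public', filter: null };
    case 'schema-per-tenant':
      // Shared database, dedicated schema (namespace)
      return { database: 'app', schema: `tenant_${tenant}`, filter: null };
    case 'shared-schema':
      // Shared tables; isolation is a WHERE clause on every query
      return { database: 'app', schema: 'public', filter: { tenant_id: tenant } };
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}
```

Notice that only the shared-schema row carries a filter: in the first two models the isolation boundary is structural, in the third it is a query-time obligation.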
Database-Per-Tenant: Maximum Isolation
This is the nuclear option. Every tenant gets their own database. Complete data isolation, easy to understand, and straightforward to implement.
The Implementation
Let’s build a simple multi-tenant system where each customer gets their own Postgres database:
// Tenant provisioning (assumes the knex query builder)
const knex = require('knex');

class TenantProvisioner {
constructor(masterDb) {
this.masterDb = masterDb;
}
async createTenant(tenantData) {
const { name, subdomain, planType, adminEmail } = tenantData;
// subdomain becomes part of a SQL identifier below -- validate it first
if (!/^[a-z0-9_]+$/.test(subdomain)) {
throw new Error(`Invalid subdomain: ${subdomain}`);
}
const dbName = `tenant_${subdomain}`;
// Create database (identifiers can't be bound as parameters, hence the check above)
await this.masterDb.raw(`CREATE DATABASE ${dbName}`);
// Store tenant metadata, including the admin email used during seeding
const tenant = await this.masterDb('tenants').insert({
name,
subdomain,
database_name: dbName,
plan_type: planType,
admin_email: adminEmail,
created_at: new Date()
}).returning('*');
// Run migrations on new database
await this.runMigrations(dbName);
// Seed initial data
await this.seedTenantData(dbName, tenant[0]);
return tenant[0];
}
async runMigrations(dbName) {
const tenantDb = this.getConnection(dbName);
// Run all migrations for tenant database
await tenantDb.migrate.latest();
}
async seedTenantData(dbName, tenant) {
const tenantDb = this.getConnection(dbName);
// Create initial admin user, default settings, etc.
await tenantDb('users').insert({
email: tenant.admin_email,
role: 'admin',
tenant_id: tenant.id
});
}
getConnection(dbName) {
return knex({
client: 'pg',
connection: {
host: process.env.DB_HOST,
database: dbName,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD
}
});
}
}
Connection Pooling Strategy
With database-per-tenant, connection pooling gets interesting. You can’t use a single connection pool for all tenants because each tenant has a different database.
class TenantConnectionPool {
constructor() {
this.pools = new Map();
this.maxPools = 100; // Limit number of active pools
}
getPool(dbName) {
if (!this.pools.has(dbName)) {
// Create new pool if under limit
if (this.pools.size >= this.maxPools) {
// Remove least recently used pool
const lruKey = this.findLRUPool();
this.closePool(lruKey);
}
this.pools.set(dbName, {
pool: this.createPool(dbName),
lastUsed: Date.now()
});
}
// Update last used timestamp
const poolData = this.pools.get(dbName);
poolData.lastUsed = Date.now();
return poolData.pool;
}
createPool(dbName) {
return knex({
client: 'pg',
connection: {
host: process.env.DB_HOST,
database: dbName,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD
},
pool: {
min: 2,
max: 10,
idleTimeoutMillis: 30000
}
});
}
findLRUPool() {
let lruKey = null;
let lruTime = Infinity;
for (const [key, data] of this.pools.entries()) {
if (data.lastUsed < lruTime) {
lruTime = data.lastUsed;
lruKey = key;
}
}
return lruKey;
}
async closePool(dbName) {
const poolData = this.pools.get(dbName);
if (poolData) {
await poolData.pool.destroy();
this.pools.delete(dbName);
}
}
}
// Usage in your app
const connectionPool = new TenantConnectionPool();
app.use(async (req, res, next) => {
// Identify tenant from subdomain or header
const subdomain = req.hostname.split('.')[0];
const tenant = await masterDb('tenants')
.where({ subdomain })
.first();
if (!tenant) {
return res.status(404).json({ error: 'Tenant not found' });
}
// Attach tenant database connection to request
req.tenantDb = connectionPool.getPool(tenant.database_name);
req.tenant = tenant;
next();
});
// Now all routes have access to tenant-specific database
app.get('/api/users', async (req, res) => {
const users = await req.tenantDb('users').select('*');
res.json(users);
});
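The eviction bookkeeping in TenantConnectionPool is easier to verify in isolation. Here's a dependency-free sketch of the same least-recently-used policy over a plain Map (a toy of my own, not the pool itself), using a logical clock so two touches can never tie:

```javascript
// Minimal LRU bookkeeping, mirroring TenantConnectionPool's eviction strategy
class LRUTracker {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // key -> last-used logical timestamp
    this.clock = 0;           // logical clock avoids Date.now() ties
  }
  touch(key) {
    // Evict the least recently used key before admitting a new one
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      this.entries.delete(this.findLRU());
    }
    this.entries.set(key, ++this.clock);
  }
  findLRU() {
    let lruKey = null;
    let lruTime = Infinity;
    for (const [key, time] of this.entries) {
      if (time < lruTime) { lruTime = time; lruKey = key; }
    }
    return lruKey;
  }
  has(key) { return this.entries.has(key); }
}
```

The real pool does the same thing, except that "evict" also means destroying the knex pool so its connections are released.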
Schema Migrations at Scale
Here’s where database-per-tenant gets painful. You need to run migrations across potentially thousands of databases.
class TenantMigrationRunner {
constructor(masterDb, connectionPool) {
this.masterDb = masterDb;
this.connectionPool = connectionPool;
}
async migrateAll() {
const tenants = await this.masterDb('tenants').select('*');
console.log(`Migrating ${tenants.length} tenant databases...`);
const results = {
success: [],
failed: []
};
// Migrate in batches to avoid overwhelming the database server
const batchSize = 10;
for (let i = 0; i < tenants.length; i += batchSize) {
const batch = tenants.slice(i, i + batchSize);
await Promise.all(
batch.map(async (tenant) => {
try {
await this.migrateTenant(tenant);
results.success.push(tenant.subdomain);
console.log(`✓ Migrated ${tenant.subdomain}`);
} catch (error) {
results.failed.push({
tenant: tenant.subdomain,
error: error.message
});
console.error(`✗ Failed ${tenant.subdomain}:`, error.message);
}
})
);
// Brief pause between batches
if (i + batchSize < tenants.length) {
await this.sleep(1000);
}
}
return results;
}
async migrateTenant(tenant) {
const db = this.connectionPool.getPool(tenant.database_name);
// Run migrations
const [batchNo, log] = await db.migrate.latest();
// Record migration in master database
await this.masterDb('tenant_migrations').insert({
tenant_id: tenant.id,
batch_number: batchNo,
migrated_at: new Date()
});
return { batchNo, log };
}
async rollbackTenant(tenantId) {
const tenant = await this.masterDb('tenants')
.where({ id: tenantId })
.first();
const db = this.connectionPool.getPool(tenant.database_name);
await db.migrate.rollback();
console.log(`Rolled back ${tenant.subdomain}`);
}
sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// CLI tool for migrations
async function runMigrations() {
const migrator = new TenantMigrationRunner(masterDb, connectionPool);
console.log('Starting tenant migrations...');
const results = await migrator.migrateAll();
console.log('\nMigration Results:');
console.log(`Success: ${results.success.length}`);
console.log(`Failed: ${results.failed.length}`);
if (results.failed.length > 0) {
console.log('\nFailed tenants:');
results.failed.forEach(f => {
console.log(` - ${f.tenant}: ${f.error}`);
});
}
}
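The batch-and-collect pattern in migrateAll is worth having as a reusable helper. Here's a dependency-free sketch (my own generalization, with a plain async task standing in for a migration):

```javascript
// Run an async task over items in fixed-size batches, collecting
// successes and failures instead of aborting on the first error
async function runInBatches(items, batchSize, task) {
  const results = { success: [], failed: [] };
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    await Promise.all(batch.map(async (item) => {
      try {
        await task(item);
        results.success.push(item);
      } catch (error) {
        results.failed.push({ item, error: error.message });
      }
    }));
  }
  return results;
}
```

A failure in one tenant doesn't abort the run; you get a report at the end, exactly as migrateAll does. That property matters: a half-migrated fleet with a clear failure list is recoverable, a run that dies at tenant 437 of 1000 is not.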
Backup and Recovery
With database-per-tenant, you can back up each tenant independently. This is actually a huge advantage.
// Assumes pg_dump/psql on PATH; execPromise wraps child_process.exec
const { promisify } = require('util');
const { exec } = require('child_process');
const execPromise = promisify(exec);

class TenantBackupManager {
async backupTenant(tenantId) {
const tenant = await masterDb('tenants')
.where({ id: tenantId })
.first();
const backupFile = `backups/${tenant.database_name}-${Date.now()}.sql`;
// Using pg_dump for Postgres
await execPromise(
`pg_dump -h ${process.env.DB_HOST} ` +
`-U ${process.env.DB_USER} ` +
`-d ${tenant.database_name} ` +
`-f ${backupFile}`
);
// Upload to S3
await this.uploadToS3(backupFile, tenant);
// Record backup metadata
await masterDb('tenant_backups').insert({
tenant_id: tenant.id,
backup_file: backupFile,
backup_size: await this.getFileSize(backupFile),
created_at: new Date()
});
return backupFile;
}
async restoreTenant(tenantId, backupFile) {
const tenant = await masterDb('tenants')
.where({ id: tenantId })
.first();
// Download backup from S3
const localFile = await this.downloadFromS3(backupFile);
// Drop and recreate database (DROP fails while connections are open --
// terminate them first in production)
await masterDb.raw(`DROP DATABASE IF EXISTS ${tenant.database_name}`);
await masterDb.raw(`CREATE DATABASE ${tenant.database_name}`);
// Restore from backup
await execPromise(
`psql -h ${process.env.DB_HOST} ` +
`-U ${process.env.DB_USER} ` +
`-d ${tenant.database_name} ` +
`-f ${localFile}`
);
console.log(`Restored ${tenant.subdomain} from ${backupFile}`);
}
}
Pros and Cons of Database-Per-Tenant
After running this in production, here’s what I’ve learned:
The good stuff:
- Complete data isolation. There’s literally no way for tenant A to access tenant B’s data, and security audits become much simpler.
- Easy tenant-specific customization. Need a custom index for a big customer? Create it in their database only.
- Simple backup and recovery. Back up one tenant without affecting others; restore a tenant’s data to a specific point in time independently.
- Easy to meet data residency requirements. Put tenant databases in specific regions for compliance.
- Performance isolation. One tenant’s terrible queries don’t slow down everyone else.
The painful parts:
- Operational complexity explodes. Monitoring, backups, and migrations for 1,000 databases are 1,000 separate jobs each.
- Schema migrations take forever. Rolling out a change to 1,000 databases can take hours.
- Cost scales linearly with tenants. Each database needs resources, even if many tenants are small and inactive.
- Connection pooling gets complicated. You can’t maintain connections to all databases simultaneously.
Schema-Per-Tenant: The Middle Ground
This is my preferred approach for most SaaS products. All tenants share one physical database, but each gets their own schema (namespace).
Implementation
In Postgres, schemas are like folders for tables. Each tenant gets their own folder:
class SchemaPerTenantProvisioner {
constructor(db) {
this.db = db;
}
async createTenant(tenantData) {
const { name, subdomain, planType, adminEmail } = tenantData;
// subdomain becomes part of a SQL identifier below -- validate it first
if (!/^[a-z0-9_]+$/.test(subdomain)) {
throw new Error(`Invalid subdomain: ${subdomain}`);
}
const schemaName = `tenant_${subdomain}`;
// Create schema
await this.db.raw(`CREATE SCHEMA ${schemaName}`);
// Run migrations in tenant schema (sets its own search_path)
await this.runMigrationsForSchema(schemaName);
// Store tenant metadata in public schema
const tenant = await this.db('public.tenants').insert({
name,
subdomain,
schema_name: schemaName,
plan_type: planType,
admin_email: adminEmail,
created_at: new Date()
}).returning('*');
// Seed initial data
await this.seedTenantSchema(schemaName, tenant[0]);
return tenant[0];
}
async runMigrationsForSchema(schemaName) {
// Set search path before running migrations
await this.db.raw(`SET search_path TO ${schemaName}`);
// Create all tables in tenant schema
await this.db.schema.withSchema(schemaName).createTable('users', (table) => {
table.increments('id');
table.string('email').notNullable();
table.string('role');
table.timestamps(true, true);
});
await this.db.schema.withSchema(schemaName).createTable('projects', (table) => {
table.increments('id');
table.string('name').notNullable();
table.integer('user_id').references('id').inTable(`${schemaName}.users`);
table.timestamps(true, true);
});
// More tables as needed...
}
async seedTenantSchema(schemaName, tenant) {
await this.db(`${schemaName}.users`).insert({
email: tenant.admin_email,
role: 'admin'
});
}
}
Middleware for Schema Switching
The magic happens in middleware that sets the correct schema based on the tenant:
app.use(async (req, res, next) => {
try {
// Identify tenant
const subdomain = req.hostname.split('.')[0];
const tenant = await db('public.tenants')
.where({ subdomain })
.first();
if (!tenant) {
return res.status(404).json({ error: 'Tenant not found' });
}
// Set search path for this request.
// Caveat: with a connection pool, SET sticks to whichever connection
// executes it; for strict per-request isolation, pin a connection or
// use SET LOCAL inside a transaction.
await db.raw(`SET search_path TO ${tenant.schema_name}, public`);
req.tenant = tenant;
req.schemaName = tenant.schema_name;
next();
} catch (error) {
console.error('Tenant middleware error:', error);
res.status(500).json({ error: 'Internal server error' });
}
});
// Now queries automatically use the correct schema
app.get('/api/users', async (req, res) => {
// This query runs against the tenant's schema
const users = await db('users').select('*');
res.json(users);
});
Handling Schema Migrations
Schema-per-tenant migrations are much simpler than database-per-tenant:
class SchemaMigrationRunner {
async migrateAll() {
const tenants = await db('public.tenants').select('*');
console.log(`Migrating ${tenants.length} schemas...`);
for (const tenant of tenants) {
try {
await db.raw(`SET search_path TO ${tenant.schema_name}`);
// Run your migration code
await this.runMigration(tenant.schema_name);
console.log(`✓ Migrated ${tenant.subdomain}`);
} catch (error) {
console.error(`✗ Failed ${tenant.subdomain}:`, error.message);
// Continue with other tenants
}
}
}
async runMigration(schemaName) {
// Example: Add new column
await db.schema.withSchema(schemaName).table('users', (table) => {
table.string('phone_number');
});
}
}
Pros and Cons of Schema-Per-Tenant
The good stuff:
- Good data isolation. Each tenant’s data lives in a separate schema, so accidental cross-tenant queries are much less likely.
- Simpler operations than database-per-tenant. One database to back up, monitor, and maintain.
- Reasonable performance isolation. Postgres plans queries against each schema’s tables independently.
- Faster migrations. One database to update instead of thousands.
- Better resource utilization. Inactive tenants don’t waste database connections.
The challenges:
- Schema-level isolation isn’t perfect. A badly written query can still affect all tenants.
- Connection pooling still requires schema switching. Every query needs the right search_path set.
- Some database tools don’t handle schemas well, so backup and restore can be trickier.
- Large tenants can still affect small ones. It’s not as much isolation as database-per-tenant.
Shared Schema: Maximum Simplicity, Maximum Risk
This is the simplest approach: everyone shares the same tables, differentiated by a tenant_id column.
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  tenant_id INTEGER NOT NULL,
  email VARCHAR(255) NOT NULL,
  name VARCHAR(255),
  created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_users_tenant ON users(tenant_id);

CREATE TABLE projects (
  id SERIAL PRIMARY KEY,
  tenant_id INTEGER NOT NULL,
  name VARCHAR(255) NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_projects_tenant ON projects(tenant_id);
Implementing Tenant Isolation
The critical part: every single query must filter by tenant_id. Miss it once, and you've got a data leak.
// Base query builder that enforces tenant filtering
class TenantAwareQuery {
constructor(db, tenantId) {
this.db = db;
this.tenantId = tenantId;
}
table(tableName) {
// Automatically add tenant_id filter
return this.db(tableName).where({ tenant_id: this.tenantId });
}
insert(tableName, data) {
// Automatically inject tenant_id; returning('*') so callers get the row back
return this.db(tableName).insert({
...data,
tenant_id: this.tenantId
}).returning('*');
}
update(tableName, data) {
// Ensure updates are scoped to tenant
return this.db(tableName)
.where({ tenant_id: this.tenantId })
.update(data);
}
delete(tableName, conditions) {
// Ensure deletes are scoped to tenant
return this.db(tableName)
.where({ tenant_id: this.tenantId, ...conditions })
.delete();
}
}
// Middleware
app.use(async (req, res, next) => {
const subdomain = req.hostname.split('.')[0];
const tenant = await db('tenants')
.where({ subdomain })
.first();
if (!tenant) {
return res.status(404).json({ error: 'Tenant not found' });
}
// Give each request a tenant-aware query builder
req.tenant = tenant;
req.tenantDb = new TenantAwareQuery(db, tenant.id);
next();
});
// Usage in routes
app.get('/api/users', async (req, res) => {
// tenantDb automatically filters by tenant_id
const users = await req.tenantDb.table('users').select('*');
res.json(users);
});
app.post('/api/users', async (req, res) => {
// tenantDb automatically adds tenant_id
const [user] = await req.tenantDb.insert('users', req.body);
res.json(user);
});
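To see what the wrapper guarantees, here's a self-contained toy where an in-memory array stands in for a shared table (no knex; the class is mine, purely illustrative). The scoped handle can't read or write outside its tenant, which is exactly the property TenantAwareQuery gives you over real tables:

```javascript
// In-memory stand-in for a shared table with a tenant_id column
class SharedTable {
  constructor() { this.rows = []; }
  // Returns a handle that injects tenant_id on writes and
  // filters by it on reads -- the caller never sees other tenants
  scopedTo(tenantId) {
    const rows = this.rows;
    return {
      insert(data) { rows.push({ ...data, tenant_id: tenantId }); },
      select() { return rows.filter((r) => r.tenant_id === tenantId); }
    };
  }
}
```

The point of the pattern is that route handlers only ever receive the scoped handle, so forgetting the tenant filter is no longer possible at the call site.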
Database-Level Enforcement with Row-Level Security
Don’t rely solely on application code. Use database-level security:
-- Enable row-level security
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
-- Note: table owners bypass RLS unless you also run
-- ALTER TABLE ... FORCE ROW LEVEL SECURITY
-- Create policies that filter by tenant_id
CREATE POLICY tenant_isolation_policy ON users
USING (tenant_id = current_setting('app.current_tenant')::INTEGER);
CREATE POLICY tenant_isolation_policy ON projects
USING (tenant_id = current_setting('app.current_tenant')::INTEGER);
Set the tenant ID at the connection level:
app.use(async (req, res, next) => {
const subdomain = req.hostname.split('.')[0];
const tenant = await db('tenants').where({ subdomain }).first();
if (!tenant) {
return res.status(404).json({ error: 'Tenant not found' });
}
// Set tenant ID for row-level security, parameterized via set_config
// rather than interpolated into the SQL string
await db.raw(`SELECT set_config('app.current_tenant', ?, false)`, [String(tenant.id)]);
req.tenant = tenant;
next();
});
Now even if your application code forgets to filter by tenant_id, the database blocks cross-tenant access.
Pros and Cons of Shared Schema
The good stuff:
- Simplest to implement. No schema switching, no multiple databases.
- Best resource utilization. All tenants share the same tables and indexes.
- Easiest migrations. One schema change applies to everyone.
- Best for analytics. Easy to run queries across all tenants.
- Cheapest option. Minimal infrastructure overhead.
The scary parts:
- One mistake leaks data. Forget WHERE tenant_id = ? and you've exposed everything.
- No performance isolation. One tenant’s bad queries affect everyone.
- Harder to meet compliance requirements. Data residency rules are difficult to satisfy.
- Difficult to customize per tenant. You can’t add indexes or columns for specific tenants.
- Backup and restore affects all tenants. You can’t restore one tenant independently.
Making the Decision
After building all three approaches, here’s my decision framework.
Use database-per-tenant when:
- You have enterprise customers with strict data isolation requirements
- Tenants need different data locations for compliance
- You need to offer tenant-specific database optimizations
- You have a small number of high-value customers (< 100 tenants)
- Operational complexity isn’t a concern
Use schema-per-tenant when:
- You need good isolation without extreme operational overhead
- You expect hundreds to thousands of tenants
- Most tenants have similar resource requirements
- You want the flexibility to customize per tenant
- You’re building a typical B2B SaaS product
Use shared schema when:
- You’re just starting and need to validate product-market fit
- You have thousands of small, similar tenants
- Data isolation requirements are minimal
- You need the simplest possible implementation
- You’re building a B2C product with simple multi-tenancy needs
Most products should start with schema-per-tenant. It’s the sweet spot of isolation, simplicity, and scalability.
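If it helps, the framework above can be written down as executable documentation. This is a toy encoding with my own field names and thresholds, not a standard:

```javascript
// Toy encoding of the decision framework above; field names and the
// 100-tenant threshold are illustrative, not prescriptive
function chooseTenancyModel({ tenantCount, strictIsolation, dataResidency, validatingPMF }) {
  // Still validating product-market fit? Ship the simplest thing.
  if (validatingPMF) return 'shared-schema';
  // A small roster of high-value customers with hard isolation or
  // residency needs justifies the operational cost of dedicated databases.
  if ((strictIsolation || dataResidency) && tenantCount < 100) return 'database-per-tenant';
  // Otherwise, the middle ground.
  return 'schema-per-tenant';
}
```

Treat the thresholds as conversation starters: the real inputs are your compliance obligations and how much operational headcount you can spend.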
Hybrid Approaches
Real products often mix approaches. Here’s what I’ve seen work.
Shared schema for small tenants, database-per-tenant for enterprise:
class HybridTenantRouter {
async getTenantConnection(tenant) {
if (tenant.plan_type === 'enterprise') {
// Enterprise gets dedicated database
return this.getIsolatedDatabase(tenant.database_name);
} else {
// Small tenants share database
return this.getSharedDatabase(tenant.id);
}
}
}
Schema-per-tenant with overflow to new databases:
class ScalableSchemaRouter {
async getTenantSchema(tenant) {
// Tenants 1-1000 in database 1, 1001-2000 in database 2, etc.
// (Math.ceil avoids the off-by-one of floor(id / 1000) + 1,
// which would put only 999 tenants in the first database)
const dbNumber = Math.ceil(tenant.id / 1000);
const db = this.getDatabase(dbNumber);
await db.raw(`SET search_path TO ${tenant.schema_name}`);
return db;
}
}
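The overflow arithmetic deserves a second look, because it's easy to get off by one. With tenant IDs starting at 1, Math.ceil(id / 1000) puts exactly 1000 tenants in each database, whereas Math.floor(id / 1000) + 1 would put only 999 in the first. A tiny self-contained sketch:

```javascript
// Which database does a tenant land in, at perDb tenants per database?
function databaseNumberFor(tenantId, perDb = 1000) {
  if (tenantId < 1) throw new Error('tenant IDs start at 1');
  return Math.ceil(tenantId / perDb);
}
```

Worth pinning down before launch: once tenants are provisioned, moving the boundary means physically relocating schemas.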
Migration Between Approaches
You might start with shared schema and later need to migrate to schema-per-tenant or database-per-tenant. Plan for this from the start:
class TenantMigrator {
async migrateToIsolatedSchema(tenantId) {
const tenant = await db('tenants').where({ id: tenantId }).first();
const newSchemaName = `tenant_${tenant.subdomain}`;
// Create new schema
await db.raw(`CREATE SCHEMA ${newSchemaName}`);
// Copy all tables for this tenant
const tables = ['users', 'projects', 'tasks'];
for (const table of tables) {
await db.raw(`
CREATE TABLE ${newSchemaName}.${table}
(LIKE public.${table} INCLUDING ALL)
`);
await db.raw(`
INSERT INTO ${newSchemaName}.${table}
SELECT * FROM public.${table}
WHERE tenant_id = ?
`, [tenantId]);
}
// Update tenant record
await db('tenants')
.where({ id: tenantId })
.update({
schema_name: newSchemaName,
uses_isolated_schema: true
});
// Clean up old data (after verification)
// await this.cleanupOldData(tenantId);
}
}
Monitoring and Observability
Regardless of approach, you need visibility into tenant health:
class TenantMonitoring {
async getTenantMetrics(tenantId) {
return {
databaseSize: await this.getDatabaseSize(tenantId),
tableRowCounts: await this.getRowCounts(tenantId),
queryPerformance: await this.getSlowQueries(tenantId),
connectionCount: await this.getConnectionCount(tenantId)
};
}
async getDatabaseSize(tenantId) {
const tenant = await db('tenants').where({ id: tenantId }).first();
// For database-per-tenant
if (tenant.database_name) {
const result = await db.raw(`
SELECT pg_database_size(?) as size
`, [tenant.database_name]);
return result.rows[0].size;
}
// For schema-per-tenant
if (tenant.schema_name) {
const result = await db.raw(`
SELECT
sum(pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename)))
FROM pg_tables
WHERE schemaname = ?
`, [tenant.schema_name]);
return result.rows[0].sum;
}
}
}
The architecture you choose will shape your SaaS product for years. Choose carefully, but don’t overthink it. You can always migrate later if needed. What multi-tenancy approach are you using? Any horror stories or success stories to share? Drop a comment below.