Pattern 30: Scalability Patterns
Intent
Design systems that scale gracefully from 10 to 10 million users using horizontal scaling, stateless services, intelligent caching, database replication, load balancing, and auto-scaling, so that centralized server-based software can serve a virtually unlimited number of customers without rearchitecting or degrading performance.
Also Known As
- Scale-Out Architecture
- Cloud-Native Scaling
- Elastic Architecture
- Horizontal Scaling Patterns
- Performance at Scale
Problem
Server-based software serves everyone from one system. Without scalability, success becomes failure: user growth brings the system down.
The "success disaster" problem:
From Richard's floppy disk era:
- Ship 3.5" diskettes to customers
- Each customer runs software on THEIR computer
- Fast for customer X? X buys faster PC
- Slow for customer Y? Y's problem, not yours
- Support 10,000 customers? Ship 10,000 diskettes
- Each customer's performance independent ✅
Modern server-based era:
- Deploy once to YOUR server
- All customers access YOUR system
- Fast for everyone? YOU need fast server
- Slow for anyone? YOUR problem!
- Support 10,000 customers? YOUR server serves all
- Everyone's performance depends on YOUR system ⚠️
The shift: Your infrastructure became the bottleneck! 😱
The growth crisis:
Month 1: Homeschool co-op launches
- 10 families using system
- Single server handles it easily
- Response time: 50ms ⚡
- Everything perfect!
Month 6: Word spreads
- 100 families using system
- Same single server
- Response time: 200ms 😐
- Getting slower...
Month 12: Goes viral
- 1,000 families using system
- Same single server (!!!)
- Response time: 5,000ms (5 seconds!) 😱
- System crawling, users complaining
Month 18: System collapses
- 5,000 families trying to use system
- Server completely overwhelmed
- Response time: Timeouts, errors
- System UNUSABLE! Success destroyed business! 💀
The problem: Success killed the system! More users = slower performance = users leave!
The vertical scaling trap:
CEO: "System is slow! Fix it!"
Developer: "I'll buy a bigger server!"
Current: 4 CPU cores, 16GB RAM, $100/month
Upgrade: 8 CPU cores, 32GB RAM, $300/month
→ System 2× faster! ✅
6 months later...
Current: 8 cores, 32GB RAM, slow again
Upgrade: 16 cores, 64GB RAM, $800/month
→ System 2× faster again! ✅
6 months later...
Current: 16 cores, 64GB RAM, slow again
Upgrade: 32 cores, 128GB RAM, $2,500/month
→ System 2× faster! ✅
6 months later...
Current: 32 cores, 128GB RAM, slow again
Upgrade: 64 cores, 256GB RAM, $8,000/month
→ System 2× faster! ✅
6 months later...
Current: 64 cores, 256GB RAM, slow again
Upgrade: ??? NO BIGGER SERVERS EXIST! 😱
→ HIT THE CEILING! Can't scale further!
Vertical scaling (bigger servers) ALWAYS hits a ceiling! Physics limits how big one machine can be! 🚫
The database bottleneck:
1,000 users → 10,000 queries/second → Database handles it ✅
10,000 users → 100,000 queries/second → Database struggling 😰
100,000 users → 1,000,000 queries/second → Database DEAD 💀
Single database can't handle infinite queries! Becomes bottleneck!
The stateful session problem:
Server stores session in memory:
// User logs in → Server 1 stores session in memory
sessions['user_123'] = {logged_in: true, data: {...}};
// User makes request → MUST go to Server 1 (has session!)
// If request goes to Server 2 → "Who are you?" (no session!)
Sticky sessions = can't distribute load! Users "stuck" to specific server! 😱
What we need: Horizontal scaling
Instead of ONE bigger server (vertical), use MANY smaller servers (horizontal):
Vertical (hits ceiling):
1 server → 2× bigger server → 4× bigger server → 8× bigger → CEILING! 🚫
Horizontal (infinite):
1 server → 2 servers → 4 servers → 8 servers → 100 servers → INFINITE! ✅
Key principles:
1. Stateless services:
- Servers don't store sessions in memory
- Any server can handle any request
- Enables load balancing across all servers
2. Shared state externalized:
- Sessions in Redis (shared)
- Files in S3 (shared)
- Database replicated (shared)
3. Load balancing:
- Distribute requests across servers
- Add servers = add capacity
- Remove servers = reduce cost
4. Caching:
- Store frequently-accessed data in memory
- Reduce database load 100×
- Sub-millisecond response times
5. Database scaling:
- Read replicas (scale reads)
- Sharding (scale writes)
- Separate read/write (CQRS - Pattern 28)
6. Auto-scaling:
- Add servers when busy
- Remove servers when quiet
- Pay only for what you use
Without scalability patterns:
- Single server (hits ceiling)
- Vertical scaling only (expensive, limited)
- Stateful sessions (can't distribute)
- No caching (database overwhelmed)
- Success = system death
With scalability patterns:
- Multiple servers (infinite scale)
- Horizontal scaling (add servers easily)
- Stateless (perfect load distribution)
- Aggressive caching (database protected)
- Success = add servers, keep going!
Context
When this pattern applies:
- Server-based / cloud-based software
- User growth expected
- Need to handle traffic spikes
- Want to optimize costs (scale up/down)
- Can't predict exact load
When this pattern may not be needed:
- Desktop software (each user their own machine)
- Internal tool (fixed small user count)
- Prototype / MVP (optimize later)
Forces
Competing concerns:
1. Scale Up vs Scale Out
- Scale Up (vertical) = simple, limited
- Scale Out (horizontal) = complex, unlimited
- Balance: Horizontal for infinite scale
2. Stateful vs Stateless
- Stateful = convenient, doesn't scale
- Stateless = slightly complex, scales perfectly
- Balance: Stateless services, stateful data tier
3. Consistency vs Availability
- Strong consistency = slower, single point
- Eventual consistency = faster, distributed
- Balance: Eventual for most, strong for critical
4. Caching vs Freshness
- Heavy caching = fast but potentially stale
- No caching = always fresh but slow
- Balance: Cache with TTL, invalidate on writes
5. Cost vs Performance
- More servers = better performance, higher cost
- Fewer servers = lower cost, worse performance
- Balance: Auto-scale based on demand
Solution
Build scalable architecture with:
1. Horizontal Scaling (Multiple Servers)
Load balancer distributes requests:
Users → Load Balancer → [Server 1, Server 2, Server 3, ...]
All identical, stateless
Add capacity: Add servers!
- 1,000 users → 2 servers
- 10,000 users → 5 servers
- 100,000 users → 20 servers
- 1,000,000 users → 100 servers
- Linear scaling! 📈
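The dispatch logic a load balancer applies can be sketched in a few lines. This is a minimal round-robin picker over a fixed list of hypothetical backend addresses; a real load balancer adds health checks, connection draining, and weighting.

```javascript
// Minimal round-robin dispatcher over stateless, identical backends.
// Server addresses are illustrative placeholders.
class RoundRobinBalancer {
  constructor(servers) {
    this.servers = servers;
    this.next = 0; // index of the server that gets the next request
  }
  pick() {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length; // wrap around
    return server;
  }
}

const lb = new RoundRobinBalancer(['app1:3000', 'app2:3000', 'app3:3000']);
console.log(lb.pick()); // app1:3000
console.log(lb.pick()); // app2:3000
console.log(lb.pick()); // app3:3000
console.log(lb.pick()); // app1:3000 (wrapped around)
```

Because the servers are stateless, it does not matter which one a given user lands on; that is what makes round-robin (or any distribution policy) safe.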
2. Stateless Application Servers
No session state in server memory:
// BAD: Stateful (doesn't scale)
let sessions = {}; // Stored in server memory
app.post('/login', (req, res) => {
sessions[req.body.user_id] = {logged_in: true};
// Session ONLY on THIS server!
});
// GOOD: Stateless (scales perfectly)
const Redis = require('redis');
const redis = Redis.createClient();
app.post('/login', async (req, res) => {
await redis.set(`session:${req.body.user_id}`, JSON.stringify({logged_in: true}));
// Session in SHARED Redis, ANY server can access!
});
Benefits:
- Any server can handle any request
- Load balancer can distribute freely
- Server crashes? No sessions lost!
3. Aggressive Caching
Cache frequently-accessed data:
// Without cache: Every request hits database
app.get('/family/:id', async (req, res) => {
const family = await db.query('SELECT * FROM families WHERE id = ?', [req.params.id]);
res.json(family);
// 1,000 requests = 1,000 database queries 😰
});
// With cache: Database hit only on miss
// (a per-server Map keeps the example simple; production systems use a
// shared cache such as Redis so every server sees the same entries)
const cache = new Map();
app.get('/family/:id', async (req, res) => {
// Check cache first
let family = cache.get(req.params.id);
if (!family) {
// Cache miss: Query database
family = await db.query('SELECT * FROM families WHERE id = ?', [req.params.id]);
cache.set(req.params.id, family);
}
res.json(family);
// 1,000 requests = 1 database query (999 cache hits!) ⚡
});
Cache hit rate = 99% → Database load reduced 100×!
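A quick arithmetic check of that claim: only cache misses reach the database, so at a 99% hit rate just 1% of traffic becomes database queries.

```javascript
// Only misses become database queries; hitRate is a fraction in [0, 1].
function dbQueriesPerSecond(totalRps, hitRate) {
  return totalRps * (1 - hitRate);
}

// 10,000 req/s at a 99% hit rate → ~100 DB queries/s (100× reduction)
console.log(Math.round(dbQueriesPerSecond(10000, 0.99))); // 100
// Same traffic with no cache → 10,000 DB queries/s
console.log(dbQueriesPerSecond(10000, 0)); // 10000
```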
4. Database Read Replicas
Separate reads from writes:
Application Servers
↓ writes
Primary Database (handles writes)
↓ replication
[Replica 1, Replica 2, Replica 3] (handle reads)
↑ reads from application servers
Reads scale horizontally!
- 1 primary (writes)
- 10 replicas (reads)
- Read capacity: 10× primary!
// Write: Goes to primary
await primaryDB.query('INSERT INTO families ...');
// Read: Goes to random replica
const replica = replicas[Math.floor(Math.random() * replicas.length)];
const family = await replica.query('SELECT * FROM families WHERE id = ?', [id]);
5. Database Connection Pooling
Reuse database connections:
// BAD: New connection every query
async function query(sql) {
const connection = await mysql.createConnection({...}); // Expensive!
const result = await connection.query(sql);
await connection.end();
return result;
}
// 1,000 queries = 1,000 new connections = SLOW! 🐌
// GOOD: Connection pool
const pool = mysql.createPool({
connectionLimit: 100, // Max 100 connections
host: 'db.example.com',
user: 'app',
password: '...'
});
async function query(sql) {
const connection = await pool.getConnection(); // Reuse from pool!
const result = await connection.query(sql);
connection.release(); // Return to pool
return result;
}
// 1,000 queries = Reuse 100 connections = FAST! ⚡
6. CDN for Static Assets
Cache static files close to users:
User in Tokyo → CDN Edge (Tokyo) → [CSS, JS, images cached locally]
User in London → CDN Edge (London) → [Same files cached locally]
User in NYC → CDN Edge (NYC) → [Same files cached locally]
No need to hit origin server! ⚡
Benefits:
- Sub-10ms response times (vs 200ms to origin)
- Reduced server load (90%+ of requests served by CDN)
- Global performance (fast everywhere)
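What makes a CDN effective is the Cache-Control headers the origin sends. A common scheme, sketched below with a hypothetical helper: fingerprinted asset filenames (e.g. app.3f9a1c.js) can be cached for a year because a new deploy produces a new URL, while HTML must always be revalidated.

```javascript
// Pick a Cache-Control header based on the asset type.
// The fingerprint pattern (hex hash in the filename) is an assumption
// about the build pipeline, not a universal convention.
function cacheHeaderFor(path) {
  const fingerprinted = /\.[0-9a-f]{6,}\.(js|css|png|jpg|svg|woff2)$/.test(path);
  if (fingerprinted) return 'public, max-age=31536000, immutable'; // cache 1 year
  if (path.endsWith('.html')) return 'no-cache';                   // always revalidate
  return 'public, max-age=300';                                    // short default
}

console.log(cacheHeaderFor('/static/app.3f9a1c.js')); // public, max-age=31536000, immutable
console.log(cacheHeaderFor('/index.html'));           // no-cache
```

In an Express app this would run in a middleware that sets `res.set('Cache-Control', cacheHeaderFor(req.path))` before serving static files.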
7. Queue-Based Processing
Async processing for heavy tasks:
// BAD: Synchronous (blocks user)
app.post('/enroll-family', async (req, res) => {
await enrollFamily(req.body); // 100ms
await sendWelcomeEmail(req.body.email); // 500ms
await createCalendarEvents(req.body); // 300ms
await notifyCoordinators(req.body); // 200ms
res.json({success: true});
// User waited 1,100ms! 😰
});
// GOOD: Asynchronous (queue it)
app.post('/enroll-family', async (req, res) => {
await enrollFamily(req.body); // 100ms (critical, must complete)
// Queue non-critical tasks
await queue.enqueue('send-welcome-email', req.body);
await queue.enqueue('create-calendar-events', req.body);
await queue.enqueue('notify-coordinators', req.body);
res.json({success: true});
// User waited 100ms! ⚡ (11× faster!)
});
// Background workers process queue
worker.process('send-welcome-email', async (job) => {
await sendWelcomeEmail(job.data.email);
});
8. Auto-Scaling
Add/remove servers based on load:
// Cloud configuration
autoScaling: {
minServers: 2, // Always at least 2
maxServers: 50, // Never more than 50
scaleUpWhen: {
cpu: "> 70%", // Add server if CPU > 70%
responseTime: "> 500ms"
},
scaleDownWhen: {
cpu: "< 30%", // Remove server if CPU < 30%
responseTime: "< 100ms"
}
}
// Traffic spike: 2 servers → 10 servers (auto!)
// Traffic drops: 10 servers → 3 servers (auto!)
// Pay only for what you use! 💰
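The scale-out/scale-in decision in the configuration above can be expressed as a small pure function. This is a toy sketch: the thresholds mirror the config, but real auto-scalers also apply cooldowns and smoothing so a single noisy sample doesn't thrash the fleet.

```javascript
// Decide the fleet size from current metrics, clamped to [min, max].
// Field names (cpu, responseTimeMs) are illustrative, not a cloud API.
function desiredServers(current, metrics, {min = 2, max = 50} = {}) {
  let target = current;
  if (metrics.cpu > 70 || metrics.responseTimeMs > 500) {
    target = current + 1; // overloaded: scale out
  } else if (metrics.cpu < 30 && metrics.responseTimeMs < 100) {
    target = current - 1; // idle: scale in
  }
  return Math.min(max, Math.max(min, target)); // respect fleet limits
}

console.log(desiredServers(5, {cpu: 85, responseTimeMs: 600})); // 6 (scale out)
console.log(desiredServers(5, {cpu: 20, responseTimeMs: 50}));  // 4 (scale in)
console.log(desiredServers(2, {cpu: 20, responseTimeMs: 50}));  // 2 (min floor)
```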
9. Circuit Breaker Pattern
Prevent cascade failures:
class CircuitBreaker {
constructor(service, {failureThreshold = 5, resetTimeoutMs = 30000} = {}) {
this.service = service;
this.failures = 0;
this.failureThreshold = failureThreshold;
this.resetTimeoutMs = resetTimeoutMs;
this.openedAt = null;
this.state = 'CLOSED'; // CLOSED = normal, OPEN = failing, HALF_OPEN = probing
}
async call(...args) {
if (this.state === 'OPEN') {
if (Date.now() - this.openedAt >= this.resetTimeoutMs) {
this.state = 'HALF_OPEN'; // timeout elapsed: let one probe request through
} else {
// Service is down, fail fast
throw new Error('Circuit breaker OPEN - service unavailable');
}
}
try {
const result = await this.service(...args);
this.failures = 0; // Success! Reset
this.state = 'CLOSED'; // probe (or normal call) succeeded: recover
return result;
} catch (error) {
this.failures++;
if (this.failures >= this.failureThreshold || this.state === 'HALF_OPEN') {
this.state = 'OPEN'; // Too many failures (or failed probe), stop trying
this.openedAt = Date.now();
console.log('Circuit breaker OPEN - service marked unavailable');
}
throw error;
}
}
}
// Usage
const paymentService = new CircuitBreaker(processPayment);
try {
await paymentService.call(familyId, amount);
} catch (error) {
// Service down, degrade gracefully
console.log('Payment processing unavailable, queuing for later');
await queue.enqueue('process-payment', {familyId, amount});
}
Structure
Scalable Architecture Diagram
┌─────────────┐
│ CDN │ (Static files: CSS, JS, images)
└─────────────┘
│
┌─────────────┐
Users ──────→│Load Balancer│
└─────────────┘
│
┌──────────────────┼──────────────────┐
↓ ↓ ↓
┌────────┐ ┌────────┐ ┌────────┐
│Server 1│ │Server 2│ ... │Server N│ (Stateless, auto-scale)
└────────┘ └────────┘ └────────┘
│ │ │
└──────────────────┼──────────────────┘
↓
┌─────────────┐
│ Redis │ (Shared sessions, cache)
└─────────────┘
↓
┌──────────────────┼──────────────────┐
↓ ↓ ↓
┌─────────┐ ┌──────────┐ ┌──────────┐
│Primary │─────→│Replica 1 │ │Replica 2 │ (Database reads)
│Database │ └──────────┘ └──────────┘
└─────────┘
(writes)
↓
┌─────────────┐
│ Message │ (Async processing)
│ Queue │
└─────────────┘
↓
┌─────────────┐
│ Worker │ (Background jobs)
│ Processes │
└─────────────┘
Implementation
Scalable Express Application
const express = require('express');
const Redis = require('redis');
const session = require('express-session');
const RedisStore = require('connect-redis')(session);
const mysql = require('mysql2/promise');
const app = express();
// 1. REDIS FOR SHARED STATE
const redis = Redis.createClient({
host: process.env.REDIS_HOST,
port: 6379
});
// Sessions stored in Redis (not server memory!)
app.use(session({
store: new RedisStore({client: redis}),
secret: process.env.SESSION_SECRET,
resave: false,
saveUninitialized: false
}));
// 2. DATABASE CONNECTION POOL
const dbPool = mysql.createPool({
host: process.env.DB_HOST,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
database: 'homeschool_coop',
connectionLimit: 100,
waitForConnections: true,
queueLimit: 0
});
// Read replicas (assumes a shared base `config`; spread it FIRST so the
// per-replica host is not overwritten by config.host)
const replicaPools = [
mysql.createPool({...config, host: 'replica1.db.example.com'}),
mysql.createPool({...config, host: 'replica2.db.example.com'}),
mysql.createPool({...config, host: 'replica3.db.example.com'})
];
// 3. CACHING LAYER
class Cache {
constructor(redis) {
this.redis = redis;
}
async get(key) {
const cached = await this.redis.get(key);
return cached ? JSON.parse(cached) : null;
}
async set(key, value, ttlSeconds = 300) {
await this.redis.setex(key, ttlSeconds, JSON.stringify(value));
}
async invalidate(key) {
await this.redis.del(key);
}
}
const cache = new Cache(redis);
// 4. HELPER FUNCTIONS
async function queryPrimary(sql, params) {
const connection = await dbPool.getConnection();
try {
const [rows] = await connection.execute(sql, params);
return rows;
} finally {
connection.release();
}
}
async function queryReplica(sql, params) {
// Random replica for read distribution
const replica = replicaPools[Math.floor(Math.random() * replicaPools.length)];
const connection = await replica.getConnection();
try {
const [rows] = await connection.execute(sql, params);
return rows;
} finally {
connection.release();
}
}
// 5. ROUTES WITH CACHING
app.get('/api/family/:id', async (req, res) => {
const familyId = req.params.id;
const cacheKey = `family:${familyId}`;
// Check cache first
let family = await cache.get(cacheKey);
if (family) {
console.log(`Cache HIT: ${cacheKey}`);
return res.json(family);
}
console.log(`Cache MISS: ${cacheKey}`);
// Query replica (read)
const rows = await queryReplica(
'SELECT * FROM families WHERE family_id = ?',
[familyId]
);
if (rows.length === 0) {
// Don't cache misses as undefined
return res.status(404).json({error: 'Family not found'});
}
family = rows[0];
// Cache for 5 minutes
await cache.set(cacheKey, family, 300);
res.json(family);
});
app.post('/api/family', async (req, res) => {
// Write: Goes to primary
const result = await queryPrimary(
'INSERT INTO families (family_name, enrollment_date) VALUES (?, ?)',
[req.body.family_name, req.body.enrollment_date]
);
const familyId = result.insertId;
// New ID: nothing cached yet, so invalidation is a harmless no-op
await cache.invalidate(`family:${familyId}`);
res.json({success: true, family_id: familyId});
});
app.put('/api/family/:id', async (req, res) => {
const familyId = req.params.id;
// Write: Primary
await queryPrimary(
'UPDATE families SET family_name = ? WHERE family_id = ?',
[req.body.family_name, familyId]
);
// Invalidate cache
await cache.invalidate(`family:${familyId}`);
res.json({success: true});
});
// 6. HEALTH CHECK (for load balancer)
app.get('/health', (req, res) => {
res.json({status: 'healthy', server: process.env.SERVER_ID});
});
// 7. START SERVER
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`Server ${process.env.SERVER_ID} running on port ${PORT}`);
console.log('Stateless, ready for horizontal scaling!');
});
Load Balancer Configuration (Nginx)
upstream app_servers {
# Round-robin by default
server app1.internal:3000;
server app2.internal:3000;
server app3.internal:3000;
server app4.internal:3000;
# Active health checks: the `check` directive below requires the
# third-party nginx_upstream_check_module (NGINX Plus uses `health_check`);
# stock open-source nginx only does passive checks via max_fails/fail_timeout
check interval=3000 rise=2 fall=3 timeout=1000;
}
server {
listen 80;
server_name homeschoolcoop.com;
# Load balancing
location /api {
proxy_pass http://app_servers;
# Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
# Static files (CDN or local cache)
location /static {
root /var/www/static;
expires 1y;
add_header Cache-Control "public, immutable";
}
}
Auto-Scaling Configuration (AWS)
# AWS Auto Scaling Group (simplified sketch; real CloudFormation defines
# target-tracking policies as separate AWS::AutoScaling::ScalingPolicy resources)
AutoScalingGroup:
MinSize: 2
MaxSize: 50
DesiredCapacity: 5
HealthCheckType: ELB
HealthCheckGracePeriod: 300
TargetTrackingScalingPolicy:
- PolicyName: CPU
TargetValue: 70
ScaleOutCooldown: 60
ScaleInCooldown: 300
- PolicyName: ResponseTime
TargetValue: 500 # milliseconds
ScaleOutCooldown: 60
ScaleInCooldown: 300
Variations
By Scaling Strategy
Vertical Only:
- Upgrade to bigger servers
- Simple but limited
- Hits ceiling
Horizontal Only:
- Add more identical servers
- Complex but unlimited
- Infinite scaling
Hybrid:
- Moderate vertical (reasonable server size)
- Heavy horizontal (add many)
- Best balance
By Caching Layer
Application Cache:
- In-memory (Map, LRU cache)
- Fast but lost on restart
Shared Cache (Redis):
- Centralized, persistent
- Shared across servers
- Production standard
CDN Cache:
- Global edge locations
- Static assets only
- Fastest for static files
By Database Strategy
Read Replicas:
- Scale reads horizontally
- Simple, eventual consistency
Sharding:
- Split data across databases
- Scale writes horizontally
- Complex but necessary at scale
CQRS (Pattern 28):
- Separate read/write databases
- Optimize each independently
Consequences
Benefits
1. Unlimited scaling: Add servers indefinitely (no ceiling).
2. Cost optimization: Auto-scale and pay only for what you use.
3. High availability: Multiple servers mean no single point of failure.
4. Global performance: A CDN makes the system fast everywhere.
5. Graceful degradation: Circuit breakers let the system fail gracefully.
Costs
1. Complexity: Distributed systems are harder than a single server.
2. Eventual consistency: Read replicas lag slightly behind the primary.
3. Operational overhead: More moving parts to monitor.
4. Development changes: Code must be written stateless.
Sample Code
Performance comparison:
// Without scaling (single server)
1,000 concurrent users
Response time: 50ms ✅
10,000 concurrent users
Response time: 5,000ms ❌
// With scaling (10 servers, caching, replicas)
1,000 concurrent users
Response time: 20ms ✅ (cache hits!)
10,000 concurrent users
Response time: 25ms ✅ (distributed load!)
100,000 concurrent users
Response time: 30ms ✅ (auto-scaled to 40 servers!)
Known Uses
Netflix:
- Thousands of servers
- Auto-scaling globally
- Serves 200M+ users
Facebook:
- Massive horizontal scaling
- Aggressive caching (Memcached)
- Database sharding
AWS:
- Auto-scaling groups
- Elastic Load Balancing
- CloudFront CDN
Stripe:
- Stateless API servers
- Redis for caching
- Database read replicas
Related Patterns
Works with:
- Pattern 27: Event Sourcing - scale the event store
- Pattern 28: CQRS - separate read/write scaling
- Pattern 29: Real-Time Processing - scale stream processing
Enables:
- Unlimited growth
- Global reach
- Cost efficiency
References
Academic Foundations
- Kleppmann, Martin (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly. ISBN: 978-1449373320 - Essential scalability reference
- Bondi, André (2000). "Characteristics of Scalability and Their Impact on Performance." Proceedings of the 2nd International Workshop on Software and Performance. https://dl.acm.org/doi/10.1145/350391.350432
- Vogels, Werner (2008). "Eventually Consistent." ACM Queue 6(6). https://queue.acm.org/detail.cfm?id=1466448 - Amazon CTO on consistency
- Gilbert, Seth, and Nancy Lynch (2002). "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services." ACM SIGACT News 33(2). - Formal CAP theorem proof
Scalability Patterns
- Abbott, Martin L., and Michael T. Fisher (2015). The Art of Scalability (2nd ed.). Addison-Wesley. ISBN: 978-0134032801
- Nygard, Michael T. (2018). Release It! (2nd ed.). Pragmatic Bookshelf. ISBN: 978-1680502398 - Production-ready patterns
- Newman, Sam (2021). Building Microservices (2nd ed.). O'Reilly. ISBN: 978-1492034025 - Microservices scalability
- Burns, Brendan (2018). Designing Distributed Systems. O'Reilly. ISBN: 978-1491983645
Cloud Architecture
- AWS Well-Architected Framework: https://aws.amazon.com/architecture/well-architected/ - Scalability pillar
- Azure Architecture Center: https://docs.microsoft.com/en-us/azure/architecture/ - Scalability patterns
- Google Cloud Architecture Framework: https://cloud.google.com/architecture/framework - Best practices
- The Twelve-Factor App: https://12factor.net/ - Scalable app methodology
Load Balancing & Caching
- NGINX: https://www.nginx.com/resources/glossary/load-balancing/ - Load balancing guide
- HAProxy: https://www.haproxy.org/ - High-performance load balancer
- Redis: https://redis.io/ - In-memory data structure store for caching
- Varnish: https://varnish-cache.org/ - HTTP accelerator
- Memcached: https://memcached.org/ - Distributed caching
Database Scalability
- Sharding: https://www.mongodb.com/features/database-sharding-explained - Horizontal partitioning
- Vitess: https://vitess.io/ - Database clustering for MySQL (YouTube scale)
- Citus: https://www.citusdata.com/ - Distributed PostgreSQL
- CockroachDB: https://www.cockroachlabs.com/ - Distributed SQL database
- Cassandra: https://cassandra.apache.org/ - Wide-column distributed database
Related Trilogy Patterns
- Pattern 28: CQRS - Separate read/write scaling
- Pattern 29: Real-Time Processing - Scalable stream processing
- Pattern 32: System Integration - Scalable integration patterns
- Volume 3, Pattern 21: External Data Integration - Scalable API calls
Practical Implementation
- Kubernetes: https://kubernetes.io/ - Container orchestration at scale
- Docker Swarm: https://docs.docker.com/engine/swarm/ - Native Docker clustering
- Nomad: https://www.nomadproject.io/ - Workload orchestration
- Auto Scaling: https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html - AWS EC2 auto-scaling
Monitoring & Observability
- Prometheus: https://prometheus.io/ - Monitoring and alerting toolkit
- Grafana: https://grafana.com/ - Observability and visualization
- Datadog: https://www.datadoghq.com/ - Cloud monitoring platform
- New Relic: https://newrelic.com/ - Application performance monitoring