Chapter 14: Migration Strategies
Introduction: The Make-or-Break Moment
"Experience with migrations and data conversions from legacy systems reveals a critical truth: this is the part of implementation most prone to failure unless done properly and thoughtfully. Most failed projects don't fail because of bad code - they fail because of bad migration strategy."
The hard truth about migrations:
- 70% of IT projects fail during migration, not development
- Average migration takes 2-3x longer than planned
- Data quality issues surface only during migration
- User resistance kills good systems
- Rollback is harder than you think
But migrations don't have to fail. With the right strategy, you can migrate legacy systems to pattern-based systems safely, predictably, and successfully.
This chapter covers:
1. Why migrations fail (learn from others' mistakes)
2. The Strangler Fig pattern (gradual replacement)
3. Data migration strategies (ETL with validation)
4. Parallel running (reduce risk)
5. Cutover strategies (minimize downtime)
6. Rollback planning (when things go wrong)
7. Change management (the human side)
Section 1: Why Migrations Fail
Learn from these common mistakes. Every one of these has killed real projects:
Failure Mode 1: "Big Bang" Cutover
What they did: Spent 18 months building new system in secret. On cutover weekend, turned off old system, turned on new system. Monday morning, chaos.
What went wrong:
- Users had no training on the new system
- Edge cases only the old system handled
- Data migration had errors (discovered live)
- No rollback plan (old system turned off)
- Support team overwhelmed
Real example (government agency): County clerk's office replaced its 25-year-old permit system with a modern web app. Big bang cutover on Friday night. Monday morning:
- 40% of permit applications failed (missing data fields)
- Inspectors couldn't access old permits (legacy data not migrated)
- Staff didn't know how to use the new system (no training)
- Public couldn't submit permits (system down)
- Old system tapes in a warehouse (3-day recovery)
Cost: $2.3M project scrapped, old system restored, CIO fired.
Lesson: Never big bang. Always gradual.
Failure Mode 2: Underestimating Data Quality
What they did: Assumed legacy data was clean. Started migration, discovered 30 years of data quality issues.
What went wrong:
- Duplicate records (same customer entered 5 times)
- Inconsistent formats (phone: 555-1234, (555) 123-4567, 555.123.4567)
- Missing required fields (new system requires email, old had none)
- Invalid references (orders referencing deleted customers)
- Encoding issues (special characters corrupted)
Real example (legal firm): Law firm migrating 50,000 client records from a DOS-based system to modern case management. Discovered:
- 12,000 duplicate client records (23%)
- 8,000 clients with no address (16%)
- 15,000 cases with no closing date (30%)
- 200+ different spellings of "plaintiff" in notes
- Client SSNs stored in phone number fields (!)
Timeline impact:
- Planned: 3 months
- Actual: 18 months (data cleanup)
- Extra cost: $400,000
Lesson: Assess data quality BEFORE migration. Plan for cleanup.
Failure Mode 3: Ignoring User Resistance
What they did: Built technically perfect system. Users refused to use it.
What went wrong:
- Users comfortable with the old system (20+ years)
- New system had a learning curve
- No one asked users what they needed
- Training was optional (no one attended)
- Old system still available (users kept using it)
Real example (healthcare clinic): Clinic replaced paper charts with an EMR. The system was excellent, but:
- Doctors continued using paper, transcribed later
- Nurses printed EMR data, wrote on the printouts
- Front desk kept a paper schedule alongside the EMR
- Data in both systems, constantly out of sync
Result: After 2 years, still using paper primarily. $500k EMR barely used.
Lesson: Change management is more important than code quality.
Failure Mode 4: No Rollback Plan
What they did: "We're committed. No going back." Then discovered critical bug in production.
What went wrong:
- Old system already decommissioned (database server wiped)
- Backup tapes in offsite storage (3-day retrieval)
- Users trained on the new system only (forgot old workflows)
- Can't roll forward, can't roll back = stuck
Real example (e-commerce company): Online retailer migrated its order management system. Week 2 in production:
- Bug in shipping calculation (overcharged customers $80k)
- Old system shut down (server reused)
- Had to manually refund 4,000 orders
- Lost 2,000 customers (bad reviews)
Cost: $250k in refunds + reputation damage.
Lesson: Always have rollback plan. Keep old system running in parallel.
Failure Mode 5: Underestimating Complexity
What they did: "It's just CRUD. How hard can it be?" Discovered 20 years of undocumented business logic.
What went wrong:
- Old system had hundreds of edge cases
- Business rules buried in code (no documentation)
- Integrations no one knew existed
- Reports depended on specific data quirks
- "Magic" calculations no one could explain
Real example (financial services): Bank migrating its loan origination system. Discovered:
- 47 different loan types (documented: 12)
- Interest calculation varied by state, product, and date
- 200+ undocumented business rules in stored procedures
- Credit bureau integration used custom SOAP encoding
- Executive reports pulled data from 7 different systems
Original estimate: 6 months, $500k
Actual: 24 months, $2.8M
Lesson: Legacy systems are more complex than they look. Triple your estimate.
Section 2: The Strangler Fig Pattern
The safest migration strategy. Named after fig trees that gradually grow around host trees until the host can be removed.
How It Works
Phase 1: Run in parallel
- New system handles new features
- Old system keeps handling existing functionality
- Both systems active simultaneously

Phase 2: Gradual migration
- Migrate one feature at a time
- Test thoroughly before moving to the next feature
- Users learn the new system gradually

Phase 3: Complete strangling
- Eventually all features are in the new system
- Old system turned off
- Migration complete
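In code, these phases usually take the form of a thin facade that routes each feature to whichever system currently owns it, flipping features one at a time. A minimal sketch, with hypothetical system classes standing in for the real ones:

```python
class LegacySystem:
    def handle(self, feature, request):
        return f"old:{feature}"

class NewSystem:
    def handle(self, feature, request):
        return f"new:{feature}"

class StranglerFacade:
    """Route each feature to whichever system currently owns it."""
    def __init__(self, old_system, new_system):
        self.old = old_system
        self.new = new_system
        self.migrated = set()  # features now served by the new system

    def migrate_feature(self, feature):
        self.migrated.add(feature)

    def handle(self, feature, request):
        system = self.new if feature in self.migrated else self.old
        return system.handle(feature, request)

facade = StranglerFacade(LegacySystem(), NewSystem())
facade.migrate_feature('permit_lookup')         # flip one feature at a time
lookup = facade.handle('permit_lookup', {})     # now served by the new system
apps = facade.handle('applications', {})        # still served by the old system
```

Because the routing decision lives in one place, pausing or reversing the migration is a one-line change.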
Example: Permit System Migration
Legacy system: DOS-based permit application from 1995. 30 years of data. 50 users. 10,000 permits/year.
Strangler fig approach:
Month 1-2: Infrastructure
┌─────────────────────────────────────┐
│ Old DOS System │
│ (Still handles everything) │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Database Migration Layer │
│ (Replicate data to new DB nightly) │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ New System (empty) │
│ (Read-only access to data) │
└─────────────────────────────────────┘
Users: Still using old system 100%
Data: Copying to new database nightly
Risk: Low (new system not critical yet)
Month 3-4: First Feature Migration
┌─────────────────────────────────────┐
│ Old DOS System │
│ - Permit applications (still) │
│ - Inspections (still) │
│ - Reports (still) │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ New System │
│ - Permit lookup (NEW!) │
│ Public can search permits online │
└─────────────────────────────────────┘
Users: Public uses new system for lookup, staff uses old system for data entry
Benefits: Public can find permits online (never could before)
Risk: Low (read-only, no data changes)
Month 5-7: Second Feature Migration
┌─────────────────────────────────────┐
│ Old DOS System │
│ - Permit applications (still) │
│ - Inspections (still) │
│ - Reports (migrating...) │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ New System │
│ - Permit lookup │
│ - Reports (NEW!) │
│ Generate all reports online │
└─────────────────────────────────────┘
Users: Staff generates reports in new system, everything else in old system
Benefits: Beautiful reports (vs dot matrix printouts)
Risk: Low (reports don't change data)
Month 8-10: Critical Feature Migration
┌─────────────────────────────────────┐
│ Old DOS System │
│ - Inspections (still) │
│ - Legacy permit lookup │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ New System │
│ - Permit applications (NEW!) │
│ Online application with │
│ validation, zoning checks │
│ - Permit lookup │
│ - Reports │
└─────────────────────────────────────┘
Users: New permits in new system, old permits still in old system
Benefits: Real-time validation, no paper forms
Risk: Medium (creating new data)
Month 11-12: Final Migration
┌─────────────────────────────────────┐
│ Old DOS System (read-only) │
│ Historical data only │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ New System │
│ - Permit applications │
│ - Inspections (NEW!) │
│ - Permit lookup │
│ - Reports │
│ - All features migrated! │
└─────────────────────────────────────┘
Users: 100% on new system for daily work
Old system: Available for historical lookups only
Risk: Low (gradual migration de-risked each step)
Month 13+: Decommission
- Old system data fully migrated
- Old system shut down
- Hardware repurposed
Why Strangler Fig Works
✅ Users adapt gradually - Learn one feature at a time
✅ Low risk - Each step tested before the next
✅ Reversible - Can pause migration if issues arise
✅ Builds confidence - Early wins create momentum
✅ Maintains business continuity - Never fully down
The pattern is clear: Successful migrations consistently use strangler fig variations. Failed ones consistently try big bang.
Section 3: Data Migration Strategies
Data migration is where projects die. Here's how to do it right.
The Five-Stage Data Migration Pipeline
Stage 1: Extract (get data out of the legacy system)
Stage 2: Profile (understand data quality)
Stage 3: Transform (clean and convert data)
Stage 4: Validate (verify correctness)
Stage 5: Load (insert into the new system)
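The five stages chain into a single fail-fast pipeline: nothing loads unless validation passes. A sketch of the control flow, with trivial stand-in stage functions (the rest of this section shows real implementations of each stage):

```python
def extract():
    # Stand-in rows; a real extract would read the legacy dump
    return [{'id': 1, 'zip': '1234'}, {'id': 2, 'zip': '90210'}]

def profile(rows):
    return {'records': len(rows),
            'bad_zips': sum(len(r['zip']) != 5 for r in rows)}

def transform(rows):
    # Here: simply drop rows with invalid zips; real cleanup is richer
    return [r for r in rows if len(r['zip']) == 5]

def validate(rows):
    assert all(len(r['zip']) == 5 for r in rows), "invalid zip survived transform"

loaded = []
def load(rows):
    loaded.extend(rows)  # stand-in for a database insert

def run_migration():
    raw = extract()
    report = profile(raw)   # understand quality before touching anything
    clean = transform(raw)
    validate(clean)         # fail fast: nothing loads unless checks pass
    load(clean)
    return report

report = run_migration()
```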
Stage 1: Extract
Goal: Get data out of legacy system in usable format.
Challenges:
- Legacy systems often have no export function
- Database may be in a proprietary format (dBase, FoxPro)
- Some data exists only in printed reports
- Special characters may be corrupted during export
Extraction methods:
Method 1: Database dump (best)
# If you have database access
mysqldump --single-transaction legacy_db > legacy_data.sql
# Or, with PostgreSQL, dump specific tables only
pg_dump -t permits -t inspections legacy_db > data.sql
Method 2: Export via legacy application
# Many legacy systems can export to CSV
# Use the application's export feature if available
Old System → File → Export → CSV
Method 3: Database replication (for strangler fig)
// Set up scheduled (or real-time) replication.
// DatabaseReplicator is illustrative, not a real library.
const replicator = new DatabaseReplicator({
  source: 'legacy-db.company.local:3306',
  target: 'new-db.company.local:5432',
  tables: ['permits', 'inspections', 'customers'],
  schedule: 'every 4 hours' // or real-time
});
replicator.start();
Method 4: Screen scraping (last resort)
# If no database access, scrape the UI (slow and fragile)
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://legacy-system/permits')

permits = []
for row in driver.find_elements(By.TAG_NAME, 'tr'):
    permit = {
        'id': row.find_element(By.CLASS_NAME, 'permit-id').text,
        'address': row.find_element(By.CLASS_NAME, 'address').text,
        # ...
    }
    permits.append(permit)
Critical tip: Always extract more than you think you need. You can't go back to extract later.
Stage 2: Profile
Goal: Understand data quality before attempting transformation.
Profile everything:
- Record counts
- Null/empty values
- Unique values per field
- Data type distributions
- Duplicate records
- Referential integrity
Example profiling script:
import pandas as pd
# Load extracted data
df = pd.read_csv('legacy_permits.csv')
# Profile report
print(f"Total records: {len(df)}")
print(f"Columns: {df.columns.tolist()}")
print("\nNull values per column:")
print(df.isnull().sum())
print(f"\nDuplicate records: {df.duplicated().sum()}")
print("\nData types:")
print(df.dtypes)
# Field-specific analysis
print("\nPermit status distribution:")
print(df['status'].value_counts())
print("\nInvalid zip codes:")
invalid_zips = df[~df['zip_code'].str.match(r'^\d{5}(-\d{4})?$', na=False)]
print(f"Count: {len(invalid_zips)}")
# Referential integrity (customers extracted separately)
customers_df = pd.read_csv('legacy_customers.csv')
print("\nOrphaned records (customer_id not in customers table):")
orphans = df[~df['customer_id'].isin(customers_df['id'])]
print(f"Count: {len(orphans)}")
Output example:
Total records: 48,392
Columns: ['id', 'permit_type', 'address', 'customer_id', 'status', 'submitted_date']
Null values per column:
id 0
permit_type 47
address 892
customer_id 1,203
status 0
submitted_date 4,281
Duplicate records: 1,847
Invalid zip codes: 3,472
Orphaned records: 2,108
Action: Create data quality report. Show to stakeholders. Adjust timeline based on cleanup needed.
Stage 3: Transform
Goal: Clean, standardize, and convert data to new schema.
Common transformations:
1. Deduplication
# Find duplicates based on key fields
duplicates = df[df.duplicated(subset=['customer_name', 'address'], keep=False)]

# Strategy 1: Keep most recent
df = df.sort_values('created_date').drop_duplicates(
    subset=['customer_name', 'address'],
    keep='last'
)

# Strategy 2: Merge records
def merge_duplicates(group):
    # Take the first non-null value for each field
    return group.apply(lambda x: x.dropna().iloc[0] if not x.dropna().empty else None)

df = df.groupby(['customer_name', 'address']).apply(merge_duplicates)
2. Standardization
# Phone numbers: (555) 123-4567 → 5551234567
df['phone'] = df['phone'].str.replace(r'[^\d]', '', regex=True)
# Addresses: Upper case, trim whitespace
df['address'] = df['address'].str.strip().str.upper()
# Dates: Various formats → ISO 8601
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Status codes: Y/N → true/false
df['active'] = df['active'].map({'Y': True, 'N': False, '': False})
3. Enrichment
# Add missing data from external sources
import requests

def validate_address(address):
    response = requests.post('https://api.usps.com/validate', json={
        'address': address
    })
    return response.json()['corrected_address']

# Apply to addresses with missing zip codes
df.loc[df['zip_code'].isnull(), 'address'] = \
    df[df['zip_code'].isnull()]['address'].apply(validate_address)
4. Schema mapping
# Old schema → New schema
mapping = {
'cust_id': 'customer_id',
'cust_nm': 'customer_name',
'addr1': 'street_address',
'addr2': 'address_line_2',
'city': 'city',
'st': 'state',
'zip': 'postal_code'
}
new_df = df.rename(columns=mapping)
# Add required fields not in the legacy system
from datetime import datetime
new_df['created_by'] = 'migration_script'
new_df['created_at'] = datetime.now()
Critical warning: Transformation is where you'll spend 80% of migration time. Budget accordingly.
Stage 4: Validate
Goal: Verify transformed data is correct before loading.
Validation checks:
1. Record count reconciliation
legacy_count = len(legacy_df)
transformed_count = len(transformed_df)
expected_duplicates_removed = 1847
assert transformed_count == legacy_count - expected_duplicates_removed, \
f"Record count mismatch! Expected {legacy_count - expected_duplicates_removed}, got {transformed_count}"
2. Field-level validation
# All required fields present
required_fields = ['customer_id', 'address', 'status']
for field in required_fields:
    null_count = transformed_df[field].isnull().sum()
    assert null_count == 0, f"Field {field} has {null_count} null values"
# Data types correct
assert transformed_df['customer_id'].dtype == 'int64'
assert transformed_df['submitted_date'].dtype == 'datetime64[ns]'
# Value ranges valid
assert (transformed_df['permit_fee'] >= 0).all(), "Negative fees found"
assert transformed_df['state'].isin(US_STATES).all(), "Invalid state codes found"
3. Business rule validation
# Approved permits must have approval date
approved = transformed_df[transformed_df['status'] == 'approved']
assert approved['approval_date'].notnull().all(), \
"Approved permits missing approval dates"
# Customer IDs must exist in customers table
customer_ids = set(customers_df['id'])
invalid_refs = transformed_df[~transformed_df['customer_id'].isin(customer_ids)]
assert len(invalid_refs) == 0, \
f"Found {len(invalid_refs)} permits with invalid customer_id"
4. Sample validation (manual)
# Export sample for manual review
sample = transformed_df.sample(n=100)
sample.to_csv('migration_sample.csv')
print("Review migration_sample.csv manually")
print("Check that:")
print("- Addresses are correct")
print("- Phone numbers formatted properly")
print("- Dates make sense")
print("- Status values appropriate")
Validation report:
Data Validation Report - Legacy Permits Migration
Generated: 2025-12-28
Record Counts:
- Legacy system: 48,392 records
- After deduplication: 46,545 records (-1,847)
- After cleanup: 46,545 records
- Ready to load: 46,545 records ✓
Field Validation:
✓ All required fields present
✓ No null values in required fields
✓ Data types correct
✓ Value ranges valid
Business Rules:
✓ Approved permits have approval dates
✓ All customer references valid
✓ Dates within reasonable range
⚠ 23 permits with $0 fee (flagged for review)
Sample Validation:
✓ 100 random samples reviewed manually
✓ No issues found
Ready for loading: YES
Estimated load time: 2-3 hours
Stage 5: Load
Goal: Insert validated data into new system.
Loading strategies:
Strategy 1: Bulk insert (fastest)
import psycopg2
from psycopg2.extras import execute_batch
conn = psycopg2.connect("postgresql://localhost/new_db")
cursor = conn.cursor()
# Prepare data
records = transformed_df.to_records(index=False)
data = [tuple(x) for x in records]
# Bulk insert
sql = """
INSERT INTO permits (customer_id, address, status, submitted_date)
VALUES (%s, %s, %s, %s)
"""
execute_batch(cursor, sql, data, page_size=1000)
conn.commit()
print(f"Loaded {len(data)} records")
Strategy 2: API-based (safer)
import os

import pandas as pd
import requests

api_url = 'https://new-system.com/api/permits'
api_key = os.getenv('API_KEY')

loaded = 0
errors = []

for _, row in transformed_df.iterrows():
    try:
        response = requests.post(
            api_url,
            headers={'Authorization': f'Bearer {api_key}'},
            json={
                'customer_id': int(row['customer_id']),
                'address': row['address'],
                'status': row['status'],
                'submitted_date': row['submitted_date'].isoformat()
            }
        )
        response.raise_for_status()
        loaded += 1
        if loaded % 1000 == 0:
            print(f"Loaded {loaded} / {len(transformed_df)} records")
    except Exception as e:
        errors.append({
            'record': row.to_dict(),
            'error': str(e)
        })

print(f"Successfully loaded: {loaded}")
print(f"Errors: {len(errors)}")

# Save errors for review
if errors:
    pd.DataFrame(errors).to_csv('load_errors.csv')
Strategy 3: Incremental (for large datasets)
import time

# Load in chunks, commit after each chunk
chunk_size = 10000
for i in range(0, len(transformed_df), chunk_size):
    chunk = transformed_df[i:i+chunk_size]
    # Load chunk (conn is a SQLAlchemy connection here, as to_sql expects)
    chunk.to_sql('permits', conn, if_exists='append', index=False)
    # Commit
    conn.commit()
    print(f"Loaded chunk {i//chunk_size + 1}: {len(chunk)} records")
    # Pause between chunks (be nice to the database)
    time.sleep(1)
Post-load validation:
# Verify counts match (psycopg2 requires a cursor for queries)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM permits")
new_count = cursor.fetchone()[0]

legacy_count = 46545
assert new_count == legacy_count, \
    f"Count mismatch! Expected {legacy_count}, got {new_count}"

# Spot check sample records (parameterized query, never string formatting)
sample_ids = [12, 156, 2847, 19203, 45012]
for id in sample_ids:
    legacy_record = legacy_df[legacy_df['id'] == id].iloc[0]
    new_record = pd.read_sql(
        "SELECT * FROM permits WHERE legacy_id = %(id)s", conn, params={'id': id}
    ).iloc[0]
    # Compare key fields
    assert legacy_record['customer_id'] == new_record['customer_id']
    assert legacy_record['address'] == new_record['address']
    # ...

print("✓ Post-load validation passed")
Section 4: Parallel Running
Running old and new systems side-by-side reduces risk.
Parallel Running Strategy
Phase 1: New system shadow mode (weeks 1-4)
- Old system: Production (users work here)
- New system: Shadow (processes data but doesn't affect users)
- Compare outputs between systems

Phase 2: New system read-only (weeks 5-8)
- Old system: Still production for writes
- New system: Users can view data (read-only)
- Builds user confidence

Phase 3: Gradual cutover (weeks 9-12)
- New system: Handles some transactions
- Old system: Handles remaining transactions
- Monitor both systems

Phase 4: Full cutover (week 13)
- New system: Primary
- Old system: Backup/archive only
Example: Invoice System Parallel Running
Week 1-4: Shadow mode
// Every invoice entered in old system also creates record in new system
async function processInvoice(invoiceData) {
  // Process in old system (production)
  const oldSystemResult = await oldSystem.createInvoice(invoiceData);

  try {
    // Also process in new system (shadow)
    const newSystemResult = await newSystem.createInvoice(invoiceData);

    // Compare results
    const comparison = compareInvoices(oldSystemResult, newSystemResult);
    if (!comparison.match) {
      logger.warn('Invoice mismatch detected', {
        invoiceId: invoiceData.id,
        differences: comparison.differences
      });

      // Alert developers
      await alerting.send({
        type: 'shadow_mode_mismatch',
        details: comparison
      });
    }
  } catch (error) {
    // New system errors don't affect production
    logger.error('New system error (shadow mode)', error);
  }

  // Return old system result (production)
  return oldSystemResult;
}
// Note: `new` is a reserved word in JavaScript, so the
// parameters are named oldInvoice / newInvoice.
function compareInvoices(oldInvoice, newInvoice) {
  const differences = [];

  if (oldInvoice.total !== newInvoice.total) {
    differences.push(`Total: ${oldInvoice.total} vs ${newInvoice.total}`);
  }
  if (oldInvoice.tax !== newInvoice.tax) {
    differences.push(`Tax: ${oldInvoice.tax} vs ${newInvoice.tax}`);
  }

  return {
    match: differences.length === 0,
    differences
  };
}
Week 5-8: Read-only mode
// Users can view invoices in new system
app.get('/invoices/:id', async (req, res) => {
  const invoiceId = req.params.id;

  // Fetch from both systems
  const [oldInvoice, newInvoice] = await Promise.all([
    oldSystem.getInvoice(invoiceId),
    newSystem.getInvoice(invoiceId)
  ]);

  // Show comparison view
  res.render('invoice-compare', {
    old: oldInvoice,
    new: newInvoice,
    differences: compareInvoices(oldInvoice, newInvoice).differences
  });
});
Week 9-12: Gradual cutover
// Route some invoices to new system based on criteria
async function processInvoice(invoiceData) {
  // Feature flag: gradually increase percentage (start at 10%, raise to 100%)
  const newSystemPercentage = featureFlags.get('new_system_cutover');
  const useNewSystem = Math.random() < (newSystemPercentage / 100);

  if (useNewSystem) {
    try {
      const result = await newSystem.createInvoice(invoiceData);
      logger.info('Invoice processed in new system', { invoiceId: invoiceData.id });
      return result;
    } catch (error) {
      // Fallback to old system on error
      logger.error('New system failed, falling back to old', error);
      return await oldSystem.createInvoice(invoiceData);
    }
  } else {
    return await oldSystem.createInvoice(invoiceData);
  }
}
Benefits of parallel running:
- ✅ Catch discrepancies before cutover
- ✅ Users build confidence in new system
- ✅ Can pause if issues arise
- ✅ Gradual load increase (not sudden spike)
- ✅ Rollback easy (flip feature flag)
Cost: Extra infrastructure (running both systems)
Worth it? Absolutely. Parallel running has saved countless migrations from disaster.
Section 5: Cutover Strategies
Eventually you need to cut over completely. Here are proven approaches:
Strategy 1: Weekend Cutover (Low-Volume Systems)
Best for: Systems with low weekend activity
Timeline:
Friday 5 PM: Freeze old system (read-only)
Friday 6 PM: Begin final data migration
Friday 11 PM: Validate migrated data
Saturday 8 AM: Test new system thoroughly
Sunday 2 PM: Training for support staff
Monday 7 AM: New system goes live
Rollback trigger: If critical issues found by Sunday noon, abort and revert
Example: County permit system
- Friday evening: 2,347 permits in system
- Weekend: Migrate all data, test thoroughly
- Monday: Staff arrives to new system
- Had entire weekend for migration + testing
Strategy 2: Phased Cutover (High-Volume Systems)
Best for: Systems that can't go offline
Phase 1: New transactions only
Week 1: All new data goes to new system
Old data stays in old system
Users must check both systems
Phase 2: Most recent data migrated
Week 2-3: Migrate last 6 months of data
Users check new system first, old system if not found
Phase 3: Historical data migrated
Week 4-8: Migrate older historical data in chunks
Most users fully on new system
Phase 4: Archive old system
Week 9+: Old system read-only for archive
New system fully operational
Example: Hospital patient records. Can't migrate 20 years of records overnight. Instead:
- Week 1: All new patients in new system
- Week 2-4: Migrate last 2 years (active patients)
- Month 2-3: Migrate years 3-10 (recent patients)
- Month 4-6: Migrate years 11-20 (archive)
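Phase 2's rule of "check the new system first, fall back to the old if not found" reduces to a few lines of lookup logic. A sketch, with plain dicts standing in for the two systems' hypothetical clients:

```python
def find_record(record_id, new_system, old_system):
    """Phase 2 lookup: try the new system first, fall back to legacy."""
    record = new_system.get(record_id)
    if record is not None:
        return record, 'new'
    record = old_system.get(record_id)
    if record is not None:
        return record, 'old'
    return None, 'missing'

# Plain dicts stand in for the two systems' clients here.
new_db = {'P-100': {'status': 'approved'}}   # migrated records
old_db = {'P-007': {'status': 'closed'}}     # not yet migrated

record, source = find_record('P-007', new_db, old_db)
```

As each historical chunk is migrated, more lookups resolve in the new system and the legacy fallback fades away without any user-visible change.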
Strategy 3: Pilot Group Cutover
Best for: Large organizations, multiple locations
Week 1-2: Pilot group (10% of users)
- IT department tests new system
- Iron out issues with small group
- Gather feedback, fix problems
Week 3-4: Early adopters (20% of users)
- Volunteer departments go live
- Create internal champions
- Document lessons learned
Week 5-8: Major rollout (50% of users)
- Roll out to half of organization
- Support team at full capacity
- Monitor closely for issues
Week 9-12: Full deployment (100% of users)
- Remaining users cutover
- Old system shut down
- Migration complete
Example: State government (50 agencies)
- Agency 1 (IT): Weeks 1-2
- Agency 2-3 (volunteers): Weeks 3-4
- Agencies 4-10: Week 5-6
- Agencies 11-25: Week 7-8
- Agencies 26-50: Week 9-10
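A pilot-group rollout like this is easy to gate in code: assign each department (or agency) to a wave and compare against the current wave. A hedged sketch; the wave assignments are illustrative:

```python
# Wave assignments are illustrative; a real rollout would keep
# these in configuration, not code.
ROLLOUT_WAVES = {
    'IT': 1,            # pilot group (weeks 1-2)
    'Permits': 2,       # early adopters (weeks 3-4)
    'Inspections': 3,   # major rollout (weeks 5-8)
}

def uses_new_system(department, current_wave):
    # Departments without a wave assignment stay on the old system
    wave = ROLLOUT_WAVES.get(department)
    return wave is not None and wave <= current_wave

# During weeks 3-4 (wave 2), IT and Permits see the new system
on_new = sorted(d for d in ROLLOUT_WAVES if uses_new_system(d, current_wave=2))
```

Advancing the rollout is then a single config change (increment `current_wave`), and pausing it is just as cheap.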
Section 6: Rollback Planning
"Hope for the best, plan for the worst."
When to Roll Back
Trigger conditions (decide upfront):
1. Data loss: Any data lost or corrupted (automatic rollback)
2. Critical function down: Core feature doesn't work (rollback if not fixed in 4 hours)
3. Mass user rejection: >50% of users can't complete tasks (rollback if not fixed in 24 hours)
4. Performance disaster: System 10x slower than old system (rollback if not fixed in 8 hours)
5. Integration failure: Critical integration broken (e.g., payment processing) (immediate rollback)
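Trigger conditions are easiest to enforce when encoded as explicit checks against live metrics, so the go/no-go call isn't made under pressure. A hedged sketch of such an evaluator; the thresholds mirror the list above, but the metric names are invented:

```python
def should_roll_back(metrics, hours_since_detection):
    """Evaluate the five trigger conditions listed above.
    The `metrics` keys are illustrative, not a real monitoring API."""
    if metrics.get('data_loss'):
        return True, 'data loss (automatic rollback)'
    if metrics.get('critical_integration_down'):
        return True, 'critical integration broken (immediate rollback)'
    if metrics.get('core_feature_down') and hours_since_detection >= 4:
        return True, 'core feature down for over 4 hours'
    if metrics.get('blocked_user_pct', 0) > 50 and hours_since_detection >= 24:
        return True, 'mass user rejection for over 24 hours'
    if metrics.get('slowdown_factor', 1) >= 10 and hours_since_detection >= 8:
        return True, 'performance disaster for over 8 hours'
    return False, 'keep monitoring'

decision, reason = should_roll_back({'core_feature_down': True},
                                    hours_since_detection=5)
```

Wiring a check like this into monitoring makes the rollback decision mechanical rather than political.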
Rollback Procedure
Prepare before cutover:
1. Backup everything
# Database backup
pg_dump production_db > backup_pre_cutover_2025_12_28.sql
# Application code backup
git tag v2.0-pre-cutover
git push origin v2.0-pre-cutover
# Configuration backup
cp -r /etc/app-config /backup/config_2025_12_28
2. Document rollback steps
# Rollback Procedure
## Decision
Rollback authorized by: [CTO name]
Reason: [Critical issue description]
Time: [timestamp]
## Steps (60-minute timeline)
### Minute 0-5: Notify
- [ ] Email all users: "System issue detected, rolling back"
- [ ] Post status page update
- [ ] Alert support team
### Minute 5-15: Stop new system
- [ ] Set new system to maintenance mode
- [ ] Stop application servers
- [ ] Wait for in-flight requests to complete
### Minute 15-30: Restore old system
- [ ] Restore database from backup
- [ ] Restart old application servers
- [ ] Verify old system functionality
### Minute 30-45: Data reconciliation
- [ ] Export any data created in new system (last 4 hours)
- [ ] Manually enter into old system (or save for later)
- [ ] Verify critical transactions not lost
### Minute 45-60: Verify and communicate
- [ ] Test old system with 10 test cases
- [ ] Confirm all users can access
- [ ] Email users: "System restored, back to normal"
- [ ] Update status page: "Resolved"
## Post-Rollback
- [ ] Schedule post-mortem meeting (within 24 hours)
- [ ] Analyze what went wrong
- [ ] Fix issues before reattempting migration
3. Test rollback procedure
Before go-live:
1. Perform full cutover in test environment
2. Introduce simulated issue
3. Execute rollback procedure
4. Verify everything restored correctly
5. Time how long each step takes
Goal: Rollback should take <60 minutes
Real example: E-commerce site rollback
Black Friday launch of a new checkout system. Within 2 hours:
- 40% of transactions failing (payment integration bug)
- $80,000 in lost sales
- Social media on fire
Rollback executed:
- 15 minutes: Decided to roll back
- 10 minutes: Notified users, stopped new system
- 20 minutes: Restored old system from backup
- 15 minutes: Verified functionality
- Total: 60 minutes from decision to resolution
Lost sales during rollback: $15,000
Lost sales if continued with broken system: $300,000+
Critical advice: Test your rollback procedure. Many teams plan rollbacks that don't work when needed.
Section 7: Change Management
Technical migration is only half the battle. Users must embrace the new system.
The Psychology of Change
Users fear:
- Looking incompetent (don't know the new system)
- Losing productivity (slower initially)
- Losing their job (automation replacing them)
- Breaking things (scared to try features)
Your job: Address these fears proactively.
Change Management Strategy
Phase 1: Before Migration - Build Buy-In
Involve users early
Months before go-live:
- Survey users about pain points with old system
- Demo new system, get feedback
- Form "power user" group to test beta
- Let users suggest features
Result: Users feel heard, have ownership
Create champions
Identify enthusiastic users in each department:
- Give them early access
- Train them deeply
- Make them trainers for their peers
- Recognize them publicly
Result: Peer influence drives adoption
Communicate benefits (not features)
Bad: "New system has inline validation!"
Good: "No more rejected forms - system tells you errors immediately"
Bad: "Real-time integration with USPS API!"
Good: "Address autocomplete - just type and select, no more typos"
Focus on: Faster, easier, less frustration
Phase 2: During Migration - Support Users
Training strategy
Don't: Single 2-hour training session, done
Do: Multiple touchpoints
Week before:
- 1-hour overview session
- Hands-on practice in test system
Launch day:
- Quick reference card on every desk
- Support staff in every department
- "Help" button in every screen
Week after:
- Daily "office hours" for questions
- Video tutorials for common tasks
- Peer training (champions help colleagues)
Month after:
- Advanced features training
- Efficiency tips
- Gather feedback for improvements
Support structure
Level 1: Peer support (champions in each dept)
Level 2: Help desk (ticketing system)
Level 3: Development team (critical issues)
Track:
- Common questions (add to FAQ)
- Feature requests (prioritize for next release)
- Pain points (fix quickly)
Phase 3: After Migration - Sustain Adoption
Monitor usage
// Track feature adoption
analytics.track('feature_used', {
  feature: 'inline_validation',
  user: userId,
  department: userDept
});

// Identify struggling users
const lowUsageUsers = getUsersWithLogin({
  lastLoginMoreThan: '7 days ago',
  totalLogins: '<5'
});

// Proactive outreach
for (const user of lowUsageUsers) {
  sendEmail(user, {
    template: 'need_help',
    message: "We noticed you haven't used the new system much. Need help?"
  });
}
Celebrate wins
Share success stories:
- "Processing time cut from 10 days to 2 days!"
- "User satisfaction up 45%"
- "Zero errors this week (vs 20/week in old system)"
Recognize users:
- "Department of the Month: IT (100% adoption)"
- "Power User Spotlight: Jane (helped 30 colleagues)"
Continuous improvement
Monthly:
- Review support tickets (common issues)
- Survey users (satisfaction, pain points)
- Prioritize improvements
Quarterly:
- Release new features
- Advanced training sessions
- Showcase power user tips
Real Example: Hospital EMR Migration
Challenge: Doctors hated old paper charts but feared EMR.
Change management approach:
3 months before:
- Interviewed 50 doctors about pain points
- Demoed EMR focusing on their specific needs
- Created "Physician Advisory Group" (10 doctors)

1 month before:
- Advisory group tested EMR, gave feedback
- Modified based on their input (custom templates)
- Recorded video testimonials from advisory group

Launch week:
- 1 doctor + 1 medical assistant per shift as "super users"
- IT staff embedded in every clinic
- Old paper charts still available (safety net)

Results:
- Week 1: 60% adoption (better than expected 40%)
- Week 4: 90% adoption
- Month 3: 98% adoption
- 6 months: Paper charts rarely used
Key factors:
- Doctors involved in design (felt ownership)
- Peer champions (trusted voices)
- Safety net (could fall back to paper if needed)
- Fast support (IT staff present)
Section 8: Migration Checklist
Pre-Migration (2-4 months before)
Data Assessment:
- [ ] Extract sample data from legacy system
- [ ] Profile data quality (nulls, duplicates, invalid values)
- [ ] Estimate cleanup effort (weeks? months?)
- [ ] Create data quality report for stakeholders
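Data profiling can start very simply: count nulls, duplicates, and rule violations per field on a sample extract. A minimal sketch (the field names and validation rules are illustrative, not from any particular legacy system):

```javascript
// Profile a sample of legacy records for nulls, duplicates, and invalid
// values. `rules` maps field name -> predicate returning true if valid.
function profileData(records, rules) {
  const seen = new Set();
  const profile = { total: records.length, duplicates: 0, fields: {} };
  for (const field of Object.keys(rules)) {
    profile.fields[field] = { nulls: 0, invalid: 0 };
  }
  for (const rec of records) {
    const key = JSON.stringify(rec);       // naive whole-record duplicate check
    if (seen.has(key)) profile.duplicates++;
    seen.add(key);
    for (const [field, isValid] of Object.entries(rules)) {
      const value = rec[field];
      if (value == null || value === '') profile.fields[field].nulls++;
      else if (!isValid(value)) profile.fields[field].invalid++;
    }
  }
  return profile;
}

// Illustrative sample: one bad date, one missing email, one duplicate row
const sample = [
  { permitId: 'P-001', issued: '2024-03-01', email: 'a@example.com' },
  { permitId: 'P-002', issued: 'unknown',    email: '' },
  { permitId: 'P-001', issued: '2024-03-01', email: 'a@example.com' },
];
const report = profileData(sample, {
  issued: v => !Number.isNaN(Date.parse(v)),
  email:  v => /@/.test(v),
});
```

Even a rough report like this turns "the data is probably fine" into concrete numbers you can put in front of stakeholders.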
Technical Preparation:
- [ ] Set up new system infrastructure
- [ ] Configure integrations with external systems
- [ ] Build data migration pipeline (extract, transform, validate, load)
- [ ] Test migration with sample data
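The extract-transform-validate-load pipeline can be sketched as a gate: records that fail validation are quarantined for review rather than loaded. The stage functions here (extract, transform, validate, load) are illustrative placeholders for real connectors:

```javascript
// ETL-with-validation sketch: validation is a hard gate, so bad records
// are quarantined for review instead of being silently loaded.
function runPipeline({ extract, transform, validate, load }) {
  const toLoad = [];
  const quarantined = [];
  for (const record of extract()) {
    const transformed = transform(record);
    const errors = validate(transformed);   // [] means the record is clean
    if (errors.length === 0) {
      toLoad.push(transformed);
    } else {
      quarantined.push({ record, errors }); // fix and re-run, never skip
    }
  }
  load(toLoad);
  return { loaded: toLoad.length, quarantined };
}

// Illustrative run: one clean record, one with a missing required field
const target = [];
const result = runPipeline({
  extract:   () => [{ name: ' Alice ', dept: 'IT' }, { name: '', dept: 'HR' }],
  transform: r  => ({ ...r, name: r.name.trim() }),
  validate:  r  => (r.name ? [] : ['name is required']),
  load:      rs => target.push(...rs),
});
```

The design choice worth copying is the quarantine: it gives you an explicit worklist of bad records instead of discovering them live on cutover Monday.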
Change Management:
- [ ] Survey users about pain points
- [ ] Demo new system to users
- [ ] Form power user group
- [ ] Begin training material development
Risk Planning:
- [ ] Define rollback triggers
- [ ] Document rollback procedure
- [ ] Test rollback in staging environment
- [ ] Identify critical go/no-go criteria
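Rollback triggers are easiest to act on when they are objective thresholds agreed before cutover, not judgment calls made at 2 a.m. A minimal sketch (the metric names and threshold values are illustrative):

```javascript
// Pre-agreed rollback triggers: anyone on call can evaluate them
// against live metrics without debate. Thresholds are illustrative.
const ROLLBACK_TRIGGERS = [
  { name: 'error rate',    check: m => m.errorRate > 0.05 },
  { name: 'data mismatch', check: m => m.mismatchRate > 0.01 },
  { name: 'p95 latency',   check: m => m.p95LatencyMs > 2000 },
];

function shouldRollBack(metrics) {
  const fired = ROLLBACK_TRIGGERS
    .filter(t => t.check(metrics))
    .map(t => t.name);
  return { rollBack: fired.length > 0, fired };
}

// Illustrative check: error rate has breached its threshold
const decision = shouldRollBack({
  errorRate: 0.08,
  mismatchRate: 0.001,
  p95LatencyMs: 900,
});
```

Writing the triggers down as code (or at least as a table) removes the "let's give it another hour" temptation when things start going wrong.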
Migration Execution (1-3 months)
Data Migration:
- [ ] Full data extract from legacy system
- [ ] Data transformation and cleanup
- [ ] Validation (counts, samples, business rules)
- [ ] Load into new system
- [ ] Post-load validation
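Post-load validation should at minimum reconcile record counts and one aggregate the business already tracks (a financial total is ideal). A minimal sketch with illustrative field names:

```javascript
// Post-load validation: compare record counts and reconcile an
// aggregate (e.g. a financial total) between source and target.
function validateLoad(sourceRows, targetRows, amountField) {
  const issues = [];
  if (sourceRows.length !== targetRows.length) {
    issues.push(`count mismatch: ${sourceRows.length} vs ${targetRows.length}`);
  }
  const sum = rows => rows.reduce((acc, r) => acc + r[amountField], 0);
  if (sum(sourceRows) !== sum(targetRows)) {
    issues.push(`aggregate mismatch on ${amountField}`);
  }
  return issues; // empty array means counts and totals reconcile
}

// Illustrative reconciliation of two invoice extracts
const src = [{ id: 1, amount: 100 }, { id: 2, amount: 250 }];
const dst = [{ id: 1, amount: 100 }, { id: 2, amount: 250 }];
const issues = validateLoad(src, dst, 'amount');
```

Counts alone are not enough: a load can have the right number of rows with the wrong values in them, which is exactly what the aggregate check catches.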
Parallel Running:
- [ ] Shadow mode (new system processes but doesn't affect production)
- [ ] Compare outputs between old and new
- [ ] Fix discrepancies
- [ ] Read-only access for users (build confidence)
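Shadow mode can be sketched as a wrapper: every request runs through both systems, only the old system's answer is returned, and any difference is logged for investigation. The system and logger functions below are illustrative:

```javascript
// Shadow mode: run both systems, serve the old system's result,
// and log discrepancies. New-system failures never reach users.
function shadowCompare(input, oldSystem, newSystem, logDiscrepancy) {
  const oldResult = oldSystem(input);
  let newResult;
  try {
    newResult = newSystem(input);
  } catch (err) {
    logDiscrepancy({ input, error: String(err) });
    return oldResult;            // a crash in the new system is just a log line
  }
  if (JSON.stringify(oldResult) !== JSON.stringify(newResult)) {
    logDiscrepancy({ input, oldResult, newResult });
  }
  return oldResult;              // production always serves the old output
}

// Illustrative permit-fee comparison between legacy and new logic
const diffs = [];
const fee = shadowCompare(
  { permitType: 'electrical' },
  req => ({ fee: 150 }),                                        // legacy rule
  req => ({ fee: req.permitType === 'electrical' ? 150 : 100 }), // new rule
  d => diffs.push(d)
);
```

The discrepancy log is the payoff: each entry is an edge case the old system handled that you would otherwise have discovered live after cutover.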
Training:
- [ ] Train support staff first
- [ ] Train power users / champions
- [ ] Train general users (multiple sessions)
- [ ] Distribute quick reference materials

Cutover:
- [ ] Execute cutover plan (weekend/phased/pilot)
- [ ] Monitor closely for first 48 hours
- [ ] Support team at full capacity
- [ ] Daily status updates to stakeholders
Post-Migration (1-3 months)
Support:
- [ ] Office hours for user questions
- [ ] Track and resolve support tickets
- [ ] Identify and fix common issues
- [ ] Gather user feedback

Optimization:
- [ ] Monitor system performance
- [ ] Optimize slow queries
- [ ] Tune infrastructure as needed
- [ ] Release bug fixes and improvements

Validation:
- [ ] Verify data integrity
- [ ] Reconcile financial records
- [ ] Audit critical business processes
- [ ] Sign-off from stakeholders

Decommission:
- [ ] Keep old system read-only (6-12 months)
- [ ] Archive old system data
- [ ] Document lessons learned
- [ ] Celebrate success! 🎉
Conclusion: Migration Success Formula
The proven formula for successful migrations:
- Start with data quality (most failures are data issues)
- Never big bang (use the strangler fig pattern)
- Run in parallel (catches problems before cutover)
- Plan rollback (and test it!)
- Support users (change management is 50% of the effort)
- Monitor closely (the first 30 days are critical)
- Iterate quickly (users will forgive bugs if you respond fast)
The migrations that succeed:
- ✅ Start with a realistic assessment of data quality
- ✅ Migrate incrementally (feature by feature)
- ✅ Run old and new systems in parallel
- ✅ Have a tested rollback procedure
- ✅ Train users thoroughly
- ✅ Support heavily during the first month
- ✅ Fix issues within hours, not days

The migrations that fail:
- ❌ Assume data is clean (it never is)
- ❌ Try to cut over everything at once
- ❌ No parallel running or testing
- ❌ No rollback plan
- ❌ Train users in a single session before go-live
- ❌ Skeleton support team
- ❌ Slow to fix issues
Final wisdom from decades of migration experience:
"Every successful migration is boring. Gradual, methodical, well-planned. Every failed migration tried to be exciting - big reveals, dramatic cutovers. Boring is good. Boring means low risk. Boring means you go home on time. Aim for boring."
Next chapter: Business Models (how to make money from pattern-based systems).
Further Reading
Legacy System Modernization
Strategies:
- Seacord, R. C., Plakosh, D., & Lewis, G. A. (2003). Modernizing Legacy Systems. Addison-Wesley. - Software evolution and migration patterns
- Bisbal, J., et al. (1999). "A survey of research into legacy system migration." Software Maintenance and Evolution, 11(6), 335-364. - Academic survey of migration approaches - https://doi.org/10.1016/S0164-1212(99)00062-X

Strangler Fig Pattern:
- Fowler, M. (2004). "Strangler Fig Application." https://martinfowler.com/bliki/StranglerFigApplication.html - Gradually replacing legacy systems
- Newman, S. (2019). Monolith to Microservices. O'Reilly Media. - Evolutionary architecture and incremental migration
Data Migration
Core Texts:
- Morris, J. (2012). Practical Data Migration (2nd ed.). BCS. - Comprehensive guide to data migration projects
- Haller, K. (2009). "Six Strategies for Application and Data Migration Projects." IEEE Software, 26(1), 90-93. - https://doi.org/10.1109/MS.2009.13

ETL and Data Integration:
- Kimball, R., & Caserta, J. (2004). The Data Warehouse ETL Toolkit. Wiley. - Extract, Transform, Load patterns
- Informatica: https://www.informatica.com/ - Enterprise data integration platform
- Talend: https://www.talend.com/ - Open-source data integration
Change Management
Organizational Change:
- Kotter, J. P. (1996). Leading Change. Harvard Business Press. - 8-step process for successful change
- Prosci ADKAR Model: https://www.prosci.com/methodology/adkar - Awareness, Desire, Knowledge, Ability, Reinforcement

Technology Adoption:
- Rogers, E. M. (2003). Diffusion of Innovations (5th ed.). Free Press. - How new technologies spread through organizations
- Gartner Hype Cycle: https://www.gartner.com/en/research/methodologies/gartner-hype-cycle - Technology maturity and adoption stages
Risk Management
Project Risk:
- Boehm, B. W. (1991). "Software risk management: principles and practices." IEEE Software, 8(1), 32-41. - Risk-driven development - https://doi.org/10.1109/52.62930
- PMI: Practice Standard for Project Risk Management. https://www.pmi.org/

Testing Strategies:
- Crispin, L., & Gregory, J. (2009). Agile Testing. Addison-Wesley. - Testing during migration projects
- Humble, J., & Farley, D. (2010). Continuous Delivery. Addison-Wesley. - Automated testing and deployment pipelines
Parallel Running
Approaches:
- Blue-Green Deployment: https://martinfowler.com/bliki/BlueGreenDeployment.html - Running old and new systems side-by-side
- Canary Releases: https://martinfowler.com/bliki/CanaryRelease.html - Gradual rollout to subsets of users
- Feature Flags: https://launchdarkly.com/blog/what-are-feature-flags/ - Toggling between old and new functionality

Tools:
- LaunchDarkly: https://launchdarkly.com/ - Feature flag management
- Split.io: https://www.split.io/ - Feature delivery platform
- Optimizely: https://www.optimizely.com/ - Experimentation and feature flags
Training and Documentation
User Training:
- Clark, R. C., & Mayer, R. E. (2016). E-Learning and the Science of Instruction (4th ed.). Wiley. - Evidence-based instructional design
- WalkMe: https://www.walkme.com/ - Digital adoption platform
- Pendo: https://www.pendo.io/ - In-app guidance and training

Documentation:
- Write the Docs: https://www.writethedocs.org/ - Community for documentation best practices
- Docs as Code: https://www.writethedocs.org/guide/docs-as-code/ - Treating documentation like code
Post-Migration Optimization
Performance:
- Gregg, B. (2020). Systems Performance (2nd ed.). Addison-Wesley. - Performance analysis after migration
- New Relic: https://newrelic.com/ - Application performance monitoring

Continuous Improvement:
- Kim, G., et al. (2016). The DevOps Handbook. IT Revolution Press. - Continuous improvement culture
- Accelerate (DORA Metrics): https://www.devops-research.com/research.html - Measuring software delivery performance
Related Trilogy Content
- Volume 1, Chapter 8: Architecture of Domain-Specific Systems—understanding system architecture principles
- Volume 1, Chapter 10: Domain Knowledge Acquisition—capturing knowledge from legacy systems
- Volume 2, Chapter 5: The Pattern Language Approach—systematic migration using patterns
- Volume 2, Chapter 2: From Static Output to Living Memory—transforming data architecture
- Volume 3, Chapter 12: Implementation Roadmap—planning the migration journey
- Volume 3, Chapter 13: Technology Architecture—designing your target architecture
- Volume 3, Pattern 19: Version Control—managing changes during migration
- Volume 3, Pattern 18: Audit Trail—tracking migration history