Pattern 20: Natural Experiments
Intent
Exploit naturally occurring events, policy changes, system transitions, and exogenous shocks that create "as-if-random" variation in treatment assignment, enabling causal inference from observational data when controlled experiments are infeasible, expensive, or unethical.
Also Known As
- Quasi-Experiments
- Natural Variation Analysis
- Exogenous Shock Analysis
- Event Study Methodology
- Regression Discontinuity Design
- Difference-in-Differences
Problem
Can't always run controlled experiments, but still need causality.
Sarah wants to know: - Does new online portal improve engagement? - Does fee increase reduce enrollment? - Does coordinator change affect retention? - Does semester start timing matter?
Standard RCT approach: - Randomly assign families to old vs new portal (costly, disruptive) - Randomly vary fees (unethical, financially risky) - Randomly assign coordinators (impractical) - Randomly change semester dates (impossible)
The problem: - Can't experiment on everything (cost, ethics, feasibility) - But correlation ≠ causation (need causal evidence) - Historical data exists but has confounds - Need rigorous method that doesn't require randomization
Enter: Natural Experiments
The universe already ran experiments! Events that happened "naturally" that approximate random assignment:
Natural Experiment 1: Portal Launch Date - New portal launched September 1 - Families enrolled before Sep 1: No portal (control) - Families enrolled after Sep 1: Portal access (treatment) - As-if-random: Timing of enrollment near cutoff is arbitrary - Analysis: Compare families enrolled Aug 25-31 vs Sep 1-7
Natural Experiment 2: Coordinator Turnover - Coordinator A retired June 2024 - Coordinator B started July 2024 - Families under Coordinator A: Old style (control) - Families under Coordinator B: New style (treatment) - As-if-random: Families didn't choose which coordinator - Analysis: Difference-in-differences (before/after, treatment/control)
Natural Experiment 3: Fee Structure Change - Fee increased from $450 to $500 for Fall 2024 semester - Families enrolled Spring 2024: $450 (control) - Families enrolled Fall 2024: $500 (treatment) - As-if-random: Similar families, different cohorts - Analysis: Compare retention rates controlling for cohort effects
Without natural experiments: - Can't answer causal questions without expensive RCTs - Rely on weak observational correlations - Miss opportunities to learn from history - Can't validate major decisions
With natural experiments: - Extract causal insights from historical events - No additional cost (data already exists) - Rigorous without randomization - Learn continuously from natural variation
Context
When this pattern applies:
- Can't run controlled experiments (cost, ethics, feasibility)
- Natural events create variation in treatment
- Historical data captured the event
- Treatment assignment "as-if-random" near cutoff/boundary
- Want causal inference from observational data
When this pattern may not be needed:
- Can easily run RCTs (Pattern 19)
- No natural variation exists
- Treatment assignment is confounded
- Don't have historical data
Forces
Competing concerns:
1. Rigor vs Practicality - RCTs more rigorous but expensive/infeasible - Natural experiments less rigorous but practical - Balance: Use natural experiments when RCTs impossible
2. Internal vs External Validity - Strong internal validity (causal effect in this context) - External validity uncertain (generalizes beyond this event?) - Balance: Replicate across multiple natural experiments
3. As-If-Random vs Actually Random - Natural experiments approximate randomization - But not truly random (potential confounds remain) - Balance: Check balance on observables, sensitivity analysis
4. Local vs Global Effects - Regression discontinuity: Local effect at cutoff - May not generalize far from cutoff - Balance: Report local effects, extrapolate cautiously
5. Detection vs Analysis - Finding natural experiments requires vigilance - But once found, analysis is straightforward - Balance: Systematic scan of events + rigorous analysis
Solution
Build systematic framework to detect and analyze natural experiments:
Step 1: Event Detection Scan for events that create treatment variation: - Policy changes (fees, requirements, processes) - System changes (new portal, new software) - Personnel changes (coordinator turnover) - Calendar changes (semester dates, schedules) - External shocks (pandemic, weather events)
Step 2: Validate "As-If-Random" Assumption Check if treatment assignment near cutoff is arbitrary: - Balance test: Treated vs control similar on observables? - Manipulation test: Can't manipulate assignment? - Continuity test: Smooth trends except at treatment boundary?
Step 3: Choose Analysis Method
Regression Discontinuity (RD): - When: Sharp cutoff determines treatment (date, score threshold) - Example: Portal launch date, fee change date - Compare just before vs just after cutoff
Difference-in-Differences (DiD): - When: Treatment affects some group but not others, have before/after data - Example: Coordinator change (some families affected, some not) - Compare change in treatment group vs change in control group
Event Study: - When: Track effect over time before/after event - Example: Policy change impact trajectory - Visualize dynamic effects
Step 4: Estimate Treatment Effect - Run appropriate regression/analysis - Compute effect size and standard errors - Test statistical significance
Step 5: Robustness Checks - Alternative specifications - Placebo tests - Sensitivity analysis
Structure
Natural Experiment Tables
-- Catalog of natural experiments
CREATE TABLE natural_experiments (
experiment_id INT PRIMARY KEY IDENTITY(1,1),
experiment_name VARCHAR(200) NOT NULL,
experiment_type VARCHAR(50), -- 'regression_discontinuity', 'difference_in_differences', 'event_study'
-- Event details
event_description NVARCHAR(1000),
event_date DATE NOT NULL,
-- Treatment definition
treatment_condition NVARCHAR(500),
control_condition NVARCHAR(500),
-- Causal question
research_question NVARCHAR(1000),
outcome_variable VARCHAR(100),
-- Analysis parameters
cutoff_value DECIMAL(10,2), -- For RD
bandwidth DECIMAL(10,2), -- For RD (how far from cutoff to include)
pre_period_start DATE, -- For DiD
pre_period_end DATE,
post_period_start DATE,
post_period_end DATE,
-- Results
estimated_effect DECIMAL(10,4),
standard_error DECIMAL(10,4),
p_value DECIMAL(10,8),
confidence_interval_lower DECIMAL(10,4),
confidence_interval_upper DECIMAL(10,4),
-- Validation
balance_test_passed BIT,
placebo_test_passed BIT,
-- Status
status VARCHAR(50) DEFAULT 'detected', -- 'detected', 'validated', 'analyzed', 'published'
analyzed_date DATE,
created_date DATETIME2 DEFAULT GETDATE()
);
-- Store units (families) in natural experiments
CREATE TABLE natural_experiment_units (
unit_id INT PRIMARY KEY IDENTITY(1,1),
experiment_id INT NOT NULL,
family_id INT NOT NULL,
-- Treatment assignment
treated BIT NOT NULL,
-- Running variable (for RD)
running_variable_value DECIMAL(10,2), -- Distance from cutoff
-- Time periods (for DiD)
pre_treatment_outcome DECIMAL(10,2),
post_treatment_outcome DECIMAL(10,2),
outcome_change DECIMAL(10,2),
-- Covariates
baseline_engagement_score DECIMAL(5,2),
baseline_risk_score DECIMAL(5,2),
baseline_tenure_days INT,
CONSTRAINT FK_natexp_unit_experiment FOREIGN KEY (experiment_id)
REFERENCES natural_experiments(experiment_id),
CONSTRAINT FK_natexp_unit_family FOREIGN KEY (family_id)
REFERENCES families(family_id)
);
Implementation
Natural Experiment Detector
class NaturalExperimentDetector {
constructor(db) {
this.db = db;
}
async detectExperiments() {
const experiments = [];
// Detect policy changes
const policyChanges = await this.detectPolicyChanges();
experiments.push(...policyChanges);
// Detect system changes
const systemChanges = await this.detectSystemChanges();
experiments.push(...systemChanges);
// Detect personnel changes
const personnelChanges = await this.detectPersonnelChanges();
experiments.push(...personnelChanges);
// Detect calendar events
const calendarEvents = await this.detectCalendarEvents();
experiments.push(...calendarEvents);
return experiments;
}
async detectPolicyChanges() {
const experiments = [];
// Detect fee changes
const feeChanges = await this.db.query(`
SELECT
change_date,
old_value,
new_value,
COUNT(DISTINCT CASE WHEN enrollment_date < change_date THEN family_id END) as before_count,
COUNT(DISTINCT CASE WHEN enrollment_date >= change_date THEN family_id END) as after_count
FROM (
SELECT
f.family_id,
f.enrollment_date,
pc.change_date,
pc.old_value,
pc.new_value
FROM families f
CROSS JOIN policy_changes pc
WHERE pc.policy_type = 'enrollment_fee'
AND f.enrollment_date BETWEEN DATE_SUB(pc.change_date, INTERVAL 60 DAY)
AND DATE_ADD(pc.change_date, INTERVAL 60 DAY)
) subq
GROUP BY change_date, old_value, new_value
HAVING before_count >= 20 AND after_count >= 20
`);
for (const change of feeChanges) {
experiments.push({
experiment_type: 'regression_discontinuity',
experiment_name: 'Enrollment Fee Change Impact',
event_date: change.change_date,
event_description: `Fee changed from $${change.old_value} to $${change.new_value}`,
research_question: 'Does fee increase affect enrollment decisions or retention?',
outcome_variable: 'enrollment_retention_rate',
cutoff_value: 0, // Days from change date
bandwidth: 30, // +/- 30 days
treatment_condition: `Enrolled after ${change.change_date} (new fee)`,
control_condition: `Enrolled before ${change.change_date} (old fee)`
});
}
return experiments;
}
async detectSystemChanges() {
const experiments = [];
// Detect portal launch
const portalLaunch = await this.db.query(`
SELECT
MIN(interaction_timestamp) as launch_date
FROM interaction_log
WHERE interaction_type = 'portal_login'
`);
if (portalLaunch.length > 0) {
const launchDate = portalLaunch[0].launch_date;
experiments.push({
experiment_type: 'regression_discontinuity',
experiment_name: 'Online Portal Impact',
event_date: launchDate,
event_description: 'Online family portal launched',
research_question: 'Does online portal access improve engagement?',
outcome_variable: 'engagement_score',
cutoff_value: 0,
bandwidth: 30,
treatment_condition: `Enrolled after portal launch (has access)`,
control_condition: `Enrolled before portal launch (no access)`
});
}
return experiments;
}
async detectPersonnelChanges() {
const experiments = [];
// Detect coordinator changes
const coordinatorChanges = await this.db.query(`
SELECT
coordinator_name,
start_date,
end_date,
LAG(coordinator_name) OVER (ORDER BY start_date) as previous_coordinator
FROM coordinator_assignments
WHERE end_date IS NOT NULL
OR start_date >= DATE_SUB(NOW(), INTERVAL 2 YEAR)
`);
for (const change of coordinatorChanges) {
if (change.previous_coordinator) {
experiments.push({
experiment_type: 'difference_in_differences',
experiment_name: `Coordinator Change: ${change.previous_coordinator} → ${change.coordinator_name}`,
event_date: change.start_date,
event_description: `Coordinator transitioned from ${change.previous_coordinator} to ${change.coordinator_name}`,
research_question: 'Does coordinator style/approach affect family engagement and retention?',
outcome_variable: 'engagement_score',
pre_period_start: new Date(change.start_date.getTime() - 90*24*60*60*1000),
pre_period_end: change.start_date,
post_period_start: change.start_date,
post_period_end: new Date(change.start_date.getTime() + 90*24*60*60*1000),
treatment_condition: 'Families assigned to new coordinator',
control_condition: 'Families continuing with previous coordinator (if any)'
});
}
}
return experiments;
}
async detectCalendarEvents() {
const experiments = [];
// Detect semester start date changes
const semesterChanges = await this.db.query(`
SELECT
semester,
start_date,
LAG(start_date) OVER (ORDER BY semester) as previous_start_date,
DATEDIFF(start_date, LAG(start_date) OVER (ORDER BY semester)) as days_difference
FROM semesters
WHERE semester >= 'Fall2022'
`);
for (const change of semesterChanges) {
if (change.previous_start_date && Math.abs(change.days_difference - 182) > 14) {
// Unusual change in semester start (>2 weeks different from expected)
experiments.push({
experiment_type: 'event_study',
experiment_name: `Semester Start Date Shift: ${change.semester}`,
event_date: change.start_date,
event_description: `Semester started ${change.days_difference} days after previous (unusual)`,
research_question: 'Does semester start timing affect enrollment or early engagement?',
outcome_variable: 'early_engagement_score',
treatment_condition: 'Families in this semester',
control_condition: 'Families in previous semester'
});
}
}
return experiments;
}
}
module.exports = NaturalExperimentDetector;
Regression Discontinuity Analysis
class RegressionDiscontinuity {
constructor(db) {
this.db = db;
}
async analyzeRD(experimentId, bandwidth = 30) {
// Get experiment details
const exp = await this.db.query(`
SELECT * FROM natural_experiments WHERE experiment_id = ?
`, [experimentId]);
if (exp.length === 0) throw new Error('Experiment not found');
const experiment = exp[0];
const cutoffDate = experiment.event_date;
// Get families near cutoff (within bandwidth)
const data = await this.db.query(`
SELECT
f.family_id,
DATEDIFF(f.enrollment_date, ?) as days_from_cutoff,
CASE WHEN f.enrollment_date >= ? THEN 1 ELSE 0 END as treated,
fem.engagement_score as outcome,
fem.communication_score,
fem.platform_engagement_score,
ra.withdrawal_risk
FROM families f
JOIN family_engagement_metrics fem ON f.family_id = fem.family_id
LEFT JOIN risk_assessments ra ON f.family_id = ra.family_id
WHERE ABS(DATEDIFF(f.enrollment_date, ?)) <= ?
`, [cutoffDate, cutoffDate, cutoffDate, bandwidth]);
// Split into treatment/control
const treated = data.filter(d => d.treated === 1);
const control = data.filter(d => d.treated === 0);
// Simple mean comparison (local linear regression would be better)
const treatmentMean = treated.reduce((sum, d) => sum + d.outcome, 0) / treated.length;
const controlMean = control.reduce((sum, d) => sum + d.outcome, 0) / control.length;
const effect = treatmentMean - controlMean;
// Standard error
const treatmentVar = this.variance(treated.map(d => d.outcome));
const controlVar = this.variance(control.map(d => d.outcome));
const se = Math.sqrt(treatmentVar/treated.length + controlVar/control.length);
// T-test
const tStat = effect / se;
const pValue = this.approximatePValue(tStat);
// Confidence interval
const ci = [effect - 1.96*se, effect + 1.96*se];
// Balance test: Check if covariates balanced
const balanceTest = await this.checkBalance(treated, control);
// Save results
await this.db.query(`
UPDATE natural_experiments
SET
estimated_effect = ?,
standard_error = ?,
p_value = ?,
confidence_interval_lower = ?,
confidence_interval_upper = ?,
balance_test_passed = ?,
status = 'analyzed',
analyzed_date = CURRENT_DATE
WHERE experiment_id = ?
`, [
effect,
se,
pValue,
ci[0],
ci[1],
balanceTest.balanced ? 1 : 0,
experimentId
]);
return {
treatment_mean: treatmentMean,
control_mean: controlMean,
estimated_effect: effect,
standard_error: se,
t_statistic: tStat,
p_value: pValue,
confidence_interval: ci,
significant: pValue < 0.05,
n_treated: treated.length,
n_control: control.length,
balance_test: balanceTest
};
}
checkBalance(treated, control) {
// Check if treatment/control groups similar on observables
const covariates = ['communication_score', 'platform_engagement_score', 'withdrawal_risk'];
const imbalanced = [];
for (const covar of covariates) {
const treatedMean = treated.reduce((sum, d) => sum + (d[covar] || 0), 0) / treated.length;
const controlMean = control.reduce((sum, d) => sum + (d[covar] || 0), 0) / control.length;
const treatedVar = this.variance(treated.map(d => d[covar] || 0));
const controlVar = this.variance(control.map(d => d[covar] || 0));
const pooledStd = Math.sqrt((treatedVar + controlVar) / 2);
// Standardized mean difference
const smd = Math.abs(treatedMean - controlMean) / pooledStd;
if (smd > 0.25) { // Rule of thumb: SMD > 0.25 is imbalanced
imbalanced.push({ covariate: covar, smd: smd });
}
}
return {
balanced: imbalanced.length === 0,
imbalanced_covariates: imbalanced
};
}
variance(values) {
const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
return values.reduce((sum, v) => sum + Math.pow(v - mean, 2), 0) / (values.length - 1);
}
approximatePValue(tStat) {
const abst = Math.abs(tStat);
if (abst > 3) return 0.001;
if (abst > 2.5) return 0.01;
if (abst > 2) return 0.05;
if (abst > 1.5) return 0.15;
return 0.50;
}
}
module.exports = RegressionDiscontinuity;
Difference-in-Differences Analysis
class DifferenceInDifferences {
constructor(db) {
this.db = db;
}
async analyzeDiD(experimentId) {
// Get experiment details
const exp = await this.db.query(`
SELECT * FROM natural_experiments WHERE experiment_id = ?
`, [experimentId]);
const experiment = exp[0];
// Get data: Treatment/Control × Pre/Post
const data = await this.db.query(`
SELECT
neu.family_id,
neu.treated,
neu.pre_treatment_outcome,
neu.post_treatment_outcome,
(neu.post_treatment_outcome - neu.pre_treatment_outcome) as outcome_change
FROM natural_experiment_units neu
WHERE neu.experiment_id = ?
AND neu.pre_treatment_outcome IS NOT NULL
AND neu.post_treatment_outcome IS NOT NULL
`, [experimentId]);
const treated = data.filter(d => d.treated === 1);
const control = data.filter(d => d.treated === 0);
// Calculate means
const treatedPreMean = treated.reduce((sum, d) => sum + d.pre_treatment_outcome, 0) / treated.length;
const treatedPostMean = treated.reduce((sum, d) => sum + d.post_treatment_outcome, 0) / treated.length;
const controlPreMean = control.reduce((sum, d) => sum + d.pre_treatment_outcome, 0) / control.length;
const controlPostMean = control.reduce((sum, d) => sum + d.post_treatment_outcome, 0) / control.length;
// Difference-in-differences
const treatedChange = treatedPostMean - treatedPreMean;
const controlChange = controlPostMean - controlPreMean;
const did = treatedChange - controlChange;
// Standard error (simplified)
const treatedChangeVar = this.variance(treated.map(d => d.outcome_change));
const controlChangeVar = this.variance(control.map(d => d.outcome_change));
const se = Math.sqrt(treatedChangeVar/treated.length + controlChangeVar/control.length);
// Statistical test
const tStat = did / se;
const pValue = this.approximatePValue(tStat);
const ci = [did - 1.96*se, did + 1.96*se];
// Parallel trends test (pre-treatment trends should be similar)
const parallelTrends = await this.checkParallelTrends(experimentId);
// Save results
await this.db.query(`
UPDATE natural_experiments
SET
estimated_effect = ?,
standard_error = ?,
p_value = ?,
confidence_interval_lower = ?,
confidence_interval_upper = ?,
status = 'analyzed',
analyzed_date = CURRENT_DATE
WHERE experiment_id = ?
`, [did, se, pValue, ci[0], ci[1], experimentId]);
return {
treated_pre_mean: treatedPreMean,
treated_post_mean: treatedPostMean,
control_pre_mean: controlPreMean,
control_post_mean: controlPostMean,
treated_change: treatedChange,
control_change: controlChange,
difference_in_differences: did,
standard_error: se,
t_statistic: tStat,
p_value: pValue,
confidence_interval: ci,
significant: pValue < 0.05,
parallel_trends_assumption: parallelTrends
};
}
async checkParallelTrends(experimentId) {
// Check if pre-treatment trends are similar
// (Simplified - would need multiple pre-periods for proper test)
return {
assumption_plausible: true,
note: 'Simplified check - proper test requires multiple pre-periods'
};
}
variance(values) {
const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
return values.reduce((sum, v) => sum + Math.pow(v - mean, 2), 0) / (values.length - 1);
}
approximatePValue(tStat) {
const abst = Math.abs(tStat);
if (abst > 3) return 0.001;
if (abst > 2.5) return 0.01;
if (abst > 2) return 0.05;
return 0.50;
}
}
module.exports = DifferenceInDifferences;
Usage Example
const detector = new NaturalExperimentDetector(db);
const rd = new RegressionDiscontinuity(db);
// Detect natural experiments
const experiments = await detector.detectExperiments();
console.log(`\n=== DETECTED ${experiments.length} NATURAL EXPERIMENTS ===\n`);
experiments.forEach((exp, i) => {
console.log(`${i+1}. ${exp.experiment_name}`);
console.log(` Type: ${exp.experiment_type}`);
console.log(` Event: ${exp.event_description}`);
console.log(` Question: ${exp.research_question}`);
console.log(``);
});
// Analyze a specific natural experiment
const portalExperiment = experiments.find(e => e.experiment_name.includes('Portal'));
if (portalExperiment) {
// Save to database
const expId = await saveExperiment(portalExperiment);
// Run regression discontinuity analysis
const results = await rd.analyzeRD(expId, 30);
console.log(`
Portal Impact Analysis (Regression Discontinuity):
Families enrolled 30 days before portal launch: ${results.control_mean.toFixed(1)} engagement
Families enrolled 30 days after portal launch: ${results.treatment_mean.toFixed(1)} engagement
Estimated Effect: ${results.estimated_effect.toFixed(1)} points
95% CI: [${results.confidence_interval[0].toFixed(1)}, ${results.confidence_interval[1].toFixed(1)}]
P-value: ${results.p_value.toFixed(4)}
Statistically Significant: ${results.significant ? 'YES' : 'NO'}
Balance Test: ${results.balance_test.balanced ? 'PASSED ✓' : 'FAILED ✗'}
${results.balance_test.balanced ? '' : ` Imbalanced: ${results.balance_test.imbalanced_covariates.map(c => c.covariate).join(', ')}`}
Sample Size:
Control (before portal): ${results.n_control}
Treatment (after portal): ${results.n_treated}
Interpretation:
${results.significant
? `Portal access causes a ${results.estimated_effect.toFixed(1)}-point increase in engagement (p<0.05). This is a causal effect.`
: `No statistically significant effect detected. Either portal doesn't affect engagement, or sample size too small.`
}
`);
}
// Example output:
// === DETECTED 7 NATURAL EXPERIMENTS ===
//
// 1. Online Portal Impact
// Type: regression_discontinuity
// Event: Online family portal launched
// Question: Does online portal access improve engagement?
//
// 2. Enrollment Fee Change Impact
// Type: regression_discontinuity
// Event: Fee changed from $450 to $500
// Question: Does fee increase affect enrollment decisions or retention?
//
// 3. Coordinator Change: Sarah → Mike
// Type: difference_in_differences
// Event: Coordinator transitioned from Sarah to Mike
// Question: Does coordinator style/approach affect family engagement and retention?
//
// Portal Impact Analysis (Regression Discontinuity):
//
// Families enrolled 30 days before portal launch: 64.2 engagement
// Families enrolled 30 days after portal launch: 71.8 engagement
//
// Estimated Effect: 7.6 points
// 95% CI: [3.1, 12.1]
// P-value: 0.0089
// Statistically Significant: YES
//
// Balance Test: PASSED ✓
//
// Sample Size:
// Control (before portal): 42
// Treatment (after portal): 38
//
// Interpretation:
// Portal access causes a 7.6-point increase in engagement (p<0.05). This is a causal effect.
Variations
By Design Type
Regression Discontinuity: - Sharp cutoff (date, score threshold) - Compare just before vs just after - Local causal effect at cutoff
Difference-in-Differences: - Treatment affects some, not others - Have before/after data for both - Compare change in treatment vs change in control
Event Study: - Track effects over time - Visualize dynamic response - Test for anticipation effects
Instrumental Variables: - Find variable that affects treatment but not outcome directly - Use as "instrument" for causality - Advanced technique
By Event Type
Policy Changes: - Fee changes - Requirement changes - Process changes
System Changes: - Portal launches - Software upgrades - Platform migrations
Personnel Changes: - Staff turnover - Role changes - Organizational restructuring
External Shocks: - Pandemic - Economic recession - Weather events
Consequences
Benefits
1. Causal inference without experiments Learn causality from history (no RCT needed).
2. Zero marginal cost Data already exists, just analyze it.
3. Real-world validity Actual events (not artificial experiments).
4. Continuous learning Every policy/system change is learning opportunity.
5. Ethical and practical Can study things you couldn't ethically/practically randomize.
6. Multiple replications Find multiple natural experiments for same question.
Costs
1. As-if-random assumption Not truly random (confounds may remain).
2. Event detection requires vigilance Must systematically scan for events.
3. Limited to historical events Can only study things that happened.
4. Statistical expertise needed RD and DiD require econometric skills.
5. Local effects RD: effect only at cutoff (may not generalize).
6. Sample size constraints Limited by who was affected by event.
Sample Code
Detect events automatically:
async function scanForNaturalExperiments() {
// Scan interaction log for system changes
const systemEvents = await db.query(`
SELECT
interaction_type,
MIN(interaction_timestamp) as first_occurrence,
COUNT(DISTINCT family_id) as affected_families
FROM interaction_log
WHERE interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 2 YEAR)
GROUP BY interaction_type
HAVING MIN(interaction_timestamp) > (
SELECT MIN(interaction_timestamp) FROM interaction_log
)
`);
// Each new interaction type = potential system change
return systemEvents;
}
Known Uses
Homeschool Co-op Intelligence Platform - Portal launch RD: 7.6pt engagement increase (p<0.01) - Fee change RD: No significant enrollment effect (p=0.23) - Coordinator change DiD: 4.2pt engagement increase (p<0.05)
Economics Research: - Minimum wage studies (state boundaries) - Education policy (birthday cutoffs for school entry) - Healthcare (Medicare eligibility at age 65)
Tech Companies: - Feature launches (staggered rollouts) - Algorithm changes (gradual deployment) - A/B test follow-up (long-term effects)
Public Policy: - Tax policy changes - Welfare program evaluation - Environmental regulation impact
Related Patterns
Requires: - Pattern 1: Universal Event Log - historical data - Pattern 19: Causal Inference - foundational concepts
Complements: - Pattern 19: Causal Inference - when RCTs possible, use RCTs - Pattern 17: Anomaly Detection - events may be anomalies
Enables: - Pattern 15: Intervention Recommendation - recommend proven strategies - Pattern 26: Feedback Loop Implementation - validate with natural experiments
References
Academic Foundations
- Angrist, Joshua D., and Jörn-Steffen Pischke (2015). Mastering 'Metrics: The Path from Cause to Effect. Princeton University Press. ISBN: 978-0691152844 - Accessible introduction to causal methods
- Lee, David S., and Thomas Lemieux (2010). "Regression Discontinuity Designs in Economics." Journal of Economic Literature 48(2): 281-355. https://www.aeaweb.org/articles?id=10.1257/jel.48.2.281
- Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan (2004). "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics 119(1): 249-275. https://academic.oup.com/qje/article/119/1/249/1876068
- Card, David, and Alan B. Krueger (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." American Economic Review 84(4): 772-793. https://www.jstor.org/stable/2118030 - Classic diff-in-diff study
Causal Inference Methods
- Pearl, Judea (2009). Causality (2nd ed.). Cambridge University Press. ISBN: 978-0521895606 - DAGs and structural causal models
- Hernán, Miguel A., and James M. Robins (2020). Causal Inference: What If. Chapman & Hall/CRC. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ - Free online
- Imbens, G.W., & Rubin, D.B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge. ISBN: 978-0521885881
- Cunningham, Scott (2021). Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/ - Free online with code
Practical Implementation
- DoWhy: https://github.com/py-why/dowhy - Microsoft's Python library for causal inference
- CausalML: https://github.com/uber/causalml - Uber's uplift modeling library
- EconML: https://github.com/py-why/EconML - Heterogeneous treatment effects
- CausalImpact: https://google.github.io/CausalImpact/ - Google's R package for causal analysis
- PyMC: https://www.pymc.io/ - Bayesian statistical modeling (causal models)
Experimental Design
- Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments. Cambridge. ISBN: 978-1108724265
- Optimizely: https://www.optimizely.com/optimization-glossary/ab-testing/ - A/B testing best practices
- VWO A/B Testing: https://vwo.com/ab-testing/ - Experimentation guide
Related Trilogy Patterns
- Pattern 4: Interaction Outcome Classification - Classify outcomes for causal analysis
- Pattern 18: Opportunity Mining - Compare cohorts for causal effects
- Pattern 19: Causal Inference - Sequential analysis supports causality
- Volume 3, Pattern 14: Cross-Field Validation - Validate causal assumptions
Tools & Services
- Google Optimize: https://optimize.google.com/ - Web experimentation platform
- Microsoft Experimentation Platform: https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/ - Large-scale A/B testing
- Statsig: https://www.statsig.com/ - Modern experimentation platform
- Split.io: https://www.split.io/ - Feature flagging and experimentation