Pattern 17: Anomaly Detection
Intent
Identify unusual patterns, statistical outliers, and behavioral deviations from expected norms, whether they signal problems requiring immediate attention or unexpected opportunities worth investigating, so that intervention can happen before issues escalate.
Also Known As
- Outlier Detection
- Deviation Analysis
- Exception Monitoring
- Unusual Pattern Detection
- Statistical Process Control
Problem
Normal monitoring misses the weird stuff.
Sarah reviews standard reports:
- Engagement scores
- Risk assessments
- Payment status
But misses critical anomalies:
Family 142:
- Engagement score: 72 (looks fine)
- But: Login pattern changed from 9am weekdays to 3am weekends
- Anomaly signal: Behavioral shift (possible family crisis, job loss?)
- Withdrew 2 weeks later
Family 89:
- Payment history: Always on time
- But: Last 3 emails bounced (email address invalid)
- Anomaly signal: Communication breakdown
- Can't reach family when needed
Family 203:
- Engagement: Stable at 65
- But: Sudden spike to 95 last week (volunteered 12 hours, attended 3 events)
- Anomaly signal: Positive anomaly! (Why the change? Can we replicate?)
- Opportunity to understand and amplify
Without anomaly detection:
- Focus on aggregate metrics (miss individual deviations)
- React after problems manifest
- Miss early warning signals
- Don't catch positive anomalies (learning opportunities)
With anomaly detection:
- Flag unusual patterns automatically
- Investigate before escalation
- Learn from positive deviations
- Prioritize attention on outliers
Context
When this pattern applies:
- Population large enough for normal baselines (50+ entities)
- Behavior has expected patterns
- Deviations are meaningful (not just noise)
- Early detection provides value
- Can investigate flagged anomalies
When this pattern may not be needed:
- Very small population (<20)
- Highly variable behavior (no baseline)
- Anomalies aren't actionable
- Real-time detection not feasible
Forces
Competing concerns:
1. Sensitivity vs False Alarms
- High sensitivity = catch everything (but many false alarms)
- Low sensitivity = miss anomalies (but fewer false alarms)
- Balance: Tune thresholds based on investigation capacity
2. Statistical vs Contextual
- Statistical: Pure math (3 standard deviations)
- Contextual: Domain knowledge (3am login is weird)
- Balance: Combine both approaches
3. Point-in-Time vs Temporal
- Point: Current value unusual (score = 15)
- Temporal: Change unusual (score dropped 40 points in 1 week)
- Balance: Detect both types
4. Individual vs Cohort-Relative
- Individual: Deviation from own baseline
- Cohort: Deviation from peer group
- Balance: Flag both (different meanings)
5. Reactive vs Proactive
- Reactive: Detect after event occurs
- Proactive: Predict likely anomalies
- Balance: Mostly reactive, some predictive
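Force 1 can be made concrete: if a metric is roughly normally distributed, the z-score threshold directly determines the expected false-alarm rate on a healthy population. A small sketch (the population size is an illustrative assumption, not a platform figure):

```python
import math

def two_sided_flag_rate(z_threshold: float) -> float:
    """Expected fraction of normal observations with |z| above the threshold.
    Assumes the metric is approximately normally distributed."""
    # P(|Z| > t) = 2 * (1 - Phi(t)), with Phi(t) = 0.5 * (1 + erf(t / sqrt(2)))
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z_threshold / math.sqrt(2.0))))

def expected_false_alarms(population: int, z_threshold: float) -> float:
    """Rough expected flags per metric, per run, on a population with no real anomalies."""
    return population * two_sided_flag_rate(z_threshold)

for t in (2.0, 2.5, 3.0):
    print(f"|z| > {t}: {two_sided_flag_rate(t):.4%} flagged, "
          f"~{expected_false_alarms(500, t):.1f} flags per 500 families")
```

Dropping the threshold from 3.0 to 2.0 raises the baseline flag rate from roughly 0.3% to 4.6% per metric, which is why thresholds should be tuned against investigation capacity rather than set purely for sensitivity.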
Solution
Implement multi-layered anomaly detection:
Layer 1: Statistical Anomalies
- Z-score: >3 standard deviations from mean
- IQR: Beyond 1.5× interquartile range
- Percentile: Top/bottom 1% of distribution
Layer 2: Temporal Anomalies
- Sudden changes: 2× normal velocity
- Trend breaks: Direction reversal
- Pattern disruption: Behavior shift
Layer 3: Behavioral Anomalies
- Activity timing: Login at unusual hours
- Sequence breaks: Skip normal steps
- Volume spikes: 5× normal activity
Layer 4: Cohort-Relative Anomalies
- Peer deviation: Different from cohort baseline
- Cross-cohort: Behavior typical of wrong cohort
Layer 5: Multi-Dimensional Anomalies
- Isolation Forest: Unusual feature combinations
- Local Outlier Factor: Density-based outliers
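The Layer 1 rules can be sketched independently of any database. A minimal Python version (a hypothetical helper mirroring the z-score and IQR thresholds above; the baseline numbers in the example are made up):

```python
def statistical_anomalies(value, mean, std_dev, q1, q3,
                          z_threshold=3.0, iqr_multiplier=1.5):
    """Layer 1 checks for one metric value against its baseline."""
    flags = []
    if std_dev > 0:
        z = abs(value - mean) / std_dev
        if z > z_threshold:
            flags.append(("z_score", round(z, 2)))
    iqr = q3 - q1
    lower = q1 - iqr_multiplier * iqr
    upper = q3 + iqr_multiplier * iqr
    if iqr > 0 and (value < lower or value > upper):
        # deviation measured in IQR units beyond the nearer bound
        over = (lower - value) if value < lower else (value - upper)
        flags.append(("iqr", round(over / iqr, 2)))
    return flags

# e.g. an engagement score of 15 against baseline mean 65, SD 12, Q1 55, Q3 75:
print(statistical_anomalies(15, 65, 12, 55, 75))
# → [('z_score', 4.17), ('iqr', 0.5)]
```

Note that the same value can trip both rules; the full engine below records them as separate anomalies because they carry different deviation magnitudes.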
Structure
Anomaly Detection Tables
-- Store detected anomalies
CREATE TABLE anomalies (
anomaly_id INT PRIMARY KEY IDENTITY(1,1),
family_id INT NOT NULL,
-- Detection
detected_date DATETIME2 DEFAULT GETDATE(),
anomaly_type VARCHAR(100), -- 'statistical', 'temporal', 'behavioral', 'cohort_relative', 'multi_dimensional'
anomaly_category VARCHAR(100), -- 'engagement_drop', 'payment_spike', 'timing_shift', 'communication_break'
-- Severity
severity VARCHAR(20), -- 'critical', 'high', 'medium', 'low'
anomaly_score DECIMAL(5,2), -- 0-100, higher = more anomalous
-- Details
description NVARCHAR(1000),
metric_name VARCHAR(100),
expected_value DECIMAL(10,2),
actual_value DECIMAL(10,2),
deviation_magnitude DECIMAL(10,2),
-- Context
baseline_period_days INT,
detection_method VARCHAR(100), -- 'z_score', 'iqr', 'isolation_forest', etc.
-- Investigation
status VARCHAR(50) DEFAULT 'new', -- 'new', 'investigating', 'resolved', 'false_positive'
investigated_by VARCHAR(100),
investigation_notes NVARCHAR(MAX),
resolution_date DATETIME2,
-- Outcome
required_intervention BIT,
intervention_taken NVARCHAR(500),
was_true_positive BIT,
CONSTRAINT FK_anomaly_family FOREIGN KEY (family_id)
REFERENCES families(family_id)
);
-- Index for active anomalies
CREATE INDEX IX_anomalies_active ON anomalies(status, severity, detected_date)
WHERE status IN ('new', 'investigating');
-- Store baseline statistics for comparison
CREATE TABLE baseline_statistics (
baseline_id INT PRIMARY KEY IDENTITY(1,1),
metric_name VARCHAR(100) NOT NULL,
cohort_id INT, -- NULL = global baseline
-- Statistics
mean_value DECIMAL(10,2),
std_dev DECIMAL(10,2),
median_value DECIMAL(10,2),
q1_value DECIMAL(10,2), -- 25th percentile
q3_value DECIMAL(10,2), -- 75th percentile
min_value DECIMAL(10,2),
max_value DECIMAL(10,2),
-- Time period
calculation_date DATE NOT NULL,
period_days INT DEFAULT 90,
sample_size INT,
CONSTRAINT UQ_baseline UNIQUE (metric_name, cohort_id, calculation_date)
);
Implementation
Anomaly Detection Engine
class AnomalyDetectionEngine {
constructor(db) {
this.db = db;
// Thresholds
this.zScoreThreshold = 3.0;
this.iqrMultiplier = 1.5;
this.velocityMultiplier = 2.0;
}
async detectAnomalies(familyId = null) {
const families = familyId
? [{ family_id: familyId }]
: await this.getAllActiveFamilies();
const allAnomalies = [];
for (const family of families) {
// Layer 1: Statistical anomalies
const statistical = await this.detectStatisticalAnomalies(family.family_id);
// Layer 2: Temporal anomalies
const temporal = await this.detectTemporalAnomalies(family.family_id);
// Layer 3: Behavioral anomalies
const behavioral = await this.detectBehavioralAnomalies(family.family_id);
// Layer 4: Cohort-relative anomalies
const cohortRelative = await this.detectCohortAnomalies(family.family_id);
// Combine and deduplicate
const familyAnomalies = [
...statistical,
...temporal,
...behavioral,
...cohortRelative
];
// Save to database
for (const anomaly of familyAnomalies) {
await this.saveAnomaly(family.family_id, anomaly);
}
allAnomalies.push(...familyAnomalies);
}
return allAnomalies;
}
async detectStatisticalAnomalies(familyId) {
const anomalies = [];
// Get current metrics
const current = await this.db.query(`
SELECT
engagement_score,
communication_score,
platform_engagement_score,
participation_score
FROM family_engagement_metrics
WHERE family_id = ?
`, [familyId]);
if (!current.length) return anomalies;
const metrics = current[0];
// Get baseline statistics
const baselines = await this.db.query(`
SELECT metric_name, mean_value, std_dev, q1_value, q3_value
FROM baseline_statistics
WHERE cohort_id IS NULL
AND calculation_date = (SELECT MAX(calculation_date) FROM baseline_statistics)
`);
// Check each metric
for (const [metricName, value] of Object.entries(metrics)) {
const baseline = baselines.find(b => b.metric_name === metricName);
if (!baseline || !baseline.std_dev) continue; // skip metrics with no usable baseline
// Z-score anomaly
const zScore = Math.abs((value - baseline.mean_value) / baseline.std_dev);
if (zScore > this.zScoreThreshold) {
anomalies.push({
anomaly_type: 'statistical',
anomaly_category: `${metricName}_outlier`,
severity: this.categorizeSeverity(zScore, 3, 4, 5),
anomaly_score: Math.min(100, zScore * 20),
description: `${metricName} is ${zScore.toFixed(1)} standard deviations from mean`,
metric_name: metricName,
expected_value: baseline.mean_value,
actual_value: value,
deviation_magnitude: zScore,
detection_method: 'z_score'
});
}
// IQR anomaly (skip degenerate baselines where Q1 = Q3)
const iqr = baseline.q3_value - baseline.q1_value;
const lowerBound = baseline.q1_value - (this.iqrMultiplier * iqr);
const upperBound = baseline.q3_value + (this.iqrMultiplier * iqr);
if (iqr > 0 && (value < lowerBound || value > upperBound)) {
const deviation = value < lowerBound
? (lowerBound - value) / iqr
: (value - upperBound) / iqr;
anomalies.push({
anomaly_type: 'statistical',
anomaly_category: `${metricName}_iqr_outlier`,
severity: this.categorizeSeverity(deviation, 1, 2, 3),
anomaly_score: Math.min(100, deviation * 33),
description: `${metricName} is outside IQR bounds`,
metric_name: metricName,
expected_value: value < lowerBound ? lowerBound : upperBound,
actual_value: value,
deviation_magnitude: deviation,
detection_method: 'iqr'
});
}
}
return anomalies;
}
async detectTemporalAnomalies(familyId) {
const anomalies = [];
// Get metric history
const history = await this.db.query(`
SELECT
calculation_date,
engagement_score,
score_delta,
score_velocity
FROM family_engagement_metrics_history
WHERE family_id = ?
ORDER BY calculation_date DESC
LIMIT 12
`, [familyId]);
if (history.length < 3) return anomalies;
const recent = history[0];
const previous = history[1];
// 1. Sudden drop anomaly
if (recent.score_delta < -10) {
const severity = recent.score_delta < -20 ? 'critical' :
recent.score_delta < -15 ? 'high' : 'medium';
anomalies.push({
anomaly_type: 'temporal',
anomaly_category: 'engagement_sudden_drop',
severity: severity,
anomaly_score: Math.min(100, Math.abs(recent.score_delta) * 3),
description: `Engagement dropped ${Math.abs(recent.score_delta).toFixed(1)} points`,
metric_name: 'engagement_score',
expected_value: previous.engagement_score,
actual_value: recent.engagement_score,
deviation_magnitude: Math.abs(recent.score_delta),
detection_method: 'sudden_change'
});
}
// 2. Velocity change anomaly (acceleration/deceleration)
// Average velocity over up to 5 prior periods (history may have fewer than 6 rows)
const past = history.slice(1, 6);
const avgVelocity = past.reduce((sum, h) => sum + (h.score_velocity || 0), 0) / past.length;
const velocityChange = Math.abs(recent.score_velocity - avgVelocity);
if (velocityChange > (Math.abs(avgVelocity) * this.velocityMultiplier)) {
anomalies.push({
anomaly_type: 'temporal',
anomaly_category: 'velocity_change',
severity: velocityChange > 10 ? 'high' : 'medium',
anomaly_score: Math.min(100, velocityChange * 5),
description: `Engagement velocity changed dramatically (${recent.score_velocity.toFixed(1)} vs avg ${avgVelocity.toFixed(1)})`,
metric_name: 'score_velocity',
expected_value: avgVelocity,
actual_value: recent.score_velocity,
deviation_magnitude: velocityChange,
detection_method: 'velocity_change'
});
}
// 3. Trend reversal
const recentTrend = recent.score_velocity > 0 ? 'up' : 'down';
const historicalTrend = avgVelocity > 0 ? 'up' : 'down';
if (recentTrend !== historicalTrend && Math.abs(recent.score_velocity) > 2) {
anomalies.push({
anomaly_type: 'temporal',
anomaly_category: 'trend_reversal',
severity: 'medium',
anomaly_score: 60,
description: `Engagement trend reversed from ${historicalTrend} to ${recentTrend}`,
metric_name: 'score_velocity',
expected_value: avgVelocity,
actual_value: recent.score_velocity,
deviation_magnitude: Math.abs(recent.score_velocity - avgVelocity),
detection_method: 'trend_reversal'
});
}
return anomalies;
}
async detectBehavioralAnomalies(familyId) {
const anomalies = [];
// 1. Unusual timing anomaly
const recentLogins = await this.db.query(`
SELECT
HOUR(interaction_timestamp) as login_hour,
DAYOFWEEK(interaction_timestamp) as day_of_week
FROM interaction_log
WHERE family_id = ?
AND interaction_type = 'portal_login'
AND interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 30 DAY)
ORDER BY interaction_timestamp DESC
LIMIT 10
`, [familyId]);
// Check for unusual hours (11pm-5am)
const nightLogins = recentLogins.filter(l => l.login_hour >= 23 || l.login_hour <= 5);
if (nightLogins.length >= 3) {
anomalies.push({
anomaly_type: 'behavioral',
anomaly_category: 'unusual_timing',
severity: 'medium',
anomaly_score: 65,
description: `${nightLogins.length} logins during unusual hours (11pm-5am)`,
metric_name: 'login_timing',
expected_value: null,
actual_value: nightLogins.length,
deviation_magnitude: nightLogins.length,
detection_method: 'timing_analysis'
});
}
// 2. Communication breakdown
const recentEmails = await this.db.query(`
SELECT
outcome,
COUNT(*) as count
FROM interaction_log
WHERE family_id = ?
AND interaction_type = 'email_sent'
AND interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY outcome
`, [familyId]);
const bounced = recentEmails.find(e => e.outcome === 'bounced')?.count || 0;
const sent = recentEmails.reduce((sum, e) => sum + e.count, 0);
if (bounced >= 3 || (sent > 0 && bounced / sent > 0.5)) {
anomalies.push({
anomaly_type: 'behavioral',
anomaly_category: 'communication_breakdown',
severity: 'high',
anomaly_score: 85,
description: `${bounced} bounced emails - communication channel broken`,
metric_name: 'email_deliverability',
expected_value: 0,
actual_value: bounced,
deviation_magnitude: bounced,
detection_method: 'communication_analysis'
});
}
// 3. Activity spike
const recentActivity = await this.db.query(`
SELECT COUNT(*) as activity_count
FROM interaction_log
WHERE family_id = ?
AND interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 7 DAY)
`, [familyId]);
const historicalActivity = await this.db.query(`
SELECT AVG(weekly_count) as avg_weekly
FROM (
SELECT WEEK(interaction_timestamp) as week, COUNT(*) as weekly_count
FROM interaction_log
WHERE family_id = ?
AND interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 90 DAY)
AND interaction_timestamp < DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY WEEK(interaction_timestamp)
) subq
`, [familyId]);
const recentCount = recentActivity[0].activity_count;
const avgCount = historicalActivity[0]?.avg_weekly || 0;
if (avgCount > 0 && recentCount > (avgCount * 3)) {
anomalies.push({
anomaly_type: 'behavioral',
anomaly_category: 'activity_spike',
severity: 'low', // Often positive!
anomaly_score: 70,
description: `Activity spiked to ${recentCount} (3× normal)`,
metric_name: 'weekly_activity',
expected_value: avgCount,
actual_value: recentCount,
deviation_magnitude: recentCount / avgCount,
detection_method: 'activity_analysis'
});
}
return anomalies;
}
async detectCohortAnomalies(familyId) {
const anomalies = [];
// Get family's cohort
const cohortMembership = await this.db.query(`
SELECT cohort_id
FROM cohort_membership
WHERE family_id = ?
ORDER BY assignment_date DESC
LIMIT 1
`, [familyId]);
if (!cohortMembership.length) return anomalies;
const cohortId = cohortMembership[0].cohort_id;
// Get family's metrics
const familyMetrics = await this.db.query(`
SELECT engagement_score, participation_score, communication_score
FROM family_engagement_metrics
WHERE family_id = ?
`, [familyId]);
if (!familyMetrics.length) return anomalies;
// Get cohort baseline
const cohortBaseline = await this.db.query(`
SELECT metric_name, mean_value, std_dev
FROM baseline_statistics
WHERE cohort_id = ?
AND calculation_date = (SELECT MAX(calculation_date) FROM baseline_statistics WHERE cohort_id = ?)
`, [cohortId, cohortId]);
// Compare to cohort baseline
for (const [metric, value] of Object.entries(familyMetrics[0])) {
const baseline = cohortBaseline.find(b => b.metric_name === metric);
if (!baseline || !baseline.std_dev) continue;
const zScore = Math.abs((value - baseline.mean_value) / baseline.std_dev);
if (zScore > 2.5) { // Slightly lower threshold for cohort comparison
anomalies.push({
anomaly_type: 'cohort_relative',
anomaly_category: `${metric}_cohort_outlier`,
severity: this.categorizeSeverity(zScore, 2.5, 3.5, 4.5),
anomaly_score: Math.min(100, zScore * 25),
description: `${metric} deviates from cohort norm by ${zScore.toFixed(1)} SD`,
metric_name: metric,
expected_value: baseline.mean_value,
actual_value: value,
deviation_magnitude: zScore,
detection_method: 'cohort_comparison'
});
}
}
return anomalies;
}
categorizeSeverity(magnitude, mediumThreshold, highThreshold, criticalThreshold) {
if (magnitude >= criticalThreshold) return 'critical';
if (magnitude >= highThreshold) return 'high';
if (magnitude >= mediumThreshold) return 'medium';
return 'low';
}
async saveAnomaly(familyId, anomaly) {
// Check if similar anomaly already exists (avoid duplicates)
const existing = await this.db.query(`
SELECT anomaly_id
FROM anomalies
WHERE family_id = ?
AND anomaly_category = ?
AND status IN ('new', 'investigating')
AND detected_date >= DATE_SUB(NOW(), INTERVAL 7 DAY)
`, [familyId, anomaly.anomaly_category]);
if (existing.length > 0) {
// Update existing anomaly
await this.db.query(`
UPDATE anomalies
SET
actual_value = ?,
deviation_magnitude = ?,
anomaly_score = ?,
detected_date = NOW()
WHERE anomaly_id = ?
`, [
anomaly.actual_value,
anomaly.deviation_magnitude,
anomaly.anomaly_score,
existing[0].anomaly_id
]);
} else {
// Insert new anomaly
await this.db.query(`
INSERT INTO anomalies (
family_id, anomaly_type, anomaly_category, severity, anomaly_score,
description, metric_name, expected_value, actual_value,
deviation_magnitude, detection_method, baseline_period_days
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 90)
`, [
familyId,
anomaly.anomaly_type,
anomaly.anomaly_category,
anomaly.severity,
anomaly.anomaly_score,
anomaly.description,
anomaly.metric_name,
anomaly.expected_value,
anomaly.actual_value,
anomaly.deviation_magnitude,
anomaly.detection_method
]);
}
}
async getAllActiveFamilies() {
return await this.db.query(`
SELECT family_id FROM families WHERE enrolled_current_semester = 1
`);
}
async getAnomalyDashboard() {
return await this.db.query(`
SELECT
a.anomaly_id,
a.family_id,
f.family_name,
a.anomaly_category,
a.severity,
a.anomaly_score,
a.description,
a.detected_date,
a.status
FROM anomalies a
JOIN families f ON a.family_id = f.family_id
WHERE a.status IN ('new', 'investigating')
ORDER BY
CASE a.severity
WHEN 'critical' THEN 1
WHEN 'high' THEN 2
WHEN 'medium' THEN 3
ELSE 4
END,
a.anomaly_score DESC,
a.detected_date DESC
LIMIT 50
`);
}
}
module.exports = AnomalyDetectionEngine;
ML-Based Anomaly Detection (Python)
# anomaly_detection_ml.py
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
import numpy as np
import pandas as pd
class MLAnomalyDetector:
    def __init__(self, contamination=0.05):
        """
        contamination: expected proportion of outliers (5% = 0.05)
        """
        self.contamination = contamination
        self.isolation_forest = IsolationForest(
            contamination=contamination,
            random_state=42
        )
        self.lof = LocalOutlierFactor(
            contamination=contamination
        )

    def detect_isolation_forest(self, X):
        """
        Isolation Forest: isolates anomalies in feature space.
        Good for high-dimensional data.
        """
        # Fit and predict: -1 = anomaly, 1 = normal
        predictions = self.isolation_forest.fit_predict(X)
        scores = self.isolation_forest.score_samples(X)
        anomalies = predictions == -1
        return {
            'is_anomaly': anomalies,
            'anomaly_score': -scores,  # negate so higher = more anomalous
            'method': 'isolation_forest'
        }

    def detect_local_outlier_factor(self, X):
        """
        LOF: finds points in low-density regions.
        Good for density-based anomalies.
        """
        predictions = self.lof.fit_predict(X)
        scores = self.lof.negative_outlier_factor_
        anomalies = predictions == -1
        return {
            'is_anomaly': anomalies,
            'anomaly_score': -scores,
            'method': 'local_outlier_factor'
        }

    def detect_ensemble(self, X):
        """
        Combine multiple methods for robust detection.
        """
        if_result = self.detect_isolation_forest(X)
        lof_result = self.detect_local_outlier_factor(X)
        # Anomaly if EITHER method flags it
        ensemble_anomalies = if_result['is_anomaly'] | lof_result['is_anomaly']
        # Average the two anomaly scores
        ensemble_scores = (if_result['anomaly_score'] + lof_result['anomaly_score']) / 2
        return {
            'is_anomaly': ensemble_anomalies,
            'anomaly_score': ensemble_scores,
            'if_flagged': if_result['is_anomaly'],
            'lof_flagged': lof_result['is_anomaly'],
            'method': 'ensemble'
        }
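A quick smoke test of the ensemble idea on synthetic data (sklearn calls only, bypassing the wrapper class for brevity; the cluster parameters and planted outlier values are made up):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic metrics: a tight cluster of typical families plus two planted outliers.
rng = np.random.default_rng(42)
normal = rng.normal(loc=[65.0, 60.0, 70.0], scale=5.0, size=(200, 3))
planted = np.array([[15.0, 10.0, 95.0], [98.0, 5.0, 12.0]])
X = np.vstack([normal, planted])

iso = IsolationForest(contamination=0.02, random_state=42).fit(X)
if_flags = iso.predict(X) == -1
lof_flags = LocalOutlierFactor(contamination=0.02).fit_predict(X) == -1
ensemble = if_flags | lof_flags  # flag if either method fires

print("planted outliers caught:", ensemble[-2:].tolist())
print("total flagged:", int(ensemble.sum()))
```

With contamination set to 0.02, each method flags roughly 2% of rows, so the union stays small while comfortably catching both planted points.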
Usage Example
const detector = new AnomalyDetectionEngine(db);
// Detect anomalies for all families
const anomalies = await detector.detectAnomalies();
console.log(`\n=== DETECTED ${anomalies.length} ANOMALIES ===\n`);
// Get anomaly dashboard
const dashboard = await detector.getAnomalyDashboard();
dashboard.forEach(anomaly => {
console.log(`[${anomaly.severity.toUpperCase()}] ${anomaly.family_name}`);
console.log(` Category: ${anomaly.anomaly_category}`);
console.log(` Score: ${anomaly.anomaly_score.toFixed(1)}/100`);
console.log(` ${anomaly.description}`);
console.log(` Detected: ${anomaly.detected_date.toLocaleDateString()}`);
console.log(``);
});
// Example output:
// === DETECTED 23 ANOMALIES ===
//
// [CRITICAL] Martinez Family
// Category: engagement_sudden_drop
// Score: 87.3/100
// Engagement dropped 29.1 points
// Detected: 12/23/2025
//
// [HIGH] Johnson Family
// Category: communication_breakdown
// Score: 85.0/100
// 4 bounced emails - communication channel broken
// Detected: 12/22/2025
//
// [MEDIUM] Chen Family
// Category: unusual_timing
// Score: 65.0/100
// 5 logins during unusual hours (11pm-5am)
// Detected: 12/21/2025
//
// [LOW] Williams Family
// Category: activity_spike
// Score: 70.0/100
// Activity spiked to 45 (3× normal)
// Detected: 12/23/2025
Variations
By Detection Method
Statistical (Z-score, IQR):
- Simple, fast, interpretable
- Requires normal distribution
- Good for univariate outliers
Temporal (Change detection):
- Finds sudden shifts
- Requires time-series data
- Good for monitoring
ML-Based (Isolation Forest, LOF):
- Handles high dimensions
- Finds complex patterns
- Requires more data
Rule-Based (Domain logic):
- Custom business rules
- Very interpretable
- Requires domain expertise
By Response Time
Real-Time:
- Detect on every event
- Immediate alerts
- High computational cost
Batch (Hourly/Daily):
- Scheduled detection
- Lower cost
- Slight delay acceptable
On-Demand:
- Run when requested
- Interactive exploration
- Manual investigation
By Anomaly Type
Negative Anomalies:
- Problems requiring attention
- Engagement drops, communication breaks
- High priority
Positive Anomalies:
- Unexpected successes
- Learning opportunities
- Lower priority but valuable
Neutral Anomalies:
- Just unusual, not good/bad
- Context-dependent
- Investigate if pattern emerges
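The Rule-Based variation can be as simple as a list of predicates over a family snapshot. A sketch (the snapshot fields and thresholds are hypothetical, loosely mirroring the behavioral checks in the engine above):

```python
from dataclasses import dataclass

@dataclass
class FamilySnapshot:
    bounced_emails_30d: int
    night_logins_30d: int
    engagement_delta: float

# Each rule: (category, severity, predicate). Plain domain logic, no statistics.
RULES = [
    ("communication_breakdown", "high", lambda f: f.bounced_emails_30d >= 3),
    ("unusual_timing", "medium", lambda f: f.night_logins_30d >= 3),
    ("engagement_sudden_drop", "critical", lambda f: f.engagement_delta < -20),
]

def apply_rules(snapshot: FamilySnapshot):
    """Return (category, severity) pairs for every rule the snapshot trips."""
    return [(cat, sev) for cat, sev, pred in RULES if pred(snapshot)]

print(apply_rules(FamilySnapshot(bounced_emails_30d=4, night_logins_30d=0,
                                 engagement_delta=-5.0)))
# → [('communication_breakdown', 'high')]
```

Because each rule carries its own category and severity, results drop straight into the anomalies table without a scoring step, which is what makes this variation so interpretable.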
Consequences
Benefits
1. Early warning: Catch Family 142's 3am logins before withdrawal.
2. Proactive intervention: Address communication breakdown (bounced emails) immediately.
3. Positive discovery: Find Family 203's spike (12 volunteer hours!) and learn what changed.
4. Prioritization: Focus on critical/high severity anomalies first.
5. False positive learning: Track investigations, improve detection over time.
6. Cohort refinement: Anomalies reveal when families don't fit their cohort.
Costs
1. False positive burden: Investigating anomalies that aren't meaningful.
2. Threshold tuning: Finding the right balance (sensitivity vs specificity).
3. Computational overhead: Running detection algorithms regularly.
4. Alert fatigue: Too many alerts = ignored alerts.
5. Baseline maintenance: Must update baselines as the population evolves.
6. Investigation capacity: Need staff to investigate flagged anomalies.
Sample Code
Calculate baseline statistics:
async function calculateBaselineStatistics(cohortId = null) {
const metrics = [
'engagement_score',
'communication_score',
'platform_engagement_score',
'participation_score'
];
for (const metric of metrics) {
const stats = await db.query(`
SELECT
AVG(${metric}) as mean_value,
STDDEV(${metric}) as std_dev,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ${metric}) as median_value,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY ${metric}) as q1_value,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY ${metric}) as q3_value,
MIN(${metric}) as min_value,
MAX(${metric}) as max_value,
COUNT(*) as sample_size
FROM family_engagement_metrics fem
${cohortId ? `
JOIN cohort_membership cm ON fem.family_id = cm.family_id
WHERE cm.cohort_id = ?
` : ''}
`, cohortId ? [cohortId] : []);
await db.query(`
INSERT INTO baseline_statistics (
metric_name, cohort_id, mean_value, std_dev, median_value,
q1_value, q3_value, min_value, max_value, calculation_date,
period_days, sample_size
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, CURRENT_DATE, 90, ?)
`, [
metric,
cohortId,
stats[0].mean_value,
stats[0].std_dev,
stats[0].median_value,
stats[0].q1_value,
stats[0].q3_value,
stats[0].min_value,
stats[0].max_value,
stats[0].sample_size
]);
}
}
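The same statistics can be computed offline, e.g. when backfilling baselines from a data export. A pandas sketch (the column name is assumed to match family_engagement_metrics; pandas' default quantile interpolation is linear, consistent with PERCENTILE_CONT):

```python
import pandas as pd

def baseline_stats(df: pd.DataFrame, metric: str) -> dict:
    """Compute one baseline_statistics row for a metric column."""
    s = df[metric].dropna()
    return {
        "metric_name": metric,
        "mean_value": round(s.mean(), 2),
        "std_dev": round(s.std(), 2),          # sample std (ddof=1)
        "median_value": round(s.median(), 2),
        "q1_value": round(s.quantile(0.25), 2),
        "q3_value": round(s.quantile(0.75), 2),
        "min_value": round(s.min(), 2),
        "max_value": round(s.max(), 2),
        "sample_size": int(s.count()),
    }

df = pd.DataFrame({"engagement_score": [55, 60, 65, 70, 75, 80]})
print(baseline_stats(df, "engagement_score"))
```

One design note: pandas' `Series.std` uses the sample standard deviation (ddof=1), while some SQL STDDEV implementations default to the population form, so verify the two paths agree before mixing baselines from both sources.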
Known Uses
Homeschool Co-op Intelligence Platform:
- Detects 15-25 anomalies weekly
- 60% true positives (valuable findings)
- Caught 8 communication breakdowns before crisis
- Discovered 3 positive outliers (learned new engagement tactics)
Fraud Detection:
- Credit card anomaly detection standard practice
- Real-time transaction monitoring
- Catches 70-90% of fraud
Network Security:
- Intrusion detection systems
- Unusual traffic patterns
- DDoS attack detection
Manufacturing:
- Statistical Process Control (SPC)
- Equipment anomaly detection
- Quality control
Healthcare:
- Vital sign monitoring
- Lab result outliers
- Disease outbreak detection
Related Patterns
Requires:
- Pattern 1: Universal Event Log - behavioral data for detection
- Pattern 16: Cohort Discovery - cohort baselines for comparison
Enables:
- Pattern 15: Intervention Recommendation - anomalies trigger interventions
- Pattern 18: Opportunity Mining - positive anomalies = opportunities
- Pattern 22: Progressive Escalation - critical anomalies escalate
Enhanced by:
- Pattern 10: Engagement Velocity - temporal anomaly detection
- Pattern 13: Confidence Scoring - confidence in anomaly detection
References
Academic Foundations
- Chandola, Varun, Arindam Banerjee, and Vipin Kumar (2009). "Anomaly Detection: A Survey." ACM Computing Surveys 41(3). https://dl.acm.org/doi/10.1145/1541880.1541882 - Comprehensive survey
- Aggarwal, Charu C. (2017). Outlier Analysis (2nd ed.). Springer. ISBN: 978-3319475776
- Hodge, Victoria, and Jim Austin (2004). "A Survey of Outlier Detection Methodologies." Artificial Intelligence Review 22(2): 85-126.
Specific Algorithms
- Isolation Forest: Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou (2008). "Isolation Forest." ICDM 2008. https://ieeexplore.ieee.org/document/4781136
- LOF (Local Outlier Factor): Breunig, Markus M., et al. (2000). "LOF: Identifying Density-Based Local Outliers." SIGMOD 2000. https://dl.acm.org/doi/10.1145/342009.335388
- One-Class SVM: Schölkopf, B., et al. (2001). "Estimating the Support of a High-Dimensional Distribution." Neural Computation 13(7): 1443-1471.
- Autoencoders for Anomalies: Hawkins, S., et al. (2002). "Outlier Detection Using Replicator Neural Networks." DaWaK 2002.
Statistical Process Control
- Deming, W. Edwards (1986). Out of the Crisis. MIT Press. ISBN: 978-0262541152 - Quality control foundations
- Shewhart, Walter A. (1931). Economic Control of Quality of Manufactured Product. Van Nostrand. - Control charts
- Montgomery, Douglas C. (2012). Introduction to Statistical Quality Control (7th ed.). Wiley. ISBN: 978-1118146811
Practical Implementation
- PyOD: https://github.com/yzhao062/pyod - Python Outlier Detection (30+ algorithms)
- Scikit-learn Outlier Detection: https://scikit-learn.org/stable/modules/outlier_detection.html
- Keras Autoencoders: https://blog.keras.io/building-autoencoders-in-keras.html - Anomaly detection tutorial
- Facebook Prophet: https://facebook.github.io/prophet/docs/outliers.html - Time series outliers
Related Trilogy Patterns
- Pattern 9: Early Warning Signals - Anomalies trigger warnings
- Pattern 16: Cohort Discovery & Analysis - Discover anomalous patterns
- Pattern 23: Triggered Interventions - Anomalies trigger investigation workflows
- Volume 3, Pattern 6: Domain-Aware Validation - Domain rules flag anomalies
Tools & Services
- Datadog Anomaly Detection: https://www.datadoghq.com/blog/anomaly-detection/ - Infrastructure anomalies
- Anodot: https://www.anodot.com/ - Business metrics anomaly detection
- Amazon Lookout for Metrics: https://aws.amazon.com/lookout-for-metrics/ - ML-powered anomaly detection
- Elastic Machine Learning: https://www.elastic.co/what-is/elasticsearch-machine-learning - Log anomaly detection