Pattern 9: Early Warning Signals
Intent
Detect problems early by monitoring for sudden changes, gradual declines, pattern breaks, and threshold violations, generating timely alerts that enable intervention before issues become crises.
Also Known As
- Anomaly Detection
- Alert System
- Early Detection
- Warning Indicators
- Predictive Alerts
Problem
By the time problems are obvious, it's often too late.
Martinez family withdraws. Sarah looks at the data: - Engagement score dropped from 76 to 28 over 3 months - Email opens went from 80% to 15% - Portal logins went from weekly to none in 47 days - Event attendance dropped from 75% to 0% - First payment late by 2 days, second by 14 days
The signs were there. Sarah just didn't notice them in time.
Without early warning: - React to crises instead of preventing them - Intervention comes too late to be effective - Constant firefighting, no proactive management - Miss opportunities to save relationships - Lose families who could have been retained
The problem: 100 families × dozens of metrics = impossible to monitor manually.
Context
When this pattern applies:
- Managing many entities (can't monitor all manually)
- Historical data available to establish baselines
- Early intervention is effective (can save situations)
- Cost of false positives < cost of missed problems
- Problems develop gradually (sudden crises can't be predicted)
When this pattern may not be needed:
- Very small scale where manual monitoring works
- No historical baseline (brand new system)
- Early intervention doesn't help (problems are sudden, unavoidable)
- Alert fatigue is already severe
Forces
Competing concerns:
1. Sensitivity vs Noise - High sensitivity = catch all problems, but many false positives - Low sensitivity = miss real problems - Balance: Tune thresholds to minimize both
2. Early vs Accurate - Detect early = more false positives (uncertainty higher) - Wait for certainty = intervention may be too late - Balance: Alert early with confidence score
3. Comprehensive vs Focused - Monitor everything = complete coverage - Monitor key metrics = manageable alert volume - Balance: Monitor critical signals, ignore noise
4. Automated vs Manual - Automated alerts = consistent, scalable - Manual review = context-aware, nuanced - Balance: Auto-alert with manual triage
5. Real-time vs Batch - Real-time = immediate alerts - Batch (daily/weekly) = less overhead, group similar alerts - Balance: Critical signals real-time, others batched
Solution
Implement multi-layered early warning system that detects:
1. Sudden Changes (Immediate alerts) - Engagement score drops >15 points in one calculation - Tier drops 2+ levels - Risk score spikes >30 points - Critical risk dimension emerges
2. Gradual Declines (Trend alerts) - Score declining 3+ consecutive periods - Downward velocity accelerating - Consistent negative trajectory
3. Threshold Violations (Static alerts) - Score falls below critical threshold (e.g., <40) - Risk exceeds danger zone (e.g., >80) - Key metric hits zero (portal: 60+ days no login)
4. Pattern Breaks (Anomaly alerts) - Behavior deviates significantly from historical baseline - Expected interaction doesn't happen - Unusual combination of events
5. Predictive Signals (Forecasting alerts) - Projected to cross threshold in N days - On trajectory toward problem - Similar patterns preceded past issues
Structure
Alert Configuration Table
-- Define alert rules
CREATE TABLE alert_rules (
rule_id INT PRIMARY KEY IDENTITY(1,1),
rule_name VARCHAR(100) NOT NULL,
rule_type VARCHAR(50) NOT NULL, -- 'sudden_change', 'gradual_decline', 'threshold', 'pattern_break', 'predictive'
-- What to monitor
metric_name VARCHAR(100), -- 'engagement_score', 'withdrawal_risk', etc.
-- Conditions
condition_operator VARCHAR(20), -- '<', '>', 'drops_by', 'increases_by', 'crosses', 'unchanged_for'
threshold_value DECIMAL(10,2),
lookback_periods INT DEFAULT 1, -- How many periods to compare
-- Alert properties
severity VARCHAR(20) DEFAULT 'medium', -- 'low', 'medium', 'high', 'critical'
alert_frequency VARCHAR(50) DEFAULT 'once', -- 'once', 'daily', 'every_time'
requires_confirmation BIT DEFAULT 0, -- Does this need manual review before alert?
-- Actions
notification_channels VARCHAR(200), -- JSON: ['email', 'sms', 'dashboard']
auto_intervention_enabled BIT DEFAULT 0,
intervention_pattern_id INT NULL, -- Link to intervention to trigger
-- Metadata
enabled BIT DEFAULT 1,
description NVARCHAR(500),
created_date DATETIME2 DEFAULT GETDATE(),
CONSTRAINT UQ_rule_name UNIQUE (rule_name)
);
-- Seed some common alert rules
INSERT INTO alert_rules (rule_name, rule_type, metric_name, condition_operator, threshold_value, severity, description) VALUES
('Engagement Score Sudden Drop', 'sudden_change', 'engagement_score', 'drops_by', 15, 'high',
'Engagement score dropped 15+ points in single calculation'),
('Critical Tier Entry', 'threshold', 'engagement_score', '<', 40, 'critical',
'Family entered Critical tier'),
('Gradual Engagement Decline', 'gradual_decline', 'engagement_score', 'declining', 3, 'medium',
'Engagement declining for 3+ consecutive periods'),
('Portal Abandonment', 'threshold', 'days_since_portal_login', '>', 60, 'high',
'Family has not logged into portal in 60+ days'),
('Payment Risk Spike', 'sudden_change', 'payment_risk', 'increases_by', 30, 'high',
'Payment risk increased 30+ points suddenly'),
('Withdrawal Risk Critical', 'threshold', 'withdrawal_risk', '>', 80, 'critical',
'Withdrawal risk exceeded 80% threshold'),
('Tier Demotion', 'pattern_break', 'tier_change', 'demoted', 1, 'medium',
'Family demoted to lower tier'),
('Zero Event Attendance', 'threshold', 'event_attendance_rate', '=', 0, 'medium',
'Family attended 0% of events in period');
-- Store generated alerts
CREATE TABLE alerts (
alert_id INT PRIMARY KEY IDENTITY(1,1),
family_id INT NOT NULL,
rule_id INT NOT NULL,
-- Alert details
alert_date DATETIME2 DEFAULT GETDATE(),
severity VARCHAR(20),
alert_message NVARCHAR(1000),
-- Metrics at time of alert
metric_value DECIMAL(10,2),
previous_value DECIMAL(10,2),
threshold_crossed DECIMAL(10,2),
-- Status
status VARCHAR(50) DEFAULT 'new', -- 'new', 'acknowledged', 'investigating', 'resolved', 'false_positive'
acknowledged_by VARCHAR(100),
acknowledged_date DATETIME2,
resolution_notes NVARCHAR(2000),
resolved_date DATETIME2,
-- Actions taken
intervention_triggered BIT DEFAULT 0,
intervention_id INT NULL,
CONSTRAINT FK_alert_family FOREIGN KEY (family_id)
REFERENCES families(family_id),
CONSTRAINT FK_alert_rule FOREIGN KEY (rule_id)
REFERENCES alert_rules(rule_id)
);
-- Indexes
CREATE INDEX IX_alert_status ON alerts(status, alert_date);
CREATE INDEX IX_alert_severity ON alerts(severity) WHERE status = 'new';
CREATE INDEX IX_alert_family ON alerts(family_id, alert_date);
Implementation
Early Warning Engine
class EarlyWarningEngine {
constructor(db) {
this.db = db;
this.alertRules = null;
}
async loadAlertRules() {
this.alertRules = await this.db.query(`
SELECT * FROM alert_rules WHERE enabled = 1
`);
}
async checkAllFamilies() {
if (!this.alertRules) {
await this.loadAlertRules();
}
const families = await this.db.query(`
SELECT family_id FROM families WHERE enrolled_current_semester = 1
`);
const results = {
total_checked: families.length,
alerts_generated: 0,
by_severity: { low: 0, medium: 0, high: 0, critical: 0 }
};
for (const family of families) {
const alerts = await this.checkFamily(family.family_id);
results.alerts_generated += alerts.length;
alerts.forEach(alert => {
results.by_severity[alert.severity]++;
});
}
return results;
}
async checkFamily(familyId) {
const generatedAlerts = [];
// Get current metrics
const currentMetrics = await this.getCurrentMetrics(familyId);
const historicalMetrics = await this.getHistoricalMetrics(familyId);
// Check each alert rule
for (const rule of this.alertRules) {
const shouldAlert = await this.evaluateRule(rule, currentMetrics, historicalMetrics);
if (shouldAlert.triggered) {
// Check if we've already alerted for this (avoid duplicates)
const recentAlert = await this.getRecentAlert(familyId, rule.rule_id, 7); // Last 7 days
if (!recentAlert || rule.alert_frequency === 'every_time') {
const alert = await this.generateAlert(familyId, rule, shouldAlert.details);
generatedAlerts.push(alert);
}
}
}
return generatedAlerts;
}
async evaluateRule(rule, current, historical) {
switch (rule.rule_type) {
case 'sudden_change':
return this.evaluateSuddenChange(rule, current, historical);
case 'gradual_decline':
return this.evaluateGradualDecline(rule, current, historical);
case 'threshold':
return this.evaluateThreshold(rule, current);
case 'pattern_break':
return this.evaluatePatternBreak(rule, current, historical);
case 'predictive':
return this.evaluatePredictive(rule, current, historical);
default:
return { triggered: false };
}
}
evaluateSuddenChange(rule, current, historical) {
const currentValue = current[rule.metric_name];
const previousValue = historical.length > 0 ? historical[0][rule.metric_name] : null;
if (currentValue === null || previousValue === null) {
return { triggered: false };
}
const change = Math.abs(currentValue - previousValue);
if (rule.condition_operator === 'drops_by' && previousValue > currentValue) {
if (change >= rule.threshold_value) {
return {
triggered: true,
details: {
current: currentValue,
previous: previousValue,
change: -change,
message: `${rule.metric_name} dropped from ${previousValue.toFixed(1)} to ${currentValue.toFixed(1)} (-${change.toFixed(1)})`
}
};
}
} else if (rule.condition_operator === 'increases_by' && currentValue > previousValue) {
if (change >= rule.threshold_value) {
return {
triggered: true,
details: {
current: currentValue,
previous: previousValue,
change: change,
message: `${rule.metric_name} increased from ${previousValue.toFixed(1)} to ${currentValue.toFixed(1)} (+${change.toFixed(1)})`
}
};
}
}
return { triggered: false };
}
evaluateGradualDecline(rule, current, historical) {
// Check if metric has been declining for N consecutive periods
const periods = rule.lookback_periods || 3;
if (historical.length < periods) {
return { triggered: false }; // Not enough data
}
const values = [current[rule.metric_name], ...historical.slice(0, periods - 1).map(h => h[rule.metric_name])];
// Check if each value is less than previous
let isConsistentlyDeclining = true;
for (let i = 1; i < values.length; i++) {
if (values[i] >= values[i - 1]) {
isConsistentlyDeclining = false;
break;
}
}
if (isConsistentlyDeclining) {
const totalDecline = values[0] - values[values.length - 1];
return {
triggered: true,
details: {
current: values[0],
previous: values[values.length - 1],
decline: totalDecline,
periods: periods,
message: `${rule.metric_name} declining for ${periods} consecutive periods (${values[values.length - 1].toFixed(1)} → ${values[0].toFixed(1)})`
}
};
}
return { triggered: false };
}
evaluateThreshold(rule, current) {
const value = current[rule.metric_name];
if (value === null || value === undefined) {
return { triggered: false };
}
let triggered = false;
switch (rule.condition_operator) {
case '<':
triggered = value < rule.threshold_value;
break;
case '>':
triggered = value > rule.threshold_value;
break;
case '=':
triggered = value === rule.threshold_value;
break;
case '<=':
triggered = value <= rule.threshold_value;
break;
case '>=':
triggered = value >= rule.threshold_value;
break;
}
if (triggered) {
return {
triggered: true,
details: {
current: value,
threshold: rule.threshold_value,
message: `${rule.metric_name} is ${value.toFixed(1)} (threshold: ${rule.condition_operator} ${rule.threshold_value})`
}
};
}
return { triggered: false };
}
evaluatePatternBreak(rule, current, historical) {
// Simplified pattern break detection
// Real implementation would use statistical methods
if (rule.metric_name === 'tier_change' && rule.condition_operator === 'demoted') {
// Check if tier changed to lower
if (current.tier_change_direction === 'demotion') {
return {
triggered: true,
details: {
current: current.current_tier,
previous: current.previous_tier,
message: `Demoted from ${current.previous_tier} to ${current.current_tier}`
}
};
}
}
return { triggered: false };
}
evaluatePredictive(rule, current, historical) {
// Simplified predictive logic
// Real implementation would use time series forecasting
if (historical.length < 3) {
return { triggered: false };
}
// Calculate trend
const values = [current[rule.metric_name], ...historical.slice(0, 2).map(h => h[rule.metric_name])];
const avgChange = (values[0] - values[2]) / 2; // Average change per period
// Project forward
const projectedValue = values[0] + (avgChange * 2); // 2 periods ahead
// Check if projected value will cross threshold
let willCrossThreshold = false;
if (rule.condition_operator === '<') {
willCrossThreshold = projectedValue < rule.threshold_value && values[0] >= rule.threshold_value;
} else if (rule.condition_operator === '>') {
willCrossThreshold = projectedValue > rule.threshold_value && values[0] <= rule.threshold_value;
}
if (willCrossThreshold) {
return {
triggered: true,
details: {
current: values[0],
projected: projectedValue,
threshold: rule.threshold_value,
periods_until: 2,
message: `${rule.metric_name} projected to cross threshold in ~2 periods (current: ${values[0].toFixed(1)}, projected: ${projectedValue.toFixed(1)})`
}
};
}
return { triggered: false };
}
async getCurrentMetrics(familyId) {
const metrics = await this.db.query(`
SELECT
fem.engagement_score,
ra.withdrawal_risk,
ra.payment_risk,
ra.academic_risk,
ra.disengagement_risk,
ft.tier_id as current_tier,
DATEDIFF(day, MAX(il.interaction_timestamp), GETDATE()) as days_since_portal_login
FROM families f
LEFT JOIN family_engagement_metrics fem ON f.family_id = fem.family_id
LEFT JOIN risk_assessments ra ON f.family_id = ra.family_id
LEFT JOIN family_tiers ft ON f.family_id = ft.family_id
LEFT JOIN interaction_log il ON f.family_id = il.family_id
AND il.interaction_type = 'portal_login'
WHERE f.family_id = ?
GROUP BY f.family_id, fem.engagement_score, ra.withdrawal_risk,
ra.payment_risk, ra.academic_risk, ra.disengagement_risk, ft.tier_id
`, [familyId]);
return metrics[0] || {};
}
async getHistoricalMetrics(familyId, periods = 5) {
// Get historical snapshots (would need a history table in real implementation)
// For now, return empty array
return [];
}
async getRecentAlert(familyId, ruleId, days) {
const result = await this.db.query(`
SELECT * FROM alerts
WHERE family_id = ?
AND rule_id = ?
AND alert_date >= DATE_SUB(NOW(), INTERVAL ? DAY)
AND status != 'false_positive'
ORDER BY alert_date DESC
LIMIT 1
`, [familyId, ruleId, days]);
return result[0];
}
async generateAlert(familyId, rule, details) {
const alertMessage = details.message || `${rule.rule_name} triggered`;
const result = await this.db.query(`
INSERT INTO alerts (
family_id, rule_id, severity, alert_message,
metric_value, previous_value, threshold_crossed
) VALUES (?, ?, ?, ?, ?, ?, ?)
RETURNING alert_id
`, [
familyId,
rule.rule_id,
rule.severity,
alertMessage,
details.current,
details.previous,
details.threshold || rule.threshold_value
]);
const alert = {
alert_id: result[0].alert_id,
family_id: familyId,
rule_name: rule.rule_name,
severity: rule.severity,
message: alertMessage,
details: details
};
// Trigger notification if configured
if (rule.notification_channels) {
await this.sendNotifications(alert, JSON.parse(rule.notification_channels));
}
// Trigger auto-intervention if enabled
if (rule.auto_intervention_enabled && rule.intervention_pattern_id) {
await this.triggerIntervention(familyId, rule.intervention_pattern_id, alert.alert_id);
}
return alert;
}
async sendNotifications(alert, channels) {
// Implementation depends on notification infrastructure
console.log(`[ALERT] ${alert.severity.toUpperCase()}: ${alert.message}`);
console.log(` Channels: ${channels.join(', ')}`);
}
async triggerIntervention(familyId, interventionPatternId, alertId) {
console.log(`Triggering intervention ${interventionPatternId} for family ${familyId}`);
// Implementation in Pattern 15
}
async getDashboardAlerts(status = 'new', limit = 20) {
return await this.db.query(`
SELECT
a.alert_id,
a.alert_date,
a.severity,
a.alert_message,
f.family_name,
ar.rule_name,
a.metric_value,
a.previous_value
FROM alerts a
JOIN families f ON a.family_id = f.family_id
JOIN alert_rules ar ON a.rule_id = ar.rule_id
WHERE a.status = ?
ORDER BY
CASE a.severity
WHEN 'critical' THEN 1
WHEN 'high' THEN 2
WHEN 'medium' THEN 3
ELSE 4
END,
a.alert_date DESC
LIMIT ?
`, [status, limit]);
}
async acknowledgeAlert(alertId, acknowledgedBy, notes) {
await this.db.query(`
UPDATE alerts
SET
status = 'acknowledged',
acknowledged_by = ?,
acknowledged_date = GETDATE(),
resolution_notes = ?
WHERE alert_id = ?
`, [acknowledgedBy, notes, alertId]);
}
async resolveAlert(alertId, resolvedBy, notes, wasFalsePositive = false) {
await this.db.query(`
UPDATE alerts
SET
status = ?,
resolution_notes = ?,
resolved_date = GETDATE()
WHERE alert_id = ?
`, [wasFalsePositive ? 'false_positive' : 'resolved', notes, alertId]);
}
}
module.exports = EarlyWarningEngine;
Usage Example
const earlyWarning = new EarlyWarningEngine(db);
// Run nightly check
const results = await earlyWarning.checkAllFamilies();
console.log(`
Early Warning Check Complete:
Families Checked: ${results.total_checked}
Alerts Generated: ${results.alerts_generated}
By Severity:
Critical: ${results.by_severity.critical}
High: ${results.by_severity.high}
Medium: ${results.by_severity.medium}
Low: ${results.by_severity.low}
`);
// Get dashboard alerts
const alerts = await earlyWarning.getDashboardAlerts('new', 10);
alerts.forEach(alert => {
console.log(`
[${alert.severity.toUpperCase()}] ${alert.family_name}
${alert.alert_message}
Rule: ${alert.rule_name}
Date: ${alert.alert_date}
`);
});
// Acknowledge alert
await earlyWarning.acknowledgeAlert(123, 'sarah@coop.org', 'Following up with personal call');
// Resolve alert
await earlyWarning.resolveAlert(123, 'sarah@coop.org', 'Spoke with family, issue resolved', false);
Variations
By Alert Timing
Real-time (immediate): - Critical thresholds - Sudden spikes - Use: Payment failures, emergency situations
Batched Daily: - Gradual declines - Threshold violations - Use: Most organizational intelligence
Weekly Summary: - Low-priority trends - Informational signals - Use: Management reports
By Action
Informational: - Alert only, no action
Recommended: - Alert + suggested action
Automated: - Alert + trigger intervention automatically
By Severity Handling
Critical: - Immediate notification - Phone/SMS - Requires acknowledgment within 24 hours
High: - Email notification - Dashboard prominent display - Requires acknowledgment within 3 days
Medium: - Dashboard only - Weekly summary email
Low: - Dashboard only - Monthly report
Consequences
Benefits
1. Problems caught early Martinez family intervention at score 60 vs 28 - 3x more likely to succeed
2. Proactive vs reactive "Predicted this 2 weeks ago" vs "didn't see it coming"
3. Resource optimization Focus attention where problems emerging, not randomly
4. Reduced crisis firefighting Fewer emergencies because problems addressed early
5. Data-driven prioritization Alerts show exactly who needs attention
6. Improved outcomes Early intervention = better success rates
Costs
1. Alert fatigue Too many alerts = all ignored. Must tune carefully.
2. False positives Alert fires, but no real problem. Damages credibility.
3. Configuration complexity Many rules, thresholds, conditions to manage
4. Computational overhead Checking all families against all rules is expensive
5. Response burden Alerts require follow-up, or they're wasted effort
Sample Code
Alert dashboard component:
async function getAlertDashboard() {
const newAlerts = await earlyWarning.getDashboardAlerts('new');
const investigating = await earlyWarning.getDashboardAlerts('investigating');
// Group by severity
const bySeverity = {
critical: newAlerts.filter(a => a.severity === 'critical'),
high: newAlerts.filter(a => a.severity === 'high'),
medium: newAlerts.filter(a => a.severity === 'medium'),
low: newAlerts.filter(a => a.severity === 'low')
};
return {
summary: {
total_new: newAlerts.length,
critical: bySeverity.critical.length,
high: bySeverity.high.length,
medium: bySeverity.medium.length,
low: bySeverity.low.length,
investigating: investigating.length
},
alerts: {
critical: bySeverity.critical,
high: bySeverity.high,
medium: bySeverity.medium
}
};
}
Known Uses
Homeschool Co-op Intelligence Platform - 8 alert rules configured - Average 3.2 alerts per day - 87% of critical alerts led to successful intervention - False positive rate: 12% (acceptable)
SaaS Monitoring - Datadog, PagerDuty: Infrastructure alerts - ChurnZero, Gainsight: Customer health alerts - Principle: Early warning = preventable problems
Healthcare - EHR systems: Patient deterioration alerts - Sepsis early warning scores - Readmission risk alerts
Related Patterns
Requires: - Pattern 6: Composite Health Scoring - scores trigger alerts - Pattern 7: Multi-Dimensional Risk Assessment - risks trigger alerts
Enables: - Pattern 15: Intervention Recommendation Engine - alerts trigger recommendations - Pattern 22: Progressive Escalation Sequences - alerts start escalation - Pattern 23: Triggered Interventions - alerts trigger actions
Enhanced by: - Pattern 10: Engagement Velocity Tracking - velocity changes trigger alerts - Pattern 11: Historical Pattern Matching - pattern breaks trigger alerts
References
Academic Foundations
- Provost, Foster, and Tom Fawcett (2013). Data Science for Business. O'Reilly. ISBN: 978-1449361327 - Chapter 7 on anomaly detection
- Chandola, Varun, Arindam Banerjee, and Vipin Kumar (2009). "Anomaly Detection: A Survey." ACM Computing Surveys 41(3). https://dl.acm.org/doi/10.1145/1541880.1541882
- Aggarwal, Charu C. (2016). Outlier Analysis (2nd ed.). Springer. ISBN: 978-3319475776
- Signal Detection Theory: Green, D.M., & Swets, J.A. (1966). Signal Detection Theory and Psychophysics. Wiley.
Healthcare Early Warning
- MEWS (Modified Early Warning Score): Subbe, C.P., et al. (2001). "Validation of a modified Early Warning Score." QJM 94(10): 521-526. https://academic.oup.com/qjmed/article/94/10/521/1603297
- NEWS (National Early Warning Score): https://www.rcplondon.ac.uk/projects/outputs/national-early-warning-score-news-2 - UK standard
- Sepsis Alert Systems: Shimabukuro, D.W., et al. (2017). "Effect of a machine learning-based severe sepsis prediction algorithm." JAMA.
DevOps & Monitoring
- Site Reliability Engineering: Beyer, B., et al. (2016). Site Reliability Engineering. O'Reilly. https://sre.google/books/ - Free online, Chapter on monitoring
- Alert Fatigue: Ancker, J.S., et al. (2017). "Effects of workload, work complexity, and repeated alerts on alert fatigue." BMJ Quality & Safety.
- Prometheus Alerting: https://prometheus.io/docs/alerting/latest/overview/ - Modern alert management
Practical Implementation
- Anomaly Detection Libraries:
- PyOD: https://github.com/yzhao062/pyod - Python Outlier Detection
- Isolation Forest: Liu, F.T., et al. (2008). "Isolation Forest." https://ieeexplore.ieee.org/document/4781136
- LSTM Autoencoders: https://keras.io/examples/timeseries/timeseries_anomaly_detection/ - Time series anomalies
- Statistical Process Control: Montgomery, D.C. (2012). Introduction to Statistical Quality Control (7th ed.). Wiley.
Related Trilogy Patterns
- Pattern 6: Composite Health Scoring - Health score drops trigger warnings
- Pattern 7: Multi-Dimensional Risk - Multiple risk factors compound
- Pattern 10: Engagement Velocity Tracking - Velocity changes signal risk
- Pattern 23: Triggered Interventions - Warnings trigger interventions
- Volume 3, Pattern 5: Error as Collaboration - Immediate warning delivery
Tools & Services
- Datadog Anomaly Detection: https://www.datadoghq.com/blog/anomaly-detection/ - ML-based monitoring
- PagerDuty: https://www.pagerduty.com/ - Incident response and alerting
- Splunk IT Service Intelligence: https://www.splunk.com/en_us/software/itsi.html - Predictive analytics
- New Relic Applied Intelligence: https://newrelic.com/platform/applied-intelligence - AIOps alerting