Pattern 11: Historical Pattern Matching
Intent
Identify similar historical behavior patterns and use their outcomes to predict what will happen with current situations, leveraging organizational memory to make data-driven predictions without building complex ML models.
Also Known As
- Case-Based Reasoning
- Similarity-Based Prediction
- Analog Forecasting
- K-Nearest Neighbors (KNN) for Outcomes
- Pattern-Based Prediction
Problem
We have rich historical data but don't know how to use it predictively.
Martinez family shows these behaviors:
- Engagement score dropped from 76 → 65 over 3 months
- Email open rate: 26%
- Last portal login: 47 days ago
- Last payment: 12 days late
- Event attendance: 2 of 8 events
Sarah thinks: "This feels like the situation with the Johnson family last year... and they withdrew."
The system should think the same thing:
- Find families with similar behavioral patterns in history
- See what happened to them (stayed or withdrew?)
- Predict Martinez outcome based on historical patterns
- Provide confidence based on pattern strength
Without historical pattern matching:
- Rely on human memory ("Feels like Johnson family")
- Inconsistent predictions (different coordinators remember different cases)
- Can't quantify confidence
- Limited by what one person remembers
With pattern matching:
- System searches all historical cases automatically
- Finds 8 families with similar patterns
- 7 of 8 withdrew (87.5% likelihood)
- Confident prediction: "High withdrawal risk based on 8 similar historical cases"
Context
When this pattern applies:
- Sufficient historical data (50+ completed cases minimum)
- Outcomes are known for historical cases
- Current situations resemble past situations
- Behavior patterns predict outcomes
- Don't need complex ML (simpler approach works)
When this pattern may not be needed:
- No historical data yet (new organization)
- Every situation is truly unique (no patterns)
- Outcomes too complex for similarity matching
- Need more sophisticated prediction (use Pattern 12 instead)
Forces
Competing concerns:
1. Simplicity vs Accuracy
   - Pattern matching is simple, interpretable
   - But ML models may be more accurate
   - Balance: Start with pattern matching, graduate to ML if needed
2. Historical Similarity vs Current Uniqueness
   - Assume past predicts future
   - But circumstances change, contexts differ
   - Balance: Weight recent history higher, require minimum similarity threshold
3. Many Features vs Few Features
   - More behavioral features = better matching
   - But curse of dimensionality (hard to find exact matches)
   - Balance: 5-10 key features, normalize importance
4. Exact Matches vs Near Matches
   - Exact matches are rare
   - Near matches are plentiful but less reliable
   - Balance: Define similarity threshold (e.g., 80%+ similar)
5. Interpretability vs Black Box
   - Pattern matching is transparent ("Similar to these 8 cases")
   - But complex distance metrics less interpretable
   - Balance: Simple metrics, show similar cases to user
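The balance suggested for force 2 (weight recent history higher, require a minimum similarity) might be sketched as follows. The exponential half-life, the 365-day default, and the function names `recencyWeight` and `voteWeight` are illustrative assumptions, not part of the implementation shown later in this pattern.

```javascript
// Sketch: down-weight older historical cases with an exponential half-life.
// halfLifeDays = 365 is an assumed default; tune it per organization.
function recencyWeight(snapshotDate, now = new Date(), halfLifeDays = 365) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const ageDays = Math.max(0, (now - new Date(snapshotDate)) / msPerDay);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Combine similarity and recency into one vote weight, dropping
// cases that fall below the minimum similarity threshold.
function voteWeight(similarity, snapshotDate, options = {}) {
  const { minSimilarity = 0.8, now = new Date(), halfLifeDays = 365 } = options;
  if (similarity < minSimilarity) return 0;
  return similarity * recencyWeight(snapshotDate, now, halfLifeDays);
}
```

With these defaults, a 90%-similar case that is exactly one year old counts half as much as an equally similar case captured today, and anything under 80% similarity is ignored.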
Solution
Build a pattern matching system that:
- Characterizes current situation as feature vector
- Searches historical cases for similar patterns
- Ranks by similarity using distance metrics
- Predicts outcome based on what happened to similar cases
- Provides confidence based on consistency and sample size
Feature Vector Example:
{
engagement_score: 65,
engagement_velocity: -3.7, // points per month
email_open_rate: 26,
days_since_login: 47,
payment_on_time_rate: 75,
event_attendance_rate: 25,
tenure_days: 245,
communication_responsiveness: 30
}
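Two of the values above follow directly from the Martinez figures in the Problem section. The small helpers below show the arithmetic; their names are hypothetical and they are not part of the `PatternMatcher` class.

```javascript
// Sketch: derive velocity and rate features from raw figures.
function monthlyVelocity(startScore, endScore, months) {
  if (months <= 0) return 0;
  return (endScore - startScore) / months;
}

function percentage(part, whole, fallback = null) {
  // fallback is returned when there is nothing to divide by
  return whole > 0 ? (part / whole) * 100 : fallback;
}

const round1 = (x) => Math.round(x * 10) / 10;

// Score fell from 76 to 65 over 3 months
const engagementVelocity = round1(monthlyVelocity(76, 65, 3)); // -3.7
// Attended 2 of 8 events
const eventAttendanceRate = percentage(2, 8); // 25
```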
Similarity Calculation:
- Euclidean distance, cosine similarity, or weighted Manhattan distance
- Normalize features to 0-1 scale
- Weight important features higher
- Find K nearest neighbors (typically K=5-20)
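To make the arithmetic concrete, here is a minimal sketch of min-max normalization followed by a weighted Euclidean similarity on just two features. The equal weights, the second family's values, and the helper names are assumptions for illustration.

```javascript
// Sketch: min-max normalization plus weighted Euclidean similarity.
function normalize(value, [min, max]) {
  const clamped = Math.max(min, Math.min(max, value));
  return (clamped - min) / (max - min);
}

function weightedSimilarity(a, b, spec) {
  let sum = 0;
  let totalWeight = 0;
  for (const [feature, { weight, range }] of Object.entries(spec)) {
    const diff = normalize(a[feature], range) - normalize(b[feature], range);
    sum += weight * diff * diff;
    totalWeight += weight;
  }
  // Each normalized diff lies in [-1, 1], so the distance lies in [0, 1]
  return 1 - Math.sqrt(sum / totalWeight);
}

const spec = {
  engagement_score: { weight: 0.5, range: [0, 100] },
  days_since_login: { weight: 0.5, range: [0, 180] },
};
const martinez = { engagement_score: 65, days_since_login: 47 };
const pastCase = { engagement_score: 60, days_since_login: 65 }; // hypothetical
const similarity = weightedSimilarity(martinez, pastCase, spec);
```

Here the normalized differences are 0.05 and 0.10, giving a distance of about 0.08 and a similarity of about 0.92. Without normalization, the 18-day login gap would swamp the 5-point score gap.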
Outcome Prediction:
- Majority vote: What happened to most similar cases?
- Weighted vote: Weight closer matches higher
- Probability estimate: % that had outcome X
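The three prediction options can be sketched as small standalone functions. Each neighbor is assumed to be an object of the form `{ outcome, similarity }`; the function names are illustrative.

```javascript
// Sketch: three ways to turn neighbor outcomes into a prediction.
function topKey(scores) {
  const entries = Object.entries(scores);
  if (entries.length === 0) return null;
  return entries.sort((a, b) => b[1] - a[1])[0][0];
}

// Majority vote: every neighbor counts once
function majorityVote(neighbors) {
  const counts = {};
  for (const n of neighbors) counts[n.outcome] = (counts[n.outcome] || 0) + 1;
  return topKey(counts);
}

// Weighted vote: closer matches count for more
function weightedVote(neighbors) {
  const scores = {};
  for (const n of neighbors) scores[n.outcome] = (scores[n.outcome] || 0) + n.similarity;
  return topKey(scores);
}

// Probability estimate: share of neighbors that had the given outcome
function outcomeProbability(neighbors, outcome) {
  if (neighbors.length === 0) return 0;
  return neighbors.filter((n) => n.outcome === outcome).length / neighbors.length;
}

// The 8-case example from the Problem section: 7 withdrew, 1 remained
const eightCases = [
  ...Array.from({ length: 7 }, () => ({ outcome: 'withdrew', similarity: 0.85 })),
  { outcome: 'remained', similarity: 0.84 },
];
const withdrawalLikelihood = outcomeProbability(eightCases, 'withdrew'); // 0.875
```

Majority and weighted votes can disagree when a few very close matches point one way and many marginal matches point the other.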
Structure
Historical Patterns Tables
-- Store historical cases (snapshots at decision points)
CREATE TABLE historical_patterns (
pattern_id INT PRIMARY KEY IDENTITY(1,1),
family_id INT NOT NULL,
-- When was this snapshot taken?
snapshot_date DATETIME2 NOT NULL,
snapshot_reason VARCHAR(100), -- 'pre_withdrawal', 'semester_end', 'intervention_point'
-- Feature vector (behavioral characteristics at this moment)
engagement_score DECIMAL(5,2),
engagement_velocity DECIMAL(5,2),
email_open_rate DECIMAL(5,2),
days_since_login INT,
payment_on_time_rate DECIMAL(5,2),
event_attendance_rate DECIMAL(5,2),
tenure_days INT,
communication_responsiveness DECIMAL(5,2),
-- Additional features (domain-specific)
volunteer_hours DECIMAL(5,1),
referrals_made INT,
support_tickets_count INT,
-- Feature vector as JSON for flexibility
feature_vector NVARCHAR(MAX), -- JSON
-- What happened after this snapshot?
outcome VARCHAR(50), -- 'withdrew', 'remained', 'improved', 'declined'
outcome_date DATETIME2,
days_to_outcome INT,
-- Context
intervention_attempted BIT DEFAULT 0,
intervention_type VARCHAR(100),
intervention_successful BIT,
CONSTRAINT FK_pattern_family FOREIGN KEY (family_id)
REFERENCES families(family_id)
);
-- Indexes for pattern matching
CREATE INDEX IX_outcome ON historical_patterns(outcome);
CREATE INDEX IX_snapshot_date ON historical_patterns(snapshot_date);
CREATE INDEX IX_engagement_score ON historical_patterns(engagement_score);
-- Store predictions made using pattern matching
CREATE TABLE pattern_predictions (
prediction_id INT PRIMARY KEY IDENTITY(1,1),
family_id INT NOT NULL,
-- Prediction details
prediction_date DATETIME2 DEFAULT GETDATE(),
predicted_outcome VARCHAR(50),
confidence DECIMAL(5,2), -- 0-100
-- Similar cases used
similar_cases_count INT,
similar_cases_json NVARCHAR(MAX), -- JSON array of pattern_ids
avg_similarity DECIMAL(5,2), -- Average similarity score
-- What actually happened (filled in later)
actual_outcome VARCHAR(50),
actual_outcome_date DATETIME2,
prediction_correct BIT,
CONSTRAINT FK_prediction_family FOREIGN KEY (family_id)
REFERENCES families(family_id)
);
Implementation
Pattern Matching Engine
class PatternMatcher {
constructor(db, config = {}) {
this.db = db;
this.k = config.k || 10; // Number of nearest neighbors
this.minSimilarity = config.minSimilarity || 0.70; // 70% similar minimum
// Feature weights (sum to 1.0)
this.weights = config.weights || {
engagement_score: 0.20,
engagement_velocity: 0.20,
email_open_rate: 0.10,
days_since_login: 0.10,
payment_on_time_rate: 0.20,
event_attendance_rate: 0.10,
tenure_days: 0.05,
communication_responsiveness: 0.05
};
}
async predictOutcome(familyId, outcomeType = 'withdrawal') {
// Step 1: Get current feature vector
const currentFeatures = await this.extractFeatures(familyId);
// Step 2: Find similar historical patterns
const similarCases = await this.findSimilarCases(currentFeatures, outcomeType);
if (similarCases.length === 0) {
return {
predicted_outcome: 'unknown',
confidence: 0,
reason: 'No similar historical cases found',
similar_cases: []
};
}
// Step 3: Predict based on similar cases
const prediction = this.makePrediction(similarCases);
// Step 4: Save prediction for later validation
await this.savePrediction(familyId, prediction, similarCases);
return prediction;
}
async extractFeatures(familyId) {
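// Get current metrics.
// Note: the queries in this class use MySQL date functions (DATE_SUB, DATEDIFF, NOW);
// translate them if the schema runs on SQL Server, whose syntax the DDL above uses.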
const metrics = await this.db.query(`
SELECT
fem.engagement_score,
fem.engagement_tier,
ra.withdrawal_risk,
ra.payment_risk
FROM family_engagement_metrics fem
LEFT JOIN risk_assessments ra ON fem.family_id = ra.family_id
WHERE fem.family_id = ?
`, [familyId]);
// Get velocity
const velocity = await this.db.query(`
SELECT score_delta / NULLIF(calculation_period_days, 0) * 30 as monthly_velocity
FROM family_engagement_metrics
WHERE family_id = ?
`, [familyId]);
// Get communication metrics
const comm = await this.db.query(`
SELECT
SUM(CASE WHEN interaction_type = 'email_sent' THEN 1 ELSE 0 END) as emails_sent,
SUM(CASE WHEN interaction_type = 'email_opened' THEN 1 ELSE 0 END) as emails_opened
FROM interaction_log
WHERE family_id = ?
AND interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 90 DAY)
`, [familyId]);
// Get portal activity
const portal = await this.db.query(`
SELECT
MAX(interaction_timestamp) as last_login,
DATEDIFF(NOW(), MAX(interaction_timestamp)) as days_since_login
FROM interaction_log
WHERE family_id = ?
AND interaction_type = 'portal_login'
`, [familyId]);
// Get payment history
const payment = await this.db.query(`
SELECT
COUNT(*) as total_payments,
SUM(CASE WHEN outcome = 'paid_on_time' THEN 1 ELSE 0 END) as on_time_payments
FROM interaction_log
WHERE family_id = ?
AND interaction_type = 'payment_received'
AND interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
`, [familyId]);
// Get event participation
const events = await this.db.query(`
SELECT
COUNT(CASE WHEN interaction_type = 'event_attended' THEN 1 END) as attended,
COUNT(CASE WHEN interaction_type = 'event_invited' THEN 1 END) as invited
FROM interaction_log
WHERE family_id = ?
AND interaction_timestamp >= DATE_SUB(NOW(), INTERVAL 90 DAY)
`, [familyId]);
// Get tenure
const tenure = await this.db.query(`
SELECT DATEDIFF(NOW(), enrollment_date) as days_enrolled
FROM families WHERE family_id = ?
`, [familyId]);
// Construct feature vector
const m = metrics[0] || {};
const c = comm[0] || {};
const p = portal[0] || {};
const pay = payment[0] || {};
const e = events[0] || {};
return {
// Neutral defaults stand in for missing data; ?? (unlike ||) keeps legitimate zeros
engagement_score: m.engagement_score ?? 50,
engagement_velocity: velocity[0]?.monthly_velocity ?? 0,
email_open_rate: c.emails_sent > 0 ? (c.emails_opened / c.emails_sent * 100) : 50,
days_since_login: p.days_since_login ?? 999, // 0 means "logged in today", not "never"
payment_on_time_rate: pay.total_payments > 0 ? (pay.on_time_payments / pay.total_payments * 100) : 50,
event_attendance_rate: e.invited > 0 ? (e.attended / e.invited * 100) : 50,
tenure_days: tenure[0]?.days_enrolled ?? 0,
communication_responsiveness: m.engagement_score ?? 50 // Proxy
};
}
async findSimilarCases(currentFeatures, outcomeType) {
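// Get all historical patterns with a known outcome from the last 3 years.
// Note: outcomeType is accepted but not yet used to filter cases here.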
const historicalCases = await this.db.query(`
SELECT
pattern_id,
family_id,
engagement_score,
engagement_velocity,
email_open_rate,
days_since_login,
payment_on_time_rate,
event_attendance_rate,
tenure_days,
communication_responsiveness,
outcome,
outcome_date,
days_to_outcome,
intervention_attempted,
intervention_type,
intervention_successful
FROM historical_patterns
WHERE outcome IS NOT NULL
AND snapshot_date >= DATE_SUB(NOW(), INTERVAL 3 YEAR)
ORDER BY snapshot_date DESC
`);
// Calculate similarity for each case
const withSimilarity = historicalCases.map(hcase => {
const similarity = this.calculateSimilarity(currentFeatures, hcase);
return {
...hcase,
similarity: similarity
};
});
// Filter by minimum similarity and sort
const similar = withSimilarity
.filter(c => c.similarity >= this.minSimilarity)
.sort((a, b) => b.similarity - a.similarity)
.slice(0, this.k);
return similar;
}
calculateSimilarity(current, historical) {
// Weighted Euclidean distance, normalized to 0-1 similarity
const features = Object.keys(this.weights);
let weightedDistance = 0;
let totalWeight = 0;
for (const feature of features) {
const weight = this.weights[feature];
// Normalize feature values to 0-1 scale
const currentNorm = this.normalizeFeature(feature, current[feature]);
const historicalNorm = this.normalizeFeature(feature, historical[feature]);
// Calculate squared difference
const diff = currentNorm - historicalNorm;
weightedDistance += weight * (diff * diff);
totalWeight += weight;
}
// Convert distance (0 = identical, 1 = maximally different) to similarity
const distance = Math.sqrt(weightedDistance / totalWeight);
const similarity = Math.max(0, 1 - distance);
return similarity;
}
normalizeFeature(feature, value) {
// Normalize each feature to 0-1 scale
const ranges = {
engagement_score: [0, 100],
engagement_velocity: [-20, 20],
email_open_rate: [0, 100],
days_since_login: [0, 180], // Cap at 180 days
payment_on_time_rate: [0, 100],
event_attendance_rate: [0, 100],
tenure_days: [0, 730], // Cap at 2 years
communication_responsiveness: [0, 100]
};
const [min, max] = ranges[feature] || [0, 100];
const clamped = Math.max(min, Math.min(max, value));
return (clamped - min) / (max - min);
}
makePrediction(similarCases) {
if (similarCases.length === 0) {
return {
predicted_outcome: 'unknown',
confidence: 0,
similar_cases: []
};
}
// Count outcomes, weighted by similarity
const outcomeScores = {};
let totalWeight = 0;
similarCases.forEach(c => {
const weight = c.similarity;
outcomeScores[c.outcome] = (outcomeScores[c.outcome] || 0) + weight;
totalWeight += weight;
});
// Find most likely outcome
let predictedOutcome = null;
let maxScore = 0;
for (const [outcome, score] of Object.entries(outcomeScores)) {
if (score > maxScore) {
maxScore = score;
predictedOutcome = outcome;
}
}
// Calculate confidence
const confidence = (maxScore / totalWeight) * 100;
// Average similarity
const avgSimilarity = similarCases.reduce((sum, c) => sum + c.similarity, 0) / similarCases.length;
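// Average days to outcome, using only cases that share the predicted outcome
// and have a recorded value (a NULL would otherwise be counted as 0 days)
const timedCases = similarCases.filter(
c => c.outcome === predictedOutcome && c.days_to_outcome != null
);
const avgDaysToOutcome = timedCases.length > 0
? Math.round(timedCases.reduce((sum, c) => sum + c.days_to_outcome, 0) / timedCases.length)
: null;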
return {
predicted_outcome: predictedOutcome,
confidence: Math.round(confidence),
similar_cases_count: similarCases.length,
avg_similarity: Math.round(avgSimilarity * 100),
avg_days_to_outcome: avgDaysToOutcome,
similar_cases: similarCases.slice(0, 5).map(c => ({
pattern_id: c.pattern_id,
family_id: c.family_id,
similarity: Math.round(c.similarity * 100),
outcome: c.outcome,
days_to_outcome: c.days_to_outcome
}))
};
}
async savePrediction(familyId, prediction, similarCases) {
const similarCasesJson = JSON.stringify(
similarCases.map(c => c.pattern_id)
);
await this.db.query(`
INSERT INTO pattern_predictions (
family_id,
predicted_outcome,
confidence,
similar_cases_count,
similar_cases_json,
avg_similarity
) VALUES (?, ?, ?, ?, ?, ?)
`, [
familyId,
prediction.predicted_outcome,
prediction.confidence,
prediction.similar_cases_count,
similarCasesJson,
prediction.avg_similarity
]);
}
// Create historical pattern from current state
async capturePattern(familyId, reason) {
const features = await this.extractFeatures(familyId);
await this.db.query(`
INSERT INTO historical_patterns (
family_id,
snapshot_date,
snapshot_reason,
engagement_score,
engagement_velocity,
email_open_rate,
days_since_login,
payment_on_time_rate,
event_attendance_rate,
tenure_days,
communication_responsiveness,
feature_vector
) VALUES (?, NOW(), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`, [
familyId,
reason,
features.engagement_score,
features.engagement_velocity,
features.email_open_rate,
features.days_since_login,
features.payment_on_time_rate,
features.event_attendance_rate,
features.tenure_days,
features.communication_responsiveness,
JSON.stringify(features)
]);
}
// Update historical pattern with outcome
async recordOutcome(familyId, outcome, outcomeDate) {
await this.db.query(`
UPDATE historical_patterns
SET
outcome = ?,
outcome_date = ?,
days_to_outcome = DATEDIFF(?, snapshot_date)
WHERE family_id = ?
AND outcome IS NULL
`, [outcome, outcomeDate, outcomeDate, familyId]);
}
}
module.exports = PatternMatcher;
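`makePrediction` derives confidence from the similarity-weighted vote share alone, so a single matching case yields 100%. One way to also reflect sample size, as the Solution section calls for, is to shrink the vote share toward an uninformative prior. This is a sketch, not part of the class above: the function name, the prior of 0.5 (which suits a two-outcome prediction), and the `priorStrength` of 2 pseudo-cases are all assumptions.

```javascript
// Sketch: shrink vote share toward a prior so small samples score lower.
// voteShare is in [0, 1]; caseCount is the number of similar cases found.
function adjustedConfidence(voteShare, caseCount, prior = 0.5, priorStrength = 2) {
  // Treat the prior as `priorStrength` imaginary cases voting at `prior`
  const shrunk =
    (voteShare * caseCount + prior * priorStrength) / (caseCount + priorStrength);
  return Math.round(shrunk * 100);
}

const oneCase = adjustedConfidence(1.0, 1);      // 67
const eightCases = adjustedConfidence(0.875, 8); // 80
```

A unanimous result from one case now scores lower than 7 of 8 agreeing, which matches the intuition that more evidence should earn more confidence.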
Usage Example
const matcher = new PatternMatcher(db, {
k: 10,
minSimilarity: 0.70,
weights: {
engagement_score: 0.20,
engagement_velocity: 0.20,
payment_on_time_rate: 0.20,
email_open_rate: 0.10,
days_since_login: 0.10,
event_attendance_rate: 0.10,
tenure_days: 0.05,
communication_responsiveness: 0.05
}
});
// Predict withdrawal for Martinez family
const prediction = await matcher.predictOutcome(187, 'withdrawal');
console.log(`
Withdrawal Prediction for Family 187:
Predicted Outcome: ${prediction.predicted_outcome}
Confidence: ${prediction.confidence}%
Based on ${prediction.similar_cases_count} similar historical cases
Average similarity: ${prediction.avg_similarity}%
Expected timeframe: ${prediction.avg_days_to_outcome} days
Top 5 Similar Cases:
`);
prediction.similar_cases.forEach((c, i) => {
console.log(` ${i+1}. Family ${c.family_id} (${c.similarity}% similar)`);
console.log(` Outcome: ${c.outcome} after ${c.days_to_outcome} days`);
});
// Example output:
// Withdrawal Prediction for Family 187:
// Predicted Outcome: withdrew
// Confidence: 87%
//
// Based on 8 similar historical cases
// Average similarity: 85%
// Expected timeframe: 45 days
//
// Top 5 Similar Cases:
// 1. Family 142 (92% similar)
// Outcome: withdrew after 38 days
// 2. Family 89 (88% similar)
// Outcome: withdrew after 52 days
// 3. Family 201 (86% similar)
// Outcome: withdrew after 41 days
// 4. Family 156 (84% similar)
// Outcome: remained after 60 days
// 5. Family 78 (82% similar)
// Outcome: withdrew after 49 days
Variations
By Matching Algorithm
Euclidean Distance (shown above):
- Simple, interpretable
- Sensitive to scale (requires normalization)
- Good for continuous features
Cosine Similarity:
calculateCosineSimilarity(current, historical) {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (const feature of Object.keys(this.weights)) {
const a = this.normalizeFeature(feature, current[feature]);
const b = this.normalizeFeature(feature, historical[feature]);
dotProduct += a * b;
normA += a * a;
normB += b * b;
}
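const denominator = Math.sqrt(normA) * Math.sqrt(normB);
// Guard against all-zero vectors, which would otherwise yield NaN
return denominator === 0 ? 0 : dotProduct / denominator;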
}
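One property worth checking before adopting cosine similarity: it compares direction only, so two families whose normalized features are proportional score as nearly identical even when one is far less engaged overall. The standalone sketch below uses plain arrays of already-normalized values (made up for illustration) rather than the class's feature objects.

```javascript
// Sketch: cosine similarity ignores magnitude; Euclidean similarity does not.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

function euclideanSimilarity(a, b) {
  let sumSq = 0;
  for (let i = 0; i < a.length; i++) sumSq += (a[i] - b[i]) ** 2;
  return 1 - Math.sqrt(sumSq / a.length);
}

const highlyEngaged = [0.8, 0.6, 0.4];
const halfAsEngaged = [0.4, 0.3, 0.2]; // same shape, half the level
const cosine = cosineSimilarity(highlyEngaged, halfAsEngaged);
const euclidean = euclideanSimilarity(highlyEngaged, halfAsEngaged);
```

Here `cosine` comes out at essentially 1 while `euclidean` is roughly 0.69, so cosine similarity fits best when the shape of the profile matters more than its level.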
Manhattan Distance:
calculateManhattanDistance(current, historical) {
let distance = 0;
for (const feature of Object.keys(this.weights)) {
const weight = this.weights[feature];
const a = this.normalizeFeature(feature, current[feature]);
const b = this.normalizeFeature(feature, historical[feature]);
distance += weight * Math.abs(a - b);
}
return 1 - distance; // Convert to similarity; assumes weights sum to 1.0 so distance is in [0, 1]
}
By Prediction Type
Binary Classification (yes/no):
- Will withdraw or remain?
- Will pay or default?
- Use majority vote
Multi-Class Classification:
- Will improve, remain stable, or decline?
- Low/medium/high risk tier?
- Use weighted voting
Regression (continuous value):
- Predicted engagement score in 30 days?
- Average similar cases' outcomes
Time-to-Event:
- How many days until withdrawal?
- Average days from similar cases
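For the regression and time-to-event variants, the "average" can be weighted by similarity so that closer matches contribute more. A minimal sketch, assuming each neighbor carries a `similarity` and the numeric field being averaged; `weightedAverage` is a hypothetical helper.

```javascript
// Sketch: similarity-weighted average of a numeric outcome across neighbors.
function weightedAverage(neighbors, field) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const n of neighbors) {
    if (n[field] == null) continue; // skip cases with no recorded value
    weightedSum += n.similarity * n[field];
    totalWeight += n.similarity;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : null;
}

const neighbors = [
  { similarity: 1.0, days_to_outcome: 40 },
  { similarity: 0.5, days_to_outcome: 70 },
  { similarity: 0.9, days_to_outcome: null }, // outcome date unknown
];
const expectedDays = weightedAverage(neighbors, 'days_to_outcome'); // 50
```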
By Domain
Homeschool Co-op:
weights: {
engagement_score: 0.20,
engagement_velocity: 0.20,
payment_on_time_rate: 0.20,
event_attendance_rate: 0.15,
volunteer_hours: 0.10,
communication_responsiveness: 0.10,
tenure_days: 0.05
}
SaaS Product:
weights: {
feature_usage_breadth: 0.25,
login_frequency: 0.20,
support_ticket_frequency: 0.15,
user_growth_rate: 0.15,
payment_health: 0.15,
tenure_days: 0.10
}
Property Management:
weights: {
rent_payment_reliability: 0.30,
maintenance_request_frequency: 0.20,
lease_compliance: 0.20,
neighbor_complaints: 0.15,
communication_responsiveness: 0.10,
tenure_months: 0.05
}
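Since the similarity calculations expect weights that sum to 1.0, it can help to validate and rescale a domain's weight map before passing it to the matcher. The helper below is a hypothetical sketch, not part of `PatternMatcher`.

```javascript
// Sketch: validate a weight map and rescale it to sum to 1.0.
function normalizeWeights(weights) {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  // !(w >= 0) also rejects NaN and non-numeric values
  if (entries.some(([, w]) => !(w >= 0)) || !(total > 0)) {
    throw new Error('Weights must be non-negative and sum to a positive value');
  }
  return Object.fromEntries(entries.map(([name, w]) => [name, w / total]));
}

// Relative importance can then be written in any convenient units
const rescaled = normalizeWeights({ rent_payment_reliability: 6, lease_compliance: 2 });
```

With this in place, a coordinator can express "payment reliability matters three times as much as lease compliance" without doing the arithmetic by hand.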
Consequences
Benefits
1. Leverage organizational memory: "This situation closely resembles 8 past cases, 7 of which withdrew." The system learns from history.
2. No ML expertise required: Simple distance metrics, interpretable results, no training step.
3. Transparent predictions: "Similar to these 5 families." The system can show the user why a prediction was made.
4. Works with small data: Can work with 50-100 historical cases, where trained ML models typically need far more.
5. Automatic improvement: As more cases with known outcomes accumulate, predictions tend to improve without retraining.
6. Multi-outcome capable: Can predict withdrawal, payment issues, and engagement changes from the same system.
7. Confidence scores: Indicate when predictions are reliable and when they are uncertain.
Costs
1. Requires labeled historical data: Outcomes must be known for past cases. New organizations don't have this.
2. Curse of dimensionality: Too many features make finding similar cases harder.
3. Assumes past predicts future: Works poorly when circumstances change (COVID, policy changes).
4. Sensitive to feature selection: Wrong features or weights reduce accuracy.
5. Computational cost: Each prediction compares the current case against all historical cases.
6. Cold start problem: Newly added features can't be matched against historical snapshots that lack them.
7. Doesn't capture interactions: Distance is computed feature by feature, so effects that depend on combinations of features are missed.
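The computational cost in item 5 grows with the number of historical cases times the number of predictions. One common mitigation, sketched here under the assumption that the historical cases fit in memory, is to normalize each case once and reuse the result across a batch. The names `buildIndex` and `nearest` are hypothetical, and the distance is unweighted for brevity.

```javascript
// Sketch: normalize historical cases once, then reuse them for many lookups.
function buildIndex(cases, ranges) {
  const features = Object.keys(ranges);
  const toVector = (row) =>
    features.map((f) => {
      const [min, max] = ranges[f];
      const clamped = Math.max(min, Math.min(max, row[f]));
      return (clamped - min) / (max - min);
    });
  return { toVector, entries: cases.map((row) => ({ row, vector: toVector(row) })) };
}

// Return the k most similar historical rows to `current`
function nearest(index, current, k) {
  const target = index.toVector(current);
  return index.entries
    .map(({ row, vector }) => {
      const sumSq = vector.reduce((s, v, i) => s + (v - target[i]) ** 2, 0);
      return { row, similarity: 1 - Math.sqrt(sumSq / vector.length) };
    })
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}

const index = buildIndex(
  [
    { pattern_id: 1, engagement_score: 90, days_since_login: 3 },
    { pattern_id: 2, engagement_score: 62, days_since_login: 50 },
  ],
  { engagement_score: [0, 100], days_since_login: [0, 180] }
);
const closest = nearest(index, { engagement_score: 65, days_since_login: 47 }, 1);
```

For much larger histories, the approximate nearest-neighbor libraries listed in the References are an option.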
Sample Code
Batch prediction for all at-risk families:
async function batchPredictWithdrawals() {
// Get all at-risk families
const atRisk = await db.query(`
SELECT f.family_id, f.family_name, fem.engagement_score
FROM families f
JOIN family_engagement_metrics fem ON f.family_id = fem.family_id
WHERE f.enrolled_current_semester = 1
AND (fem.engagement_score < 60 OR fem.score_velocity = 'declining')
`);
const matcher = new PatternMatcher(db);
const predictions = [];
for (const family of atRisk) {
const prediction = await matcher.predictOutcome(family.family_id, 'withdrawal');
predictions.push({
family_id: family.family_id,
family_name: family.family_name,
current_score: family.engagement_score,
...prediction
});
}
// Sort by confidence (most confident predictions first)
predictions.sort((a, b) => b.confidence - a.confidence);
return predictions;
}
Validate prediction accuracy:
async function validatePredictionAccuracy() {
// Get predictions made 60+ days ago
const oldPredictions = await db.query(`
SELECT
pp.prediction_id,
pp.family_id,
pp.predicted_outcome,
pp.confidence,
f.enrollment_status
FROM pattern_predictions pp
JOIN families f ON pp.family_id = f.family_id
WHERE pp.prediction_date < DATE_SUB(NOW(), INTERVAL 60 DAY)
AND pp.actual_outcome IS NULL
`);
let correct = 0;
let total = 0;
for (const pred of oldPredictions) {
const actualOutcome = pred.enrollment_status === 'active' ? 'remained' : 'withdrew';
// Update record with actual outcome
await db.query(`
UPDATE pattern_predictions
SET
actual_outcome = ?,
actual_outcome_date = NOW(),
prediction_correct = ?
WHERE prediction_id = ?
`, [
actualOutcome,
pred.predicted_outcome === actualOutcome ? 1 : 0,
pred.prediction_id
]);
if (pred.predicted_outcome === actualOutcome) {
correct++;
}
total++;
}
// Avoid NaN when no predictions are old enough to validate
const accuracy = total > 0 ? (correct / total) * 100 : 0;
console.log(`Prediction Accuracy: ${accuracy.toFixed(1)}% (${correct}/${total})`);
return { accuracy, correct, total };
}
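Overall accuracy does not show whether the confidence values themselves are trustworthy. A possible follow-up, sketched below, groups validated predictions into confidence buckets and reports the hit rate in each. The record shape mirrors the `confidence` and `prediction_correct` columns of `pattern_predictions`; the function name and 20-point bucket size are assumptions.

```javascript
// Sketch: accuracy per confidence bucket from validated predictions.
// Each record is { confidence: 0-100, prediction_correct: 0 | 1 }.
function accuracyByConfidence(records, bucketSize = 20) {
  const buckets = new Map();
  for (const r of records) {
    // Place confidence 100 in the top bucket rather than a bucket of its own
    const lower = Math.min(
      100 - bucketSize,
      Math.floor(r.confidence / bucketSize) * bucketSize
    );
    const b = buckets.get(lower) || { lower, upper: lower + bucketSize, total: 0, correct: 0 };
    b.total += 1;
    b.correct += r.prediction_correct ? 1 : 0;
    buckets.set(lower, b);
  }
  return [...buckets.values()]
    .sort((a, b) => a.lower - b.lower)
    .map((b) => ({ ...b, accuracy: b.correct / b.total }));
}

const report = accuracyByConfidence([
  { confidence: 85, prediction_correct: 1 },
  { confidence: 90, prediction_correct: 1 },
  { confidence: 100, prediction_correct: 0 },
  { confidence: 55, prediction_correct: 1 },
]);
```

If predictions in the 80-100 bucket are right much less often than 80% of the time, the similarity threshold, weights, or K likely need adjusting.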
Known Uses
Homeschool Co-op Intelligence Platform
- 82% accuracy predicting withdrawals (65 historical cases)
- Predictions made 30-45 days in advance
- Average confidence: 78%
- Enabled proactive interventions
Medical Diagnosis Systems
- Case-based reasoning widely used in clinical decision support
- Find similar patient cases, see what treatments worked
- Explains recommendations by showing similar cases
Weather Forecasting
- Analog forecasting: find similar historical weather patterns
- Predict based on what happened in similar situations
- Simple but effective approach
Financial Credit Scoring
- Early credit scoring used "similar borrower" analysis
- Find borrowers with similar profile, check default rates
- Still used alongside more complex models
Related Patterns
Requires:
- Pattern 1: Universal Event Log - provides historical interaction data
- Pattern 6: Composite Health Scoring - features for matching
- Pattern 7: Multi-Dimensional Risk Assessment - additional features
Complements:
- Pattern 12: Risk Stratification Models - ML alternative/complement
- Pattern 13: Confidence Scoring - enhances prediction confidence
- Pattern 14: Predictive Time Windows - when to make predictions
Enables:
- Pattern 15: Intervention Recommendation Engine - predictions drive recommendations
- Pattern 22: Progressive Escalation Sequences - predictive triggers
- Pattern 23: Triggered Interventions - automated responses to predictions
References
On Case-Based Reasoning:
- Aamodt, Agnar, and Enric Plaza. "Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches." AI Communications 7(1), 1994: 39-59. (Definitive CBR survey)
- Kolodner, Janet. Case-Based Reasoning. Morgan Kaufmann, 1993. (Comprehensive CBR textbook)
On Nearest Neighbor Algorithms:
- Cover, Thomas, and Peter Hart. "Nearest Neighbor Pattern Classification." IEEE Transactions on Information Theory 13(1), 1967: 21-27. (Original k-NN algorithm)
- Altman, Naomi S. "An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression." The American Statistician 46(3), 1992: 175-185.
- Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997. (Chapter 8: Instance-Based Learning)
On Similarity Measures:
- Cha, Sung-Hyuk. "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions." International Journal of Mathematical Models and Methods in Applied Sciences 1(4), 2007.
- "Cosine Similarity." Wikipedia. https://en.wikipedia.org/wiki/Cosine_similarity (For text/vector similarity)
On Implementation:
- Scikit-learn Neighbors: https://scikit-learn.org/stable/modules/neighbors.html (k-NN implementation)
- FAISS (Facebook AI Similarity Search): https://github.com/facebookresearch/faiss (Fast similarity search at scale)
- Annoy (Spotify): https://github.com/spotify/annoy (Approximate nearest neighbors)
Related Patterns in This Trilogy:
- Pattern 1 (Universal Event Log): Historical data source for matching
- Pattern 13 (Confidence Scoring): Using similar cases to predict defaults
- Pattern 18 (Cohort Analysis): Finding groups of similar cases
- Volume 1: Domain patterns provide feature space for similarity
- Volume 3, Pattern 8 (Intelligent Defaults): Pre-filling based on similar cases