Volume 2: Organizational Intelligence Platforms

Pattern 11: Historical Pattern Matching

Intent

Identify similar historical behavior patterns and use their outcomes to predict what will happen in current situations, leveraging organizational memory to make data-driven predictions without building complex ML models.

Also Known As

  • Case-Based Reasoning
  • Similarity-Based Prediction
  • Analog Forecasting
  • K-Nearest Neighbors (KNN) for Outcomes
  • Pattern-Based Prediction

Problem

We have rich historical data but don't know how to use it predictively.

The Martinez family shows these behaviors:

  • Engagement score dropped from 76 → 65 over 3 months
  • Email open rate: 26%
  • Last portal login: 47 days ago
  • Last payment: 12 days late
  • Event attendance: 2 of 8 events

Sarah thinks: "This feels like the situation with the Johnson family last year... and they withdrew."

The system should think the same thing:

  • Find families with similar behavioral patterns in history
  • See what happened to them (stayed or withdrew?)
  • Predict the Martinez outcome based on historical patterns
  • Provide confidence based on pattern strength

Without historical pattern matching:

  • Rely on human memory ("Feels like the Johnson family")
  • Inconsistent predictions (different coordinators remember different cases)
  • Can't quantify confidence
  • Limited by what one person remembers

With pattern matching:

  • System searches all historical cases automatically
  • Finds 8 families with similar patterns
  • 7 of 8 withdrew (87.5% likelihood)
  • Confident prediction: "High withdrawal risk based on 8 similar historical cases"

Context

When this pattern applies:

  • Sufficient historical data (50+ completed cases minimum)
  • Outcomes are known for historical cases
  • Current situations resemble past situations
  • Behavior patterns predict outcomes
  • Don't need complex ML (simpler approach works)

When this pattern may not be needed:

  • No historical data yet (new organization)
  • Every situation is truly unique (no patterns)
  • Outcomes too complex for similarity matching
  • Need more sophisticated prediction (use Pattern 12 instead)

Forces

Competing concerns:

1. Simplicity vs Accuracy

  • Pattern matching is simple and interpretable
  • But ML models may be more accurate
  • Balance: Start with pattern matching, graduate to ML if needed

2. Historical Similarity vs Current Uniqueness

  • Assumes the past predicts the future
  • But circumstances change, contexts differ
  • Balance: Weight recent history higher, require a minimum similarity threshold

3. Many Features vs Few Features

  • More behavioral features = better matching
  • But curse of dimensionality (hard to find exact matches)
  • Balance: 5-10 key features, normalize importance

4. Exact Matches vs Near Matches

  • Exact matches are rare
  • Near matches are plentiful but less reliable
  • Balance: Define a similarity threshold (e.g., 80%+ similar)

5. Interpretability vs Black Box

  • Pattern matching is transparent ("Similar to these 8 cases")
  • But complex distance metrics are less interpretable
  • Balance: Simple metrics, show similar cases to the user

Solution

Build a pattern matching system that:

  1. Characterizes current situation as feature vector
  2. Searches historical cases for similar patterns
  3. Ranks by similarity using distance metrics
  4. Predicts outcome based on what happened to similar cases
  5. Provides confidence based on consistency and sample size

Feature Vector Example:

{
  engagement_score: 65,
  engagement_velocity: -3.7,  // points per month
  email_open_rate: 26,
  days_since_login: 47,
  payment_on_time_rate: 75,
  event_attendance_rate: 25,
  tenure_days: 245,
  communication_responsiveness: 30
}

Similarity Calculation:

  • Euclidean distance, cosine similarity, or weighted Manhattan distance
  • Normalize features to 0-1 scale
  • Weight important features higher
  • Find K nearest neighbors (typically K = 5-20)

Outcome Prediction:

  • Majority vote: What happened to most similar cases?
  • Weighted vote: Weight closer matches higher
  • Probability estimate: % that had outcome X
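The weighted vote can be made concrete with a small sketch. The case data below is invented for illustration, and `weightedVote` mirrors the logic implemented in `makePrediction` in the Implementation section:

```javascript
// Weighted vote over similar cases (hypothetical data, invented for illustration).
// Each case votes for its own outcome, weighted by its similarity score.
const similarCases = [
  { outcome: 'withdrew', similarity: 0.92 },
  { outcome: 'withdrew', similarity: 0.88 },
  { outcome: 'withdrew', similarity: 0.86 },
  { outcome: 'remained', similarity: 0.84 },
  { outcome: 'withdrew', similarity: 0.82 }
];

function weightedVote(cases) {
  const scores = {};
  let totalWeight = 0;
  for (const c of cases) {
    scores[c.outcome] = (scores[c.outcome] || 0) + c.similarity;
    totalWeight += c.similarity;
  }
  // Highest accumulated weight wins; its share of the total weight is the confidence
  const [outcome, score] = Object.entries(scores).sort((a, b) => b[1] - a[1])[0];
  return { outcome, confidence: Math.round((score / totalWeight) * 100) };
}

console.log(weightedVote(similarCases));
// → { outcome: 'withdrew', confidence: 81 }
```

Note that one dissenting neighbor ('remained') lowers the confidence below 100% even though the predicted outcome is unchanged.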

Structure

Historical Patterns Tables

-- Store historical cases (snapshots at decision points)
CREATE TABLE historical_patterns (
  pattern_id INT PRIMARY KEY IDENTITY(1,1),
  family_id INT NOT NULL,

  -- When was this snapshot taken?
  snapshot_date DATETIME2 NOT NULL,
  snapshot_reason VARCHAR(100),  -- 'pre_withdrawal', 'semester_end', 'intervention_point'

  -- Feature vector (behavioral characteristics at this moment)
  engagement_score DECIMAL(5,2),
  engagement_velocity DECIMAL(5,2),
  email_open_rate DECIMAL(5,2),
  days_since_login INT,
  payment_on_time_rate DECIMAL(5,2),
  event_attendance_rate DECIMAL(5,2),
  tenure_days INT,
  communication_responsiveness DECIMAL(5,2),

  -- Additional features (domain-specific)
  volunteer_hours DECIMAL(5,1),
  referrals_made INT,
  support_tickets_count INT,

  -- Feature vector as JSON for flexibility
  feature_vector NVARCHAR(MAX),  -- JSON

  -- What happened after this snapshot?
  outcome VARCHAR(50),  -- 'withdrew', 'remained', 'improved', 'declined'
  outcome_date DATETIME2,
  days_to_outcome INT,

  -- Context
  intervention_attempted BIT DEFAULT 0,
  intervention_type VARCHAR(100),
  intervention_successful BIT,

  CONSTRAINT FK_pattern_family FOREIGN KEY (family_id) 
    REFERENCES families(family_id)
);

-- Indexes for pattern matching
CREATE INDEX IX_outcome ON historical_patterns(outcome);
CREATE INDEX IX_snapshot_date ON historical_patterns(snapshot_date);
CREATE INDEX IX_engagement_score ON historical_patterns(engagement_score);

-- Store predictions made using pattern matching
CREATE TABLE pattern_predictions (
  prediction_id INT PRIMARY KEY IDENTITY(1,1),
  family_id INT NOT NULL,

  -- Prediction details
  prediction_date DATETIME2 DEFAULT GETDATE(),
  predicted_outcome VARCHAR(50),
  confidence DECIMAL(5,2),  -- 0-100

  -- Similar cases used
  similar_cases_count INT,
  similar_cases_json NVARCHAR(MAX),  -- JSON array of pattern_ids
  avg_similarity DECIMAL(5,2),  -- Average similarity score

  -- What actually happened (filled in later)
  actual_outcome VARCHAR(50),
  actual_outcome_date DATETIME2,
  prediction_correct BIT,

  CONSTRAINT FK_prediction_family FOREIGN KEY (family_id) 
    REFERENCES families(family_id)
);

Implementation

Pattern Matching Engine

class PatternMatcher {
  constructor(db, config = {}) {
    this.db = db;
    this.k = config.k || 10;  // Number of nearest neighbors
    this.minSimilarity = config.minSimilarity || 0.70;  // 70% similar minimum

    // Feature weights (sum to 1.0)
    this.weights = config.weights || {
      engagement_score: 0.20,
      engagement_velocity: 0.20,
      email_open_rate: 0.10,
      days_since_login: 0.10,
      payment_on_time_rate: 0.20,
      event_attendance_rate: 0.10,
      tenure_days: 0.05,
      communication_responsiveness: 0.05
    };
  }

  async predictOutcome(familyId, outcomeType = 'withdrawal') {
    // Step 1: Get current feature vector
    const currentFeatures = await this.extractFeatures(familyId);

    // Step 2: Find similar historical patterns
    const similarCases = await this.findSimilarCases(currentFeatures, outcomeType);

    if (similarCases.length === 0) {
      return {
        predicted_outcome: 'unknown',
        confidence: 0,
        reason: 'No similar historical cases found',
        similar_cases: []
      };
    }

    // Step 3: Predict based on similar cases
    const prediction = this.makePrediction(similarCases);

    // Step 4: Save prediction for later validation
    await this.savePrediction(familyId, prediction, similarCases);

    return prediction;
  }

  async extractFeatures(familyId) {
    // Get current metrics
    const metrics = await this.db.query(`
      SELECT 
        fem.engagement_score,
        fem.engagement_tier,
        ra.withdrawal_risk,
        ra.payment_risk
      FROM family_engagement_metrics fem
      LEFT JOIN risk_assessments ra ON fem.family_id = ra.family_id
      WHERE fem.family_id = ?
    `, [familyId]);

    // Get velocity
    const velocity = await this.db.query(`
      SELECT score_delta / NULLIF(calculation_period_days, 0) * 30 as monthly_velocity
      FROM family_engagement_metrics
      WHERE family_id = ?
    `, [familyId]);

    // Get communication metrics
    const comm = await this.db.query(`
      SELECT 
        SUM(CASE WHEN interaction_type = 'email_sent' THEN 1 ELSE 0 END) as emails_sent,
        SUM(CASE WHEN interaction_type = 'email_opened' THEN 1 ELSE 0 END) as emails_opened
      FROM interaction_log
      WHERE family_id = ?
        AND interaction_timestamp >= DATEADD(DAY, -90, GETDATE())
    `, [familyId]);

    // Get portal activity
    const portal = await this.db.query(`
      SELECT 
        MAX(interaction_timestamp) as last_login,
        DATEDIFF(DAY, MAX(interaction_timestamp), GETDATE()) as days_since_login
      FROM interaction_log
      WHERE family_id = ?
        AND interaction_type = 'portal_login'
    `, [familyId]);

    // Get payment history
    const payment = await this.db.query(`
      SELECT 
        COUNT(*) as total_payments,
        SUM(CASE WHEN outcome = 'paid_on_time' THEN 1 ELSE 0 END) as on_time_payments
      FROM interaction_log
      WHERE family_id = ?
        AND interaction_type = 'payment_received'
        AND interaction_timestamp >= DATEADD(YEAR, -1, GETDATE())
    `, [familyId]);

    // Get event participation
    const events = await this.db.query(`
      SELECT 
        COUNT(CASE WHEN interaction_type = 'event_attended' THEN 1 END) as attended,
        COUNT(CASE WHEN interaction_type = 'event_invited' THEN 1 END) as invited
      FROM interaction_log
      WHERE family_id = ?
        AND interaction_timestamp >= DATEADD(DAY, -90, GETDATE())
    `, [familyId]);

    // Get tenure
    const tenure = await this.db.query(`
      SELECT DATEDIFF(DAY, enrollment_date, GETDATE()) as days_enrolled
      FROM families WHERE family_id = ?
    `, [familyId]);

    // Construct feature vector
    const m = metrics[0] || {};
    const c = comm[0] || {};
    const p = portal[0] || {};
    const pay = payment[0] || {};
    const e = events[0] || {};

    return {
      engagement_score: m.engagement_score || 50,
      engagement_velocity: velocity[0]?.monthly_velocity || 0,
      email_open_rate: c.emails_sent > 0 ? (c.emails_opened / c.emails_sent * 100) : 50,
      days_since_login: p.days_since_login || 999,
      payment_on_time_rate: pay.total_payments > 0 ? (pay.on_time_payments / pay.total_payments * 100) : 50,
      event_attendance_rate: e.invited > 0 ? (e.attended / e.invited * 100) : 50,
      tenure_days: tenure[0]?.days_enrolled || 0,
      communication_responsiveness: m.engagement_score || 50  // Proxy
    };
  }

  async findSimilarCases(currentFeatures, outcomeType) {
    // Get all labeled historical patterns (outcomeType is reserved for
    // future outcome-specific filtering; all labeled cases are compared here)
    const historicalCases = await this.db.query(`
      SELECT 
        pattern_id,
        family_id,
        engagement_score,
        engagement_velocity,
        email_open_rate,
        days_since_login,
        payment_on_time_rate,
        event_attendance_rate,
        tenure_days,
        communication_responsiveness,
        outcome,
        outcome_date,
        days_to_outcome,
        intervention_attempted,
        intervention_type,
        intervention_successful
      FROM historical_patterns
      WHERE outcome IS NOT NULL
        AND snapshot_date >= DATEADD(YEAR, -3, GETDATE())
      ORDER BY snapshot_date DESC
    `);

    // Calculate similarity for each case
    const withSimilarity = historicalCases.map(hcase => {
      const similarity = this.calculateSimilarity(currentFeatures, hcase);
      return {
        ...hcase,
        similarity: similarity
      };
    });

    // Filter by minimum similarity and sort
    const similar = withSimilarity
      .filter(c => c.similarity >= this.minSimilarity)
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, this.k);

    return similar;
  }

  calculateSimilarity(current, historical) {
    // Weighted Euclidean distance, normalized to 0-1 similarity
    const features = Object.keys(this.weights);

    let weightedDistance = 0;
    let totalWeight = 0;

    for (const feature of features) {
      const weight = this.weights[feature];

      // Normalize feature values to 0-1 scale
      const currentNorm = this.normalizeFeature(feature, current[feature]);
      const historicalNorm = this.normalizeFeature(feature, historical[feature]);

      // Calculate squared difference
      const diff = currentNorm - historicalNorm;
      weightedDistance += weight * (diff * diff);
      totalWeight += weight;
    }

    // distance: 0 = identical, 1 = maximally different; invert to a 0-1 similarity
    const distance = Math.sqrt(weightedDistance / totalWeight);
    const similarity = Math.max(0, 1 - distance);

    return similarity;
  }

  normalizeFeature(feature, value) {
    // Normalize each feature to 0-1 scale
    const ranges = {
      engagement_score: [0, 100],
      engagement_velocity: [-20, 20],
      email_open_rate: [0, 100],
      days_since_login: [0, 180],  // Cap at 180 days
      payment_on_time_rate: [0, 100],
      event_attendance_rate: [0, 100],
      tenure_days: [0, 730],  // Cap at 2 years
      communication_responsiveness: [0, 100]
    };

    const [min, max] = ranges[feature] || [0, 100];
    const clamped = Math.max(min, Math.min(max, value));
    return (clamped - min) / (max - min);
  }

  makePrediction(similarCases) {
    if (similarCases.length === 0) {
      return {
        predicted_outcome: 'unknown',
        confidence: 0,
        similar_cases: []
      };
    }

    // Count outcomes, weighted by similarity
    const outcomeScores = {};
    let totalWeight = 0;

    similarCases.forEach(c => {
      const weight = c.similarity;
      outcomeScores[c.outcome] = (outcomeScores[c.outcome] || 0) + weight;
      totalWeight += weight;
    });

    // Find most likely outcome
    let predictedOutcome = null;
    let maxScore = 0;

    for (const [outcome, score] of Object.entries(outcomeScores)) {
      if (score > maxScore) {
        maxScore = score;
        predictedOutcome = outcome;
      }
    }

    // Calculate confidence
    const confidence = (maxScore / totalWeight) * 100;

    // Average similarity
    const avgSimilarity = similarCases.reduce((sum, c) => sum + c.similarity, 0) / similarCases.length;

    // Calculate days to outcome (average)
    const avgDaysToOutcome = Math.round(
      similarCases.reduce((sum, c) => sum + (c.days_to_outcome || 0), 0) / similarCases.length
    );

    return {
      predicted_outcome: predictedOutcome,
      confidence: Math.round(confidence),
      similar_cases_count: similarCases.length,
      avg_similarity: Math.round(avgSimilarity * 100),
      avg_days_to_outcome: avgDaysToOutcome,
      similar_cases: similarCases.slice(0, 5).map(c => ({
        pattern_id: c.pattern_id,
        family_id: c.family_id,
        similarity: Math.round(c.similarity * 100),
        outcome: c.outcome,
        days_to_outcome: c.days_to_outcome
      }))
    };
  }

  async savePrediction(familyId, prediction, similarCases) {
    const similarCasesJson = JSON.stringify(
      similarCases.map(c => c.pattern_id)
    );

    await this.db.query(`
      INSERT INTO pattern_predictions (
        family_id,
        predicted_outcome,
        confidence,
        similar_cases_count,
        similar_cases_json,
        avg_similarity
      ) VALUES (?, ?, ?, ?, ?, ?)
    `, [
      familyId,
      prediction.predicted_outcome,
      prediction.confidence,
      prediction.similar_cases_count,
      similarCasesJson,
      prediction.avg_similarity
    ]);
  }

  // Create historical pattern from current state
  async capturePattern(familyId, reason) {
    const features = await this.extractFeatures(familyId);

    await this.db.query(`
      INSERT INTO historical_patterns (
        family_id,
        snapshot_date,
        snapshot_reason,
        engagement_score,
        engagement_velocity,
        email_open_rate,
        days_since_login,
        payment_on_time_rate,
        event_attendance_rate,
        tenure_days,
        communication_responsiveness,
        feature_vector
    ) VALUES (?, GETDATE(), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    `, [
      familyId,
      reason,
      features.engagement_score,
      features.engagement_velocity,
      features.email_open_rate,
      features.days_since_login,
      features.payment_on_time_rate,
      features.event_attendance_rate,
      features.tenure_days,
      features.communication_responsiveness,
      JSON.stringify(features)
    ]);
  }

  // Update historical pattern with outcome
  async recordOutcome(familyId, outcome, outcomeDate) {
    await this.db.query(`
      UPDATE historical_patterns
      SET 
        outcome = ?,
        outcome_date = ?,
        days_to_outcome = DATEDIFF(DAY, snapshot_date, ?)
      WHERE family_id = ?
        AND outcome IS NULL
    `, [outcome, outcomeDate, outcomeDate, familyId]);
  }
}

module.exports = PatternMatcher;

Usage Example

const matcher = new PatternMatcher(db, {
  k: 10,
  minSimilarity: 0.70,
  weights: {
    engagement_score: 0.20,
    engagement_velocity: 0.20,
    payment_on_time_rate: 0.20,
    email_open_rate: 0.10,
    days_since_login: 0.10,
    event_attendance_rate: 0.10,
    tenure_days: 0.05,
    communication_responsiveness: 0.05
  }
});

// Predict withdrawal for Martinez family
const prediction = await matcher.predictOutcome(187, 'withdrawal');

console.log(`
Withdrawal Prediction for Family 187:

  Predicted Outcome: ${prediction.predicted_outcome}
  Confidence: ${prediction.confidence}%

  Based on ${prediction.similar_cases_count} similar historical cases
  Average similarity: ${prediction.avg_similarity}%
  Expected timeframe: ${prediction.avg_days_to_outcome} days

  Top 5 Similar Cases:
`);

prediction.similar_cases.forEach((c, i) => {
  console.log(`  ${i+1}. Family ${c.family_id} (${c.similarity}% similar)`);
  console.log(`     Outcome: ${c.outcome} after ${c.days_to_outcome} days`);
});

// Example output:
// Withdrawal Prediction for Family 187:
//   Predicted Outcome: withdrew
//   Confidence: 87%
//   
//   Based on 8 similar historical cases
//   Average similarity: 85%
//   Expected timeframe: 45 days
//   
//   Top 5 Similar Cases:
//   1. Family 142 (92% similar)
//      Outcome: withdrew after 38 days
//   2. Family 89 (88% similar)
//      Outcome: withdrew after 52 days
//   3. Family 201 (86% similar)
//      Outcome: withdrew after 41 days
//   4. Family 156 (84% similar)
//      Outcome: remained after 60 days
//   5. Family 78 (82% similar)
//      Outcome: withdrew after 49 days

Variations

By Matching Algorithm

Euclidean Distance (shown above):

  • Simple, interpretable
  • Sensitive to scale (requires normalization)
  • Good for continuous features

Cosine Similarity:

calculateCosineSimilarity(current, historical) {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;

  for (const feature of Object.keys(this.weights)) {
    const a = this.normalizeFeature(feature, current[feature]);
    const b = this.normalizeFeature(feature, historical[feature]);

    dotProduct += a * b;
    normA += a * a;
    normB += b * b;
  }

  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

Manhattan Distance:

calculateManhattanSimilarity(current, historical) {
  let distance = 0;

  for (const feature of Object.keys(this.weights)) {
    const weight = this.weights[feature];
    const a = this.normalizeFeature(feature, current[feature]);
    const b = this.normalizeFeature(feature, historical[feature]);

    distance += weight * Math.abs(a - b);
  }

  return 1 - distance;  // Invert weighted distance to a 0-1 similarity
}

By Prediction Type

Binary Classification (yes/no):

  • Will withdraw or remain?
  • Will pay or default?
  • Use majority vote

Multi-Class Classification:

  • Will improve, remain stable, or decline?
  • Low/medium/high risk tier?
  • Use weighted voting

Regression (continuous value):

  • Predicted engagement score in 30 days?
  • Average similar cases' outcomes

Time-to-Event:

  • How many days until withdrawal?
  • Average days from similar cases
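For the regression and time-to-event variants, the neighbor set is reduced with a similarity-weighted average instead of a vote. A minimal sketch (`weightedAverage` and the sample neighbors are illustrative, not part of the pattern's core code):

```javascript
// Similarity-weighted average of a numeric field across similar cases.
// Works for regression targets (e.g. future engagement score) and
// time-to-event targets (e.g. days until withdrawal).
function weightedAverage(cases, field) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const c of cases) {
    if (c[field] == null) continue;  // skip cases missing this field
    weightedSum += c.similarity * c[field];
    totalWeight += c.similarity;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : null;
}

// Hypothetical neighbors with known days-to-withdrawal
const neighbors = [
  { similarity: 0.9, days_to_outcome: 40 },
  { similarity: 0.8, days_to_outcome: 50 },
  { similarity: 0.7, days_to_outcome: 60 }
];

console.log(Math.round(weightedAverage(neighbors, 'days_to_outcome')));
// → 49
```

Closer matches pull the estimate harder than distant ones, which is usually preferable to a plain unweighted mean.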

By Domain

Homeschool Co-op:

weights: {
  engagement_score: 0.20,
  engagement_velocity: 0.20,
  payment_on_time_rate: 0.20,
  event_attendance_rate: 0.15,
  volunteer_hours: 0.10,
  communication_responsiveness: 0.10,
  tenure_days: 0.05
}

SaaS Product:

weights: {
  feature_usage_breadth: 0.25,
  login_frequency: 0.20,
  support_ticket_frequency: 0.15,
  user_growth_rate: 0.15,
  payment_health: 0.15,
  tenure_days: 0.10
}

Property Management:

weights: {
  rent_payment_reliability: 0.30,
  maintenance_request_frequency: 0.20,
  lease_compliance: 0.20,
  neighbor_complaints: 0.15,
  communication_responsiveness: 0.10,
  tenure_months: 0.05
}

Consequences

Benefits

1. Leverage organizational memory. "This situation is 87% similar to 8 past cases, 7 of which withdrew" - the system learns from history.

2. No ML expertise required. Simple distance metrics, interpretable results, no training needed.

3. Transparent predictions. "Similar to these 5 families" - can show the user why a prediction was made.

4. Works with small data. Can work with 50-100 historical cases (ML typically needs thousands).

5. Automatic improvement. As more cases accumulate, predictions improve automatically.

6. Multi-outcome capable. Can predict withdrawal, payment issues, and engagement changes from the same system.

7. Confidence scores. Know when predictions are reliable vs uncertain.

Costs

1. Requires labeled historical data. Need to know outcomes for past cases; new organizations don't have this.

2. Curse of dimensionality. Too many features make finding similar cases harder.

3. Assumes the past predicts the future. Works poorly when circumstances change (COVID, policy changes).

4. Sensitive to feature selection. Wrong features or weights reduce accuracy.

5. Computational cost. Each prediction compares against all historical cases.

6. Cold start problem. New features can't use historical data that doesn't include them.

7. Doesn't capture interactions. Linear similarity doesn't capture complex feature interactions.
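One way to blunt the computational cost is a coarse pre-filter: discard candidates whose cheapest scalar feature is already far off before running the full weighted-distance loop. A sketch (the ±20-point band and `prefilterCandidates` are illustrative assumptions; at larger scale, approximate nearest-neighbor libraries such as FAISS or Annoy, listed in the References, solve this properly):

```javascript
// Cheap scalar band check before the full weighted-distance computation.
// Only survivors of this filter go through calculateSimilarity().
// The band width (20 points) is an illustrative tuning knob, not a recommendation.
function prefilterCandidates(cases, current, band = 20) {
  return cases.filter(c =>
    Math.abs(c.engagement_score - current.engagement_score) <= band
  );
}

// Hypothetical historical snapshots
const historical = [
  { pattern_id: 1, engagement_score: 60 },
  { pattern_id: 2, engagement_score: 95 },
  { pattern_id: 3, engagement_score: 50 }
];

console.log(prefilterCandidates(historical, { engagement_score: 65 }).length);
// → 2  (pattern 2 is eliminated before any distance math)
```

The same band can also be pushed into SQL as `WHERE engagement_score BETWEEN ? AND ?`, letting the `IX_engagement_score` index from the Structure section do the work.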

Sample Code

Batch prediction for all at-risk families:

async function batchPredictWithdrawals() {
  // Get all at-risk families
  const atRisk = await db.query(`
    SELECT f.family_id, f.family_name, fem.engagement_score
    FROM families f
    JOIN family_engagement_metrics fem ON f.family_id = fem.family_id
    WHERE f.enrolled_current_semester = 1
      AND (fem.engagement_score < 60 OR fem.score_velocity = 'declining')
  `);

  const matcher = new PatternMatcher(db);
  const predictions = [];

  for (const family of atRisk) {
    const prediction = await matcher.predictOutcome(family.family_id, 'withdrawal');

    predictions.push({
      family_id: family.family_id,
      family_name: family.family_name,
      current_score: family.engagement_score,
      ...prediction
    });
  }

  // Sort by confidence (most confident predictions first)
  predictions.sort((a, b) => b.confidence - a.confidence);

  return predictions;
}

Validate prediction accuracy:

async function validatePredictionAccuracy() {
  // Get predictions made 60+ days ago
  const oldPredictions = await db.query(`
    SELECT 
      pp.prediction_id,
      pp.family_id,
      pp.predicted_outcome,
      pp.confidence,
      f.enrollment_status
    FROM pattern_predictions pp
    JOIN families f ON pp.family_id = f.family_id
    WHERE pp.prediction_date < DATEADD(DAY, -60, GETDATE())
      AND pp.actual_outcome IS NULL
  `);

  let correct = 0;
  let total = 0;

  for (const pred of oldPredictions) {
    const actualOutcome = pred.enrollment_status === 'active' ? 'remained' : 'withdrew';

    // Update record with actual outcome
    await db.query(`
      UPDATE pattern_predictions
      SET 
        actual_outcome = ?,
        actual_outcome_date = GETDATE(),
        prediction_correct = ?
      WHERE prediction_id = ?
    `, [
      actualOutcome,
      pred.predicted_outcome === actualOutcome ? 1 : 0,
      pred.prediction_id
    ]);

    if (pred.predicted_outcome === actualOutcome) {
      correct++;
    }
    total++;
  }

  const accuracy = total > 0 ? (correct / total) * 100 : 0;
  console.log(`Prediction Accuracy: ${accuracy.toFixed(1)}% (${correct}/${total})`);

  return { accuracy, correct, total };
}

Known Uses

Homeschool Co-op Intelligence Platform:

  • 82% accuracy predicting withdrawals (65 historical cases)
  • Predictions made 30-45 days in advance
  • Average confidence: 78%
  • Enabled proactive interventions

Medical Diagnosis Systems:

  • Case-based reasoning widely used in clinical decision support
  • Find similar patient cases, see what treatments worked
  • Explains recommendations by showing similar cases

Weather Forecasting:

  • Analog forecasting: find similar historical weather patterns
  • Predict based on what happened in similar situations
  • Simple but effective approach

Financial Credit Scoring:

  • Early credit scoring used "similar borrower" analysis
  • Find borrowers with a similar profile, check default rates
  • Still used alongside more complex models

Related Patterns

Requires:

  • Pattern 1: Universal Event Log - provides historical interaction data
  • Pattern 6: Composite Health Scoring - features for matching
  • Pattern 7: Multi-Dimensional Risk Assessment - additional features

Complements:

  • Pattern 12: Risk Stratification Models - ML alternative/complement
  • Pattern 13: Confidence Scoring - enhances prediction confidence
  • Pattern 14: Predictive Time Windows - when to make predictions

Enables:

  • Pattern 15: Intervention Recommendation Engine - predictions drive recommendations
  • Pattern 22: Progressive Escalation Sequences - predictive triggers
  • Pattern 23: Triggered Interventions - automated responses to predictions

References

On Case-Based Reasoning:

  • Aamodt, Agnar, and Enric Plaza. "Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches." AI Communications 7(1), 1994: 39-59. (Definitive CBR survey)
  • Kolodner, Janet. Case-Based Reasoning. Morgan Kaufmann, 1993. (Comprehensive CBR textbook)

On Nearest Neighbor Algorithms:

  • Cover, Thomas, and Peter Hart. "Nearest Neighbor Pattern Classification." IEEE Transactions on Information Theory 13(1), 1967: 21-27. (Original k-NN algorithm)
  • Altman, Naomi S. "An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression." The American Statistician 46(3), 1992: 175-185.
  • Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997. (Chapter 8: Instance-Based Learning)

On Similarity Measures:

  • Cha, Sung-Hyuk. "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions." International Journal of Mathematical Models and Methods in Applied Sciences 1(4), 2007.
  • "Cosine Similarity." Wikipedia. https://en.wikipedia.org/wiki/Cosine_similarity (For text/vector similarity)

On Implementation:

  • Scikit-learn Neighbors: https://scikit-learn.org/stable/modules/neighbors.html (k-NN implementation)
  • FAISS (Facebook AI Similarity Search): https://github.com/facebookresearch/faiss (Fast similarity search at scale)
  • Annoy (Spotify): https://github.com/spotify/annoy (Approximate nearest neighbors)

Related Patterns in This Trilogy:

  • Pattern 1 (Universal Event Log): Historical data source for matching
  • Pattern 13 (Confidence Scoring): Using similar cases to predict defaults
  • Pattern 18 (Cohort Analysis): Finding groups of similar cases
  • Volume 1: Domain patterns provide feature space for similarity
  • Volume 3, Pattern 8 (Intelligent Defaults): Pre-filling based on similar cases

Related Patterns in This Trilogy: - Pattern 1 (Universal Event Log): Historical data source for matching - Pattern 13 (Confidence Scoring): Using similar cases to predict defaults - Pattern 18 (Cohort Analysis): Finding groups of similar cases - Volume 1: Domain patterns provide feature space for similarity - Volume 3, Pattern 8 (Intelligent Defaults): Pre-filling based on similar cases