Pattern 2: Behavioral Graph Construction
Intent
Build a network representation of entities and their relationships over time, enabling social network analysis, influence mapping, community detection, and relationship-based predictions.
Also Known As
- Relationship Graph
- Social Network Model
- Interaction Network
- Entity Relationship Graph
- Connection Graph
Problem
The event log (Pattern 1) captures what happened, but relationships are implicit, not explicit.
When Sarah looks at her co-op, she sees: - The Martinez family referred the Chen family - Both families volunteer together frequently - The Chen family is highly engaged, Martinez is declining - Chen might influence Martinez to stay (or Martinez might influence Chen to leave)
But the system only knows: - Martinez family exists (record in families table) - Chen family exists (record in families table) - Various interaction events for each family
The relationship between them is invisible to the system.
Without explicit relationship modeling: - Can't identify influential families (who should we ask to help recruit?) - Can't detect communities (which families form tight-knit groups?) - Can't predict relationship effects (will Chen's engagement boost Martinez?) - Can't optimize interventions (who should mentor at-risk families?) - Can't understand referral networks (where do our best families come from?)
Context
When this pattern applies:
- Relationships between entities matter (not just individual behavior)
- Influence flows through the network (people affect each other)
- Communities exist (clusters of more-connected entities)
- Referrals are important (network growth mechanism)
- Social dynamics impact outcomes (engagement is contagious)
When this pattern may not be needed:
- Entities are truly independent (no meaningful relationships)
- Pure transactional interactions (no ongoing connections)
- Very small scale where humans know all relationships
- Privacy concerns prevent relationship tracking
- Network effects are negligible
Forces
Competing concerns:
1. Completeness vs Complexity - Want to model all relationship types (referred, volunteers_with, attends_with) - But graph becomes unwieldy with too many edge types - Balance: Model high-value relationships, aggregate the rest
2. Static vs Temporal - Relationships change over time (Martinez and Chen volunteer together now, but not last semester) - But temporal graphs are complex to query and visualize - Balance: Weighted edges with decay functions, snapshot views
3. Directed vs Undirected - Some relationships are directional (Martinez referred Chen, not vice versa) - Others are symmetric (volunteer together, attend events together) - Balance: Support both, be explicit about directionality
4. Weighted vs Binary - Some relationships are stronger (volunteer together 20 times vs 1 time) - But weights add complexity - Balance: Weight by interaction frequency and recency
5. Storage vs Computation - Can materialize graph (store edges explicitly) - fast queries, more storage - Can compute graph on-demand from event log - less storage, slower queries - Balance: Hybrid approach - materialize core relationships, compute others as needed
Solution
Build a graph data structure where:
Nodes represent entities:
- Families
- Students
- Teachers
- Volunteers
- Events (as connection points)
Edges represent relationships:
- referred (directional, weighted by conversion outcome)
- volunteers_with (undirected, weighted by co-volunteer hours)
- attends_with (undirected, weighted by co-attendance frequency)
- friends_with (undirected, reported or inferred from interactions)
- mentors (directional, explicit or assigned)
Edge properties:
- weight - strength of relationship (0.0 to 1.0)
- first_interaction - when relationship began
- last_interaction - most recent interaction
- interaction_count - total interactions
- relationship_type - edge label
Derive graph from event log: - Analyze interaction events to infer relationships - Weight edges by frequency and recency - Update graph as new interactions occur
Structure
Database Schema
Option 1: Explicit Edge Table (Recommended)
-- Relationship graph edges
CREATE TABLE relationship_edges (
edge_id INT PRIMARY KEY IDENTITY(1,1),
-- The relationship
source_entity_type VARCHAR(50) NOT NULL, -- 'family', 'student'
source_entity_id INT NOT NULL,
target_entity_type VARCHAR(50) NOT NULL,
target_entity_id INT NOT NULL,
-- Relationship characteristics
relationship_type VARCHAR(50) NOT NULL, -- 'referred', 'volunteers_with'
is_directed BIT NOT NULL DEFAULT 0, -- 1 = directional, 0 = symmetric
-- Strength and temporal
weight DECIMAL(5,3) DEFAULT 1.0, -- 0.0 to 1.0
first_interaction_date DATETIME2,
last_interaction_date DATETIME2,
interaction_count INT DEFAULT 0,
-- Metadata
metadata NVARCHAR(MAX) NULL, -- JSON for additional context
created_date DATETIME2 DEFAULT GETDATE(),
updated_date DATETIME2 DEFAULT GETDATE(),
-- Prevent duplicates
CONSTRAINT UQ_edge UNIQUE (
source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
relationship_type
)
);
-- Indexes for graph traversal
CREATE INDEX IX_edge_source ON relationship_edges(
source_entity_type, source_entity_id
);
CREATE INDEX IX_edge_target ON relationship_edges(
target_entity_type, target_entity_id
);
CREATE INDEX IX_edge_type ON relationship_edges(relationship_type);
CREATE INDEX IX_edge_weight ON relationship_edges(weight DESC);
Graph Construction Process
Step 1: Extract relationships from event log
-- Identify families who volunteered together
-- (attended same volunteer events on same days)
CREATE PROCEDURE sp_build_volunteer_relationships
AS
BEGIN
-- Find co-volunteer pairs
INSERT INTO relationship_edges (
source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
relationship_type, is_directed,
weight, first_interaction_date, last_interaction_date, interaction_count
)
SELECT
'family' as source_entity_type,
v1.family_id as source_entity_id,
'family' as target_entity_type,
v2.family_id as target_entity_id,
'volunteers_with' as relationship_type,
0 as is_directed, -- Symmetric relationship
CASE
WHEN COUNT(*) >= 10 THEN 1.0
WHEN COUNT(*) >= 5 THEN 0.7
WHEN COUNT(*) >= 2 THEN 0.4
ELSE 0.2
END as weight,
MIN(v1.volunteer_date) as first_interaction_date,
MAX(v1.volunteer_date) as last_interaction_date,
COUNT(*) as interaction_count
FROM volunteer_hours v1
JOIN volunteer_hours v2 ON v1.event_id = v2.event_id
AND v1.volunteer_date = v2.volunteer_date
WHERE v1.family_id < v2.family_id -- Prevent duplicates (only need one direction for undirected)
AND v1.volunteer_date >= DATEADD(year, -2, GETDATE()) -- Last 2 years
GROUP BY v1.family_id, v2.family_id
HAVING COUNT(*) >= 2 -- At least 2 co-volunteer events
ON CONFLICT (source_entity_type, source_entity_id, target_entity_type, target_entity_id, relationship_type)
DO UPDATE SET
weight = EXCLUDED.weight,
last_interaction_date = EXCLUDED.last_interaction_date,
interaction_count = EXCLUDED.interaction_count,
updated_date = GETDATE();
END;
Step 2: Build referral network
-- Track who referred whom
CREATE PROCEDURE sp_build_referral_relationships
AS
BEGIN
INSERT INTO relationship_edges (
source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
relationship_type, is_directed,
weight, first_interaction_date, interaction_count
)
SELECT
'family' as source_entity_type,
f.referred_by_family_id as source_entity_id,
'family' as target_entity_type,
f.family_id as target_entity_id,
'referred' as relationship_type,
1 as is_directed, -- Directional: A referred B
CASE
WHEN f.enrollment_date IS NOT NULL THEN 1.0 -- Converted
WHEN f.trial_date IS NOT NULL THEN 0.5 -- Attended trial
ELSE 0.2 -- Inquiry only
END as weight,
f.inquiry_date as first_interaction_date,
1 as interaction_count
FROM families f
WHERE f.referred_by_family_id IS NOT NULL
ON CONFLICT (source_entity_type, source_entity_id, target_entity_type, target_entity_id, relationship_type)
DO UPDATE SET
weight = EXCLUDED.weight, -- Update if conversion status changed
updated_date = GETDATE();
END;
Implementation
Graph Analysis Functions
1. Find influential families (high out-degree in referral network)
async function findInfluentialFamilies(limit = 10) {
const results = await db.query(`
SELECT
re.source_entity_id as family_id,
f.family_name,
COUNT(*) as referrals_made,
SUM(re.weight) as weighted_referrals,
AVG(re.weight) as avg_referral_quality
FROM relationship_edges re
JOIN families f ON re.source_entity_id = f.family_id
WHERE re.relationship_type = 'referred'
AND re.source_entity_type = 'family'
GROUP BY re.source_entity_id, f.family_name
ORDER BY weighted_referrals DESC
LIMIT ?
`, [limit]);
return results.map(r => ({
family_id: r.family_id,
family_name: r.family_name,
referrals: r.referrals_made,
quality_score: r.weighted_referrals,
conversion_rate: r.avg_referral_quality
}));
}
2. Detect communities (families who cluster together)
// Simple community detection: families connected by multiple edge types
async function detectCommunities(minConnections = 3) {
const results = await db.query(`
WITH family_connections AS (
SELECT
LEAST(source_entity_id, target_entity_id) as family_a,
GREATEST(source_entity_id, target_entity_id) as family_b,
COUNT(DISTINCT relationship_type) as connection_types,
SUM(weight) as total_weight
FROM relationship_edges
WHERE source_entity_type = 'family'
AND target_entity_type = 'family'
GROUP BY LEAST(source_entity_id, target_entity_id),
GREATEST(source_entity_id, target_entity_id)
HAVING COUNT(DISTINCT relationship_type) >= ?
)
SELECT
fc.family_a,
fa.family_name as family_a_name,
fc.family_b,
fb.family_name as family_b_name,
fc.connection_types,
fc.total_weight
FROM family_connections fc
JOIN families fa ON fc.family_a = fa.family_id
JOIN families fb ON fc.family_b = fb.family_id
ORDER BY fc.total_weight DESC
`, [minConnections]);
return results;
}
3. Calculate centrality (how "central" each family is in network)
// Degree centrality: how many connections does this family have?
async function calculateDegreeCentrality() {
const results = await db.query(`
SELECT
f.family_id,
f.family_name,
COUNT(DISTINCT CASE WHEN re.source_entity_id = f.family_id THEN re.edge_id END) as outbound_edges,
COUNT(DISTINCT CASE WHEN re.target_entity_id = f.family_id THEN re.edge_id END) as inbound_edges,
COUNT(DISTINCT re.edge_id) as total_edges,
SUM(re.weight) as weighted_centrality
FROM families f
LEFT JOIN relationship_edges re ON
(re.source_entity_id = f.family_id AND re.source_entity_type = 'family')
OR (re.target_entity_id = f.family_id AND re.target_entity_type = 'family')
WHERE f.enrolled_current_semester = 1
GROUP BY f.family_id, f.family_name
ORDER BY weighted_centrality DESC
`);
return results;
}
4. Find potential mentors for at-risk families
async function findPotentialMentors(atRiskFamilyId) {
// Find highly engaged families with existing connections to at-risk family
const results = await db.query(`
SELECT
f.family_id,
f.family_name,
fem.engagement_score,
re.relationship_type,
re.weight as connection_strength,
CASE
WHEN re.relationship_type = 'volunteers_with' THEN 'Already volunteer together'
WHEN re.relationship_type = 'attends_with' THEN 'Attend events together'
WHEN EXISTS (
SELECT 1 FROM relationship_edges re2
WHERE re2.source_entity_id = f.family_id
AND re2.relationship_type = 'referred'
AND re2.weight >= 0.8
) THEN 'Strong referrer'
ELSE 'Good match'
END as mentor_reason
FROM families f
JOIN family_engagement_metrics fem ON f.family_id = fem.family_id
LEFT JOIN relationship_edges re ON
(re.source_entity_id = f.family_id OR re.target_entity_id = f.family_id)
AND (re.source_entity_id = ? OR re.target_entity_id = ?)
WHERE f.family_id != ?
AND fem.engagement_score >= 80 -- Highly engaged
AND f.enrolled_current_semester = 1
ORDER BY
CASE WHEN re.edge_id IS NOT NULL THEN 0 ELSE 1 END, -- Existing connections first
fem.engagement_score DESC
LIMIT 5
`, [atRiskFamilyId, atRiskFamilyId, atRiskFamilyId]);
return results;
}
Temporal Decay of Relationships
// Relationships decay over time if not reinforced
async function applyTemporalDecay() {
await db.query(`
UPDATE relationship_edges
SET
weight = GREATEST(
0.1, -- Minimum weight
weight * EXP(-0.1 * DATEDIFF(month, last_interaction_date, GETDATE()))
),
updated_date = GETDATE()
WHERE last_interaction_date < DATEADD(month, -6, GETDATE())
AND weight > 0.1
`);
}
// Run this monthly or when calculating fresh metrics
Graph Visualization Export
async function exportGraphForVisualization(relationshipTypes = null) {
const typeFilter = relationshipTypes
? `AND relationship_type IN (${relationshipTypes.map(t => `'${t}'`).join(',')})`
: '';
const edges = await db.query(`
SELECT
edge_id,
source_entity_id as source,
target_entity_id as target,
relationship_type as type,
weight,
is_directed as directed
FROM relationship_edges
WHERE weight >= 0.2 -- Filter weak edges
${typeFilter}
`);
const nodes = await db.query(`
SELECT DISTINCT
f.family_id as id,
f.family_name as label,
fem.engagement_score,
fem.engagement_tier as group
FROM families f
JOIN family_engagement_metrics fem ON f.family_id = fem.family_id
WHERE f.family_id IN (
SELECT DISTINCT source_entity_id FROM relationship_edges
UNION
SELECT DISTINCT target_entity_id FROM relationship_edges
)
`);
// Return in format compatible with vis.js, D3.js, etc.
return {
nodes: nodes.map(n => ({
id: n.id,
label: n.label,
value: n.engagement_score, // Node size
group: n.group, // Node color
title: `${n.label} (${n.engagement_score}/100)`
})),
edges: edges.map(e => ({
from: e.source,
to: e.target,
value: e.weight * 10, // Edge thickness
label: e.type,
arrows: e.directed ? 'to' : undefined
}))
};
}
Variations
By Scale
Small organizations (<100 entities): - Single graph table sufficient - Can compute all relationships on-demand - Visualizations work well (not too cluttered) - Manual community identification feasible
Medium organizations (100-500 entities): - Materialize high-value edges - Compute others as needed - Visualization requires filtering - Algorithmic community detection helpful
Large organizations (500+ entities): - Definitely materialize all edges - Use graph database (Neo4j, Amazon Neptune) - Visualization needs heavy filtering or clustering - Advanced algorithms for analysis (PageRank, etc.)
By Domain
Homeschool co-op: - Family-centric networks - Referral chains important - Volunteer collaboration networks - Student friendships (secondary)
Property management: - Tenant-to-tenant recommendations - Neighbor relationships (same floor/building) - Maintenance vendor networks - Property manager relationships
Medical practice: - Physician referral networks (who refers patients) - Family relationships (shared insurance, address) - Care team collaboration (multi-provider care)
By Technology
SQL Database (current examples): - Edge table with indexes - SQL queries for graph operations - Export to visualization tools
Graph Database (Neo4j, Neptune):
// Create relationship
CREATE (f1:Family {id: 187})-[:VOLUNTEERS_WITH {weight: 0.8}]->(f2:Family {id: 234})
// Find influential families
MATCH (f:Family)-[r:REFERRED]->(referred:Family)
WHERE r.weight >= 0.5
RETURN f, COUNT(referred) as referrals
ORDER BY referrals DESC
// Find communities (triangle detection)
MATCH (a:Family)-[:VOLUNTEERS_WITH]-(b:Family)-[:VOLUNTEERS_WITH]-(c:Family)-[:VOLUNTEERS_WITH]-(a)
RETURN a, b, c
Graph Database Advantages: - Native graph traversal (faster for complex queries) - Built-in algorithms (PageRank, community detection, pathfinding) - Better for very large graphs (millions of nodes/edges)
SQL Database Advantages: - Don't need to learn new technology - Can integrate with existing relational data easily - Sufficient for most organizational scales (<10,000 entities)
Consequences
Benefits
1. Influence identification Know which families are "super connectors": - Martinez family has referred 7 families (4 converted) - High-value recruitment asset - Offer referral incentives to leverage influence
2. Community detection Identify tight-knit groups: - 5 families volunteer together regularly, attend all events together - If one shows withdrawal risk, others might follow or could intervene - Target group activities to strengthen communities
3. Relationship-based predictions "Chen family is highly engaged and volunteers with Martinez family regularly. Martinez's engagement is declining, but Chen's influence might stabilize them."
4. Mentor matching When family shows risk, find natural mentors: - Existing connections (already know each other) - High engagement (good role models) - Similar demographics (relatable experiences)
5. Network growth strategy Understand referral dynamics: - Church referrals convert at 71%, friend referrals at 52% - Families who refer 2+ others have 3.2 year average tenure - Incentivize referrals from high-quality sources
6. Social contagion modeling Engagement spreads through networks: - Highly engaged family connecting with at-risk family → risk reduced - Multiple at-risk families in same community → higher churn risk
Costs
1. Complexity overhead - Graph queries more complex than simple SQL - Requires understanding of network concepts - Debugging relationship issues can be tricky
2. Computational expense - Some graph algorithms are expensive (O(n²) or worse) - Large graphs require optimization - Real-time updates can be costly
3. Data quality challenges - Inferred relationships may be wrong - Weights are heuristic, not precise - Temporal decay functions are estimates
4. Visualization limitations - Large graphs become "hairball" visualizations - Filtering loses context - Static visualizations miss temporal dynamics
5. Privacy implications - Explicitly modeling relationships may feel intrusive - Some people don't want relationships tracked - Must handle data deletion carefully (affects multiple families)
Sample Code
Complete graph builder:
class RelationshipGraphBuilder {
constructor(db) {
this.db = db;
}
async buildFullGraph() {
console.log('Building relationship graph...');
// Build each relationship type
await this.buildReferralRelationships();
await this.buildVolunteerRelationships();
await this.buildEventAttendanceRelationships();
// Apply temporal decay
await this.applyTemporalDecay();
// Calculate metrics
const stats = await this.calculateGraphStatistics();
console.log('Graph built:', stats);
return stats;
}
async buildReferralRelationships() {
// Implementation shown earlier
}
async buildVolunteerRelationships() {
// Implementation shown earlier
}
async buildEventAttendanceRelationships() {
const result = await this.db.query(`
INSERT INTO relationship_edges (
source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
relationship_type, is_directed,
weight, interaction_count
)
SELECT
'family' as source_entity_type,
e1.family_id as source_entity_id,
'family' as target_entity_type,
e2.family_id as target_entity_id,
'attends_with' as relationship_type,
0 as is_directed,
CASE
WHEN COUNT(*) >= 10 THEN 1.0
WHEN COUNT(*) >= 5 THEN 0.6
ELSE 0.3
END as weight,
COUNT(*) as interaction_count
FROM event_attendance e1
JOIN event_attendance e2 ON e1.event_id = e2.event_id
WHERE e1.family_id < e2.family_id
AND e1.attended = 1 AND e2.attended = 1
AND e1.event_date >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
GROUP BY e1.family_id, e2.family_id
HAVING COUNT(*) >= 3
ON DUPLICATE KEY UPDATE
weight = VALUES(weight),
interaction_count = VALUES(interaction_count),
updated_date = NOW()
`);
return result.affectedRows;
}
async applyTemporalDecay() {
// Decay factor: 10% per month
const result = await this.db.query(`
UPDATE relationship_edges
SET weight = GREATEST(0.1,
weight * EXP(-0.1 * TIMESTAMPDIFF(MONTH, last_interaction_date, NOW()))
)
WHERE last_interaction_date < DATE_SUB(NOW(), INTERVAL 6 MONTH)
AND weight > 0.1
`);
return result.affectedRows;
}
async calculateGraphStatistics() {
const stats = await this.db.query(`
SELECT
relationship_type,
COUNT(*) as edge_count,
AVG(weight) as avg_weight,
SUM(CASE WHEN is_directed = 1 THEN 1 ELSE 0 END) as directed_count,
SUM(CASE WHEN is_directed = 0 THEN 1 ELSE 0 END) as undirected_count
FROM relationship_edges
GROUP BY relationship_type
`);
const nodeCount = await this.db.query(`
SELECT COUNT(DISTINCT entity_id) as node_count
FROM (
SELECT source_entity_id as entity_id FROM relationship_edges
UNION
SELECT target_entity_id FROM relationship_edges
) all_nodes
`);
return {
total_nodes: nodeCount[0].node_count,
relationships: stats
};
}
}
module.exports = RelationshipGraphBuilder;
Known Uses
Homeschool Co-op Intelligence Platform (2024-2025) - 100 families, ~300 edges - Referral network: identified 5 "super connectors" responsible for 40% of referrals - Volunteer network: detected 3 tight-knit communities (8-12 families each) - Mentor matching: matched 8 at-risk families with engaged mentors, 6 remained
Social Network Platforms (inspiration) - Facebook, LinkedIn: massive graphs of billions of nodes - Use sophisticated algorithms: PageRank, community detection, link prediction - Influence organizational intelligence thinking
Academic Research - Christakis & Fowler: "Connected" - showed health behaviors spread through networks - Granovetter: "Strength of Weak Ties" - weak connections often more valuable - Network science principles apply to organizational relationships
Related Patterns
Prerequisites: - Pattern 1: Universal Event Log - provides raw data to extract relationships from
Complementary: - Pattern 3: Multi-Channel Tracking - captures relationships across channels - Pattern 18: Opportunity Mining - compare network-connected vs isolated entities - Pattern 16: Cohort Discovery & Analysis - can discover relationship patterns automatically
Enabled by this pattern: - Pattern 11: Historical Pattern Matching - "similar families" includes network position - Pattern 15: Intervention Recommendation Engine - can recommend "connect with mentor X" - Pattern 24: Template-Based Communication - personalize based on relationship context
Alternatives: - Flat entity model (ignore relationships) - simpler but loses valuable signal - Implicit relationships only (never materialize edges) - saves storage but limits analysis
References
Academic Foundations
- Newman, M.E.J. (2010). Networks: An Introduction. Oxford University Press. ISBN: 978-0199206650 - Comprehensive network theory textbook
- Christakis, Nicholas A., and James H. Fowler (2009). Connected. Little, Brown Spark. ISBN: 978-0316036146 - Social network influence and contagion
- Granovetter, Mark S. (1973). "The Strength of Weak Ties." American Journal of Sociology 78(6): 1360-1380. https://www.jstor.org/stable/2776392
- Watts, Duncan J. (2003). Six Degrees: The Science of a Connected Age. W.W. Norton. ISBN: 978-0393041422 - Small-world networks
- Barabási, Albert-László (2016). Network Science. Cambridge University Press. http://networksciencebook.com/ - Free online textbook
Practical Implementation
- Robinson, Ian, Jim Webber, and Emil Eifrem (2015). Graph Databases (2nd ed.). O'Reilly. ISBN: 978-1491930892
- Neo4j Graph Database: https://neo4j.com/ - Leading graph database with Cypher query language
- NetworkX: https://networkx.org/ - Python library for network analysis
- igraph: https://igraph.org/ - Fast network analysis library (R/Python/C)
- Cytoscape: https://cytoscape.org/ - Network visualization platform
Graph Algorithms
- PageRank: Page, L., et al. (1999). "The PageRank Citation Ranking." http://ilpubs.stanford.edu:8090/422/ - Google's influence algorithm
- Community Detection: Blondel, V.D., et al. (2008). "Fast unfolding of communities in large networks." https://arxiv.org/abs/0803.0476 - Louvain algorithm
- Centrality Measures: Freeman, L.C. (1978). "Centrality in Social Networks." Social Networks 1(3): 215-239.
Related Trilogy Patterns
- Pattern 1: Universal Event Log - Events feed the graph
- Pattern 3: Multi-Channel Tracking - Relationships across channels
- Pattern 11: Historical Pattern Matching - Graph patterns over time
- Volume 3, Pattern 21: External Data Integration - Import external relationships
Tools & Services
- Amazon Neptune: https://aws.amazon.com/neptune/ - Managed graph database
- Azure Cosmos DB Gremlin API: https://azure.microsoft.com/en-us/services/cosmos-db/ - Graph database service
- TigerGraph: https://www.tigergraph.com/ - Enterprise graph analytics platform
- Graphistry: https://www.graphistry.com/ - Visual graph analytics