Chapter 8: Architecture of Domain-Specific Systems
Having established what to build (domain ontologies and document patterns), we now address how to build it. This chapter provides architectural guidance for implementing domain-specific document automation systems.
Key Architectural Principles:
1. Separation of concerns (data, logic, presentation)
2. Technology agnosticism (patterns outlive specific tools)
3. Progressive enhancement (simple → sophisticated)
4. Fail-fast validation (catch errors before generation)
5. Idempotent operations (same inputs → same outputs)
While we'll reference specific technologies as examples, the architectural patterns apply regardless of implementation language or platform.
8.1 High-Level Architecture
8.1.1 The Five-Layer Architecture
Domain-specific document systems are organized into five logical layers:
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: PRESENTATION (User Interface) │
│ • Document type selection │
│ • Parameter configuration (which students, which semester) │
│ • Preview and generation triggers │
│ • Download/distribution │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: APPLICATION LOGIC (Business Rules) │
│ • Workflow orchestration │
│ • Authorization and access control │
│ • Audit logging │
│ • Batch processing coordination │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: DOCUMENT GENERATION ENGINE │
│ • Template selection and loading │
│ • Data-template merging │
│ • Conditional logic execution │
│ • Output format generation (DOCX, PDF, HTML) │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: DATA LAYER (Ontology Implementation) │
│ • Entity models and relationships │
│ • Validation rules │
│ • Calculated fields and business logic │
│ • Query optimization │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: DATA SOURCES │
│ • CSV files (user-provided) │
│ • Databases (SQL, NoSQL) │
│ • External APIs (MLS data, student information systems) │
│ • File storage (images, PDFs) │
└─────────────────────────────────────────────────────────────┘
Layer Communication Rules:
- Each layer depends only on the layer directly below it
- Upper layers don't know the implementation details of lower layers
- Changes inside one layer don't ripple to others as long as its interface stays the same (loose coupling)
- Requests flow down; responses flow back up
Benefits:
- Testability: Each layer can be tested independently
- Maintainability: Changes isolated to specific layers
- Scalability: Can optimize individual layers (cache Layer 2, parallelize Layer 3)
- Flexibility: Can swap implementations (CSV → Database at Layer 1)
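The communication rules above can be sketched with constructor injection, where each layer receives only the layer directly beneath it. All class names here (InMemorySource, DataLayer, GenerationEngine, ApplicationLayer), the canViewStudents flag, and the plain-string "document" are hypothetical simplifications chosen for illustration, not part of any specific library.

```javascript
// Layer 1 (hypothetical): a data source with one narrow method.
// Swapping CSV for SQL means supplying another object with the same `query`.
class InMemorySource {
  constructor(tables) {
    this.tables = tables;
  }
  async query(table, filters = {}) {
    const rows = this.tables[table] || [];
    return rows.filter(row =>
      Object.entries(filters).every(([key, value]) => row[key] === value)
    );
  }
}

// Layer 2: knows the source, and nothing above it.
class DataLayer {
  constructor(source) {
    this.source = source;
  }
  async buildReportCardData(student_id) {
    const [student] = await this.source.query('students', { student_id });
    if (!student) throw new Error(`Student ${student_id} not found`);
    return {
      full_name: `${student.first_name} ${student.last_name}`,
      grade_level: student.grade_level
    };
  }
}

// Layer 3: knows only the data layer. A string stands in for a real document.
class GenerationEngine {
  constructor(dataLayer) {
    this.dataLayer = dataLayer;
  }
  async generate(student_id) {
    const data = await this.dataLayer.buildReportCardData(student_id);
    return `Report Card: ${data.full_name} (Grade ${data.grade_level})`;
  }
}

// Layer 4: knows only the engine; handles authorization and audit logging.
class ApplicationLayer {
  constructor(engine, log = []) {
    this.engine = engine;
    this.log = log;
  }
  async requestReportCard(user, student_id) {
    if (!user.canViewStudents) throw new Error('Not authorized');
    this.log.push({ user: user.name, student_id, at: new Date().toISOString() });
    return this.engine.generate(student_id);
  }
}
```

Because each constructor takes its lower layer as an argument, a test can hand any layer a stub in place of the real thing, which is what makes the testability and flexibility benefits practical.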
8.1.2 Request Flow Example
User Request: "Generate report cards for all 5th grade students"
Layer 5 (Presentation):
↓ User selects "Report Card" document type
↓ Selects filter: grade_level = 5
↓ Clicks "Generate"
Layer 4 (Application Logic):
↓ Validates user permissions (can see 5th graders?)
↓ Logs request (who, what, when)
↓ Determines: Batch operation (multiple students)
↓ Calls Layer 3 for each student
Layer 3 (Document Generation):
↓ Loads template: reportCard.docx
↓ For each student, calls Layer 2 for data
↓ Merges data into template
↓ Generates PDF output
↓ Returns generated documents
Layer 2 (Data Layer):
↓ Receives request: buildReportCardData(student_id, semester_id)
↓ Queries Layer 1 for entities
↓ Joins: Student → Enrollment → Class → Grade
↓ Calculates: GPA, attendance percentage
↓ Returns structured data object
Layer 1 (Data Sources):
↓ Reads CSV files or queries database
↓ Validates data integrity
↓ Returns raw entity data
Result:
- 15 report cards generated (one per 5th grade student)
- Each as PDF file
- Packaged in ZIP for download
- Total time: 30 seconds
8.2 Data Layer Architecture
The Data Layer implements the domain ontology as executable code.
8.2.1 Entity Models
Each entity becomes a software object with attributes, relationships, and methods.
Example: Student Entity
class Student {
// Core Attributes
constructor(data) {
this.student_id = data.student_id;
this.first_name = data.first_name;
this.last_name = data.last_name;
this.grade_level = data.grade_level;
this.birth_date = new Date(data.birth_date);
this.status = data.status || 'Active';
this.photo = data.photo;
this.allergies = data.allergies;
}
// Computed Properties
get full_name() {
return `${this.first_name} ${this.last_name}`;
}
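get age() {
const today = new Date();
let age = today.getFullYear() - this.birth_date.getFullYear();
// Subtract a year if this year's birthday has not happened yet
const hadBirthday =
today.getMonth() > this.birth_date.getMonth() ||
(today.getMonth() === this.birth_date.getMonth() &&
today.getDate() >= this.birth_date.getDate());
if (!hadBirthday) age -= 1;
return age;
}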
get is_active() {
return this.status === 'Active';
}
// Relationships (loaded on demand)
async getParents() {
// Query parent_student table and parent table
return await ParentStudentRepository.findParentsByStudent(this.student_id);
}
async getEnrollments(semester_id) {
return await EnrollmentRepository.findByStudent(this.student_id, semester_id);
}
async getGrades(semester_id) {
const enrollments = await this.getEnrollments(semester_id);
const grades = [];
for (const enrollment of enrollments) {
const enrollmentGrades = await GradeRepository.findByEnrollment(enrollment.enrollment_id);
grades.push(...enrollmentGrades);
}
return grades;
}
// Validation
validate() {
const errors = [];
if (!this.student_id) errors.push("student_id is required");
if (!this.first_name) errors.push("first_name is required");
if (!this.last_name) errors.push("last_name is required");
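// A missing grade_level must fail too; CSV values arrive as strings, so coerce first
const grade = Number(this.grade_level);
if (this.grade_level == null || this.grade_level === '' ||
!Number.isInteger(grade) || grade < 0 || grade > 12) {
errors.push("grade_level must be an integer between 0 and 12");
}
if (Number.isNaN(this.birth_date.getTime())) {
errors.push("birth_date is not a valid date");
} else if (this.birth_date > new Date()) {
errors.push("birth_date cannot be in the future");
}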
return errors;
}
}
Key Principles:
- Rich Domain Models: Objects aren't just data bags - they have behavior
- Computed Properties: Derived values calculated on demand (age, full_name)
- Lazy Loading: Relationships loaded when needed, not upfront
- Validation Built-In: Each entity validates itself
- Encapsulation: Internal details hidden from callers
8.2.2 Repository Pattern
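Repositories handle data access, keeping database/CSV logic separate from entities. Other examples in this chapter call repositories as StudentRepository.findById(...) for brevity; read those calls as going through a shared instance constructed with the appropriate data source.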
class StudentRepository {
constructor(dataSource) {
this.dataSource = dataSource; // Could be CSV, SQL, API
}
async findById(student_id) {
const data = await this.dataSource.query(
'students',
{ student_id: student_id }
);
if (data.length === 0) return null;
return new Student(data[0]);
}
async findByGradeLevel(grade_level, status = 'Active') {
const data = await this.dataSource.query(
'students',
{ grade_level: grade_level, status: status }
);
return data.map(row => new Student(row));
}
async findAll(filters = {}) {
const data = await this.dataSource.query('students', filters);
return data.map(row => new Student(row));
}
async save(student) {
const errors = student.validate();
if (errors.length > 0) {
throw new ValidationError(errors);
}
await this.dataSource.upsert('students', student);
return student;
}
}
Benefits:
- Abstraction: Callers don't know if data comes from CSV, SQL, or API
- Testability: Can mock repository for unit tests
- Swappability: Change data source without changing business logic
- Caching: Repository can cache frequently-accessed data
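The caching benefit can be illustrated with a wrapper around any data source that exposes the query and upsert methods used by the repository above. CachingDataSource is a hypothetical name, and the invalidation policy (drop every cached query for a table on any write to it) is one simple assumption among several reasonable choices.

```javascript
// Hypothetical caching wrapper: same interface as the wrapped data source,
// so a repository cannot tell whether it is talking to the cache.
class CachingDataSource {
  constructor(inner) {
    this.inner = inner;
    this.cache = new Map();
  }

  async query(table, filters = {}) {
    // Filters with the same keys in a different order produce a different
    // cache key. That costs an extra lookup but never a wrong result.
    const key = `${table}:${JSON.stringify(filters)}`;
    if (!this.cache.has(key)) {
      this.cache.set(key, await this.inner.query(table, filters));
    }
    // Callers share the cached array, so they should treat it as read-only.
    return this.cache.get(key);
  }

  async upsert(table, record) {
    // A write may change any query result for this table: drop them all.
    for (const key of this.cache.keys()) {
      if (key.startsWith(`${table}:`)) this.cache.delete(key);
    }
    return this.inner.upsert(table, record);
  }
}
```

A repository would be constructed as new StudentRepository(new CachingDataSource(realSource)), leaving the repository code itself unchanged.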
8.2.3 Relationship Resolution
Resolving relationships, that is, joining related entities correctly, is the most complex part of the Data Layer.
Strategy 1: Eager Loading (Load everything upfront)
async function buildReportCardData_Eager(student_id, semester_id) {
  // Load all directly related data in parallel
  const [student, enrollments, grades, attendance] = await Promise.all([
    StudentRepository.findById(student_id),
    EnrollmentRepository.findByStudent(student_id, semester_id),
    GradeRepository.findByStudent(student_id, semester_id),
    AttendanceRepository.findByStudent(student_id, semester_id)
  ]);
  // Load the class for every enrollment in parallel rather than one at a time
  const classInfos = await Promise.all(
    enrollments.map(enrollment => ClassRepository.findById(enrollment.class_id))
  );
  // Now join in memory
  const classes = enrollments.map((enrollment, index) => ({
    class: classInfos[index],
    enrollment: enrollment,
    grades: grades.filter(g => g.enrollment_id === enrollment.enrollment_id)
  }));
  return { student, classes, attendance };
}
Pros: Fast (parallel queries), complete data
Cons: May load data that is not needed, memory intensive
Strategy 2: Lazy Loading (Load on demand)
async function buildReportCardData_Lazy(student_id, semester_id) {
const student = await StudentRepository.findById(student_id);
// Enrollments loaded when accessed
student.enrollments = async function() {
return await EnrollmentRepository.findByStudent(student_id, semester_id);
};
return student; // Caller triggers additional loads as needed
}
Pros: Memory efficient, only load what's needed
Cons: Many small queries (N+1 problem), slower
Strategy 3: Graph Loading (Specify what to include - Recommended)
async function buildReportCardData_Graph(student_id, semester_id, includes = {}) {
  const student = await StudentRepository.findById(student_id);
  const data = { student };
  // Classes and grades hang off enrollments, so they load only when enrollments are included
  if (includes.enrollments) {
    data.enrollments = await EnrollmentRepository.findByStudent(student_id, semester_id);
    // Resolve the requested relations for all enrollments in parallel
    await Promise.all(data.enrollments.map(async enrollment => {
      if (includes.classes) {
        enrollment.class = await ClassRepository.findById(enrollment.class_id);
      }
      if (includes.grades) {
        enrollment.grades = await GradeRepository.findByEnrollment(enrollment.enrollment_id);
      }
    }));
  }
  if (includes.attendance) {
    data.attendance = await AttendanceRepository.findByStudent(student_id, semester_id);
  }
  return data;
}
// Usage:
const data = await buildReportCardData_Graph(student_id, semester_id, {
enrollments: true,
classes: true,
grades: true,
attendance: true
});
Pros: Explicit control, efficient
Cons: Caller must know what to request
Recommendation: Use Strategy 3 (Graph Loading) - gives control and efficiency.
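Whichever strategy is chosen, looking up one class per enrollment still issues one query per row. A common way to reduce this is to batch the lookup: collect the distinct IDs, fetch them in a single call, and join in memory. The sketch below assumes a repository method named findByIds, which is hypothetical and not among the repository methods shown earlier.

```javascript
// Attach class records to enrollments using one batched lookup.
// `classRepository.findByIds` is an assumed method that returns all
// classes whose class_id appears in the given array.
async function attachClasses(enrollments, classRepository) {
  // Several enrollments may point at the same class, so de-duplicate first.
  const classIds = [...new Set(enrollments.map(e => e.class_id))];
  const classes = await classRepository.findByIds(classIds); // one query

  // Index by ID so the join below is a constant-time lookup per enrollment.
  const byId = new Map(classes.map(c => [c.class_id, c]));

  // Return new objects; a missing class becomes null instead of undefined.
  return enrollments.map(e => ({ ...e, class: byId.get(e.class_id) || null }));
}
```

With a SQL source, findByIds would typically map to a single WHERE class_id IN (...) query; with CSV data it is a filter over the already-loaded rows.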
8.3 Document Generation Engine
The engine merges data with templates to produce documents.
8.3.1 Template Processing Pipeline
Template File (reportCard.docx)
↓
[1. LOAD TEMPLATE]
Parse template structure
Identify merge fields {{field}}
Identify loops {{#each}}...{{/each}}
Identify conditionals {{#if}}...{{/if}}
↓
[2. DATA PREPARATION]
Fetch data from Data Layer
Transform to template-friendly structure
Pre-calculate computed values
↓
[3. FIELD SUBSTITUTION]
Replace {{student.name}} with "Emma Anderson"
Format dates, numbers per locale
Handle missing values (show empty or default)
↓
[4. CONTROL FLOW]
Execute loops (repeat for each class)
Execute conditionals (show if honor roll)
Handle nested structures
↓
[5. FINALIZATION]
Apply styling and formatting
Generate output format (DOCX → PDF)
Clean up temporary files
↓
Output Document (reportCard_S001.pdf)
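Steps 3 and 4 of the pipeline can be sketched on plain text. This is a deliberately small illustration, not a DOCX engine: the function names (lookup, substituteFields, render) are hypothetical, blocks cannot be nested, and values are looked up by path only, so template text is never evaluated as code.

```javascript
// Resolve a dotted path such as "student.full_name" against a data object.
function lookup(data, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    data
  );
}

// Step 3: replace {{field}} with its value; missing values use a default.
function substituteFields(text, data, missing = '') {
  return text.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    const value = lookup(data, path);
    return value == null ? missing : String(value);
  });
}

// Step 4: expand non-nested {{#each}} and {{#if}} blocks, then run step 3.
function render(template, data) {
  const withLoops = template.replace(
    /\{\{#each\s+([\w.]+)\s*\}\}([\s\S]*?)\{\{\/each\}\}/g,
    (match, path, body) => {
      const items = lookup(data, path);
      if (!Array.isArray(items)) return '';
      // Inside a loop, fields resolve against the current item.
      return items.map(item => substituteFields(body, item)).join('');
    }
  );
  const withConditionals = withLoops.replace(
    /\{\{#if\s+([\w.]+)\s*\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (match, path, body) => (lookup(data, path) ? body : '')
  );
  return substituteFields(withConditionals, data);
}

// Usage
const output = render(
  'Student: {{student.full_name}}\n{{#each classes}}- {{title}}: {{final_grade}}\n{{/each}}',
  {
    student: { full_name: 'Emma Anderson' },
    classes: [{ title: 'Math', final_grade: 'A' }]
  }
);
console.log(output);
```

One limitation worth noting: because loop output is passed through field substitution a second time, a data value that itself contains {{...}} would be expanded. A production engine would parse the template once into a tree instead of rewriting text in passes.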
8.3.2 Template Syntax Design
Design Goals:
- Readable: Non-programmers can understand
- Powerful: Support loops, conditionals, formatting
- Safe: Prevent code injection, limit complexity
- Debuggable: Clear error messages
Common Syntax Styles:
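1. Mustache-style ({{field}}); the {{#each}} block below is the Handlebars form, while plain Mustache writes the same loop as {{#classes}}...{{/classes}}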
Student: {{student.full_name}}
Grade: {{student.grade_level}}
Classes:
{{#each classes}}
- {{title}}: {{final_grade}}
{{/each}}
Pros: Clean, widely supported
Cons: Limited functionality
2. Jinja-style ({% block %} {{ var }})
Student: {{ student.full_name }}
Grade: {{ student.grade_level }}
{% for class in classes %}
- {{ class.title }}: {{ class.final_grade }}
{% endfor %}
{% if gpa >= 3.5 %}
HONOR ROLL
{% endif %}
Pros: More powerful, familiar to Python developers
Cons: More complex syntax
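Recommendation: Start with Mustache-style templates for simplicity, and move to a Jinja-style engine when templates need expressions such as gpa >= 3.5, which Mustache-style conditionals cannot evaluate.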
8.4 Validation Architecture
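Validation prevents bad outputs through three layers of defense. These validation layers are numbered independently of the five architecture layers in Section 8.1.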
8.4.1 Layer 1: Schema Validation
Validates raw input data structure.
const Joi = require('joi');
const studentSchema = Joi.object({
student_id: Joi.string().required(),
first_name: Joi.string().required(),
last_name: Joi.string().required(),
grade_level: Joi.number().integer().min(0).max(12).required(),
birth_date: Joi.date().max('now').required(),
status: Joi.string().valid('Active', 'Inactive', 'Alumni').default('Active'),
photo: Joi.string().uri().optional(),
allergies: Joi.string().optional()
});
function validateStudentData(data) {
const { error, value } = studentSchema.validate(data, { abortEarly: false });
if (error) {
const errors = error.details.map(detail => ({
field: detail.path.join('.'),
message: detail.message,
value: detail.context.value
}));
throw new ValidationError('Schema validation failed', errors);
}
return value; // Validated and coerced data
}
Benefits:
- Catches data type errors (string where number expected)
- Enforces required fields
- Validates ranges and formats
- Provides clear error messages
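The listings in this chapter throw ValidationError in two shapes, new ValidationError(errors) in the repository and new ValidationError('Schema validation failed', errors) in the schema validator, without defining the class. The following is one possible definition that accepts both shapes. It is an assumption made for illustration, not a class provided by Joi or any other library.

```javascript
// Hypothetical error type carrying a list of structured validation errors.
class ValidationError extends Error {
  constructor(messageOrErrors, errors) {
    // Accept both call shapes used in this chapter:
    //   new ValidationError(errors)
    //   new ValidationError('Schema validation failed', errors)
    const onlyErrors = Array.isArray(messageOrErrors);
    super(onlyErrors ? 'Validation failed' : messageOrErrors);
    this.name = 'ValidationError';
    this.errors = onlyErrors ? messageOrErrors : (errors || []);
  }
}

// Usage: callers can tell validation failures apart from unexpected crashes.
try {
  throw new ValidationError('Schema validation failed', [
    { field: 'grade_level', message: '"grade_level" must be a number' }
  ]);
} catch (err) {
  if (err instanceof ValidationError) {
    console.log(`${err.message}: ${err.errors.length} problem(s)`);
  } else {
    throw err;
  }
}
```

Keeping the structured list on the error object lets the presentation layer show every problem at once instead of only the first.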
8.4.2 Layer 2: Referential Integrity
Validates relationships between entities.
class ReferentialIntegrityValidator {
constructor(repositories) {
this.repos = repositories;
}
async validateEnrollment(enrollment) {
const errors = [];
// Check student exists
const student = await this.repos.student.findById(enrollment.student_id);
if (!student) {
errors.push({
entity: 'Enrollment',
id: enrollment.enrollment_id,
field: 'student_id',
message: `Student ${enrollment.student_id} not found`,
suggestion: 'Verify student_id or add student to students.csv'
});
}
// Check class exists
const classInfo = await this.repos.class.findById(enrollment.class_id);
if (!classInfo) {
errors.push({
entity: 'Enrollment',
id: enrollment.enrollment_id,
field: 'class_id',
message: `Class ${enrollment.class_id} not found`,
suggestion: 'Verify class_id or add class to classes.csv'
});
}
return errors;
}
}
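Checking one enrollment at a time costs two lookups per row. When a whole import file is validated at once, the same check can run against ID sets built in a single pass. The function name findOrphanEnrollments is hypothetical, and the sketch assumes the student and class rows are already loaded in memory.

```javascript
// Report every enrollment that points at a missing student or class.
function findOrphanEnrollments(enrollments, students, classes) {
  // Build the sets of known IDs once; each membership test is then O(1).
  const studentIds = new Set(students.map(s => s.student_id));
  const classIds = new Set(classes.map(c => c.class_id));
  const errors = [];

  for (const enrollment of enrollments) {
    if (!studentIds.has(enrollment.student_id)) {
      errors.push({
        entity: 'Enrollment',
        id: enrollment.enrollment_id,
        field: 'student_id',
        message: `Student ${enrollment.student_id} not found`
      });
    }
    if (!classIds.has(enrollment.class_id)) {
      errors.push({
        entity: 'Enrollment',
        id: enrollment.enrollment_id,
        field: 'class_id',
        message: `Class ${enrollment.class_id} not found`
      });
    }
  }
  return errors;
}
```

The error objects use the same entity, id, field, and message keys as the validator above, so both can feed the same error report.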
8.4.3 Layer 3: Business Rules
Validates domain-specific logic.
class BusinessRuleValidator {
  async validateReportCardGeneration(student_id, semester_id) {
    const errors = [];
    const warnings = [];
    const student = await StudentRepository.findById(student_id);
    // Stop early: every later rule needs a student record
    if (!student) {
      errors.push({
        rule: 'student_exists',
        message: `Student ${student_id} not found`,
        severity: 'error'
      });
      return { errors, warnings };
    }
    const enrollments = await EnrollmentRepository.findByStudent(student_id, semester_id);
    // Rule: Inactive students produce a warning, not an error
    if (student.status !== 'Active') {
      warnings.push({
        rule: 'active_student',
        message: `Student ${student.full_name} is ${student.status}`,
        severity: 'warning'
      });
    }
    // Rule: Student must have at least one enrollment
    if (enrollments.length === 0) {
      errors.push({
        rule: 'has_enrollments',
        message: `Student ${student.full_name} has no classes this semester`,
        severity: 'error',
        suggestion: 'Add enrollments or skip this student'
      });
    }
    // Rule: All enrollments must have final grades
    for (const enrollment of enrollments) {
      if (enrollment.final_grade == null || enrollment.final_grade === '') {
        // enrollment.class is present only if the caller loaded it; fall back to the ID
        const className = enrollment.class ? enrollment.class.title : `class ${enrollment.class_id}`;
        errors.push({
          rule: 'grades_complete',
          message: `No final grade for ${student.full_name} in ${className}`,
          severity: 'error',
          suggestion: 'Enter final grade or mark as incomplete'
        });
      }
    }
    return { errors, warnings };
  }
}
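The split between errors and warnings matters once validation results drive a batch: errors should block generation for that student, while warnings should be reported without blocking. A minimal sketch of that decision follows, assuming results are collected into an object keyed by student ID. The name planGeneration and the shape of its return value are hypothetical.

```javascript
// validationsById: { [student_id]: { errors: [...], warnings: [...] } }
function planGeneration(validationsById) {
  const ready = [];     // IDs safe to generate
  const blocked = [];   // IDs with at least one error, plus the reasons
  const warnings = [];  // all warnings, tagged with the student they belong to

  for (const [id, result] of Object.entries(validationsById)) {
    const errors = result.errors || [];
    if (errors.length > 0) {
      blocked.push({ id, errors });
    } else {
      ready.push(id);
    }
    for (const warning of result.warnings || []) {
      warnings.push({ id, ...warning });
    }
  }
  return { ready, blocked, warnings };
}
```

Running this before any document is produced applies the fail-fast principle: the user sees every blocked student and the suggested fix up front, and only the ready list proceeds to generation.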
8.5 Batch Processing Architecture
Generating hundreds of documents requires careful orchestration.
8.5.1 Batch Strategies
Strategy 1: Sequential Processing
async function generateBatch_Sequential(documentType, entityIds) {
const results = [];
for (const id of entityIds) {
try {
const document = await generateDocument(documentType, id);
results.push({ id, status: 'success', document });
} catch (error) {
results.push({ id, status: 'error', error: error.message });
}
}
return results;
}
Pros: Simple, easy to debug, predictable memory usage
Cons: Slow (no parallelism)
Strategy 2: Chunked Parallel (Recommended)
async function generateBatch_Chunked(documentType, entityIds, chunkSize = 10) {
const results = [];
for (let i = 0; i < entityIds.length; i += chunkSize) {
const chunk = entityIds.slice(i, i + chunkSize);
const chunkResults = await Promise.all(
chunk.map(id =>
generateDocument(documentType, id)
.then(doc => ({ id, status: 'success', document: doc }))
.catch(error => ({ id, status: 'error', error: error.message }))
)
);
results.push(...chunkResults);
// Report progress
const progress = Math.min(i + chunkSize, entityIds.length);
reportProgress(progress, entityIds.length);
}
return results;
}
Pros: Balanced (speed + memory), progress tracking
Cons: Slightly more complex
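One property of chunking is that each chunk waits for its slowest document before the next chunk starts. A variation that avoids this, offered here as an alternative sketch rather than as the recommended strategy above, is a fixed-size worker pool in which each worker pulls the next ID as soon as it finishes. The function name generateBatch_Pool and the generateOne callback are hypothetical.

```javascript
// Keep up to `concurrency` documents in flight at all times.
// `generateOne(id)` is an assumed callback that resolves to a document.
async function generateBatch_Pool(entityIds, generateOne, concurrency = 10) {
  const results = new Array(entityIds.length);
  let next = 0; // index of the next unclaimed ID

  async function worker() {
    while (next < entityIds.length) {
      // Claiming an index is safe: JavaScript runs this line without
      // interruption, and other workers only resume at an `await`.
      const index = next++;
      const id = entityIds[index];
      try {
        const document = await generateOne(id);
        results[index] = { id, status: 'success', document };
      } catch (error) {
        // One failure does not stop the batch (partial success).
        results[index] = { id, status: 'error', error: error.message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, entityIds.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as entityIds
}
```

Results are written by index, so the output order matches the input order even though documents finish at different times.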
8.5.2 Progress Tracking
Users need feedback during long operations:
class ProgressTracker {
constructor(total) {
this.total = total;
this.completed = 0;
this.startTime = Date.now();
this.listeners = [];
}
increment() {
this.completed++;
this.notifyListeners();
}
getProgress() {
// Guard against division by zero for empty batches and before the first completion
const percentage = this.total > 0 ? (this.completed / this.total) * 100 : 100;
const elapsed = Date.now() - this.startTime;
const rate = elapsed > 0 ? this.completed / (elapsed / 1000) : 0; // per second
const remaining = rate > 0 ? (this.total - this.completed) / rate : null; // seconds; null = unknown
return {
completed: this.completed,
total: this.total,
percentage: percentage.toFixed(1),
elapsed: Math.round(elapsed / 1000),
estimatedRemaining: remaining === null ? null : Math.round(remaining)
};
}
onProgress(callback) {
this.listeners.push(callback);
}
notifyListeners() {
const progress = this.getProgress();
this.listeners.forEach(listener => listener(progress));
}
}
8.6 Chapter Summary
This chapter covered the architecture of domain-specific document automation systems:
Five-Layer Architecture: Clear separation of concerns (Presentation, Application, Generation, Data, Sources)
Data Layer: Rich domain models, repository pattern, relationship resolution strategies (eager, lazy, graph loading)
Document Generation: Template processing pipeline, syntax design (Mustache/Jinja), output format generation
Validation: Three layers (Schema, Referential Integrity, Business Rules) with user-friendly error reporting
Batch Processing: Sequential vs. chunked parallel strategies, progress tracking, partial success handling
Key Takeaways:
- Architecture must be layered and modular
- Validation is critical: fail fast with clear messages
- Batch processing requires careful strategy (chunked parallel recommended)
- User experience matters: progress tracking, partial success
- Plan for scale from the beginning
Further Reading
On Software Architecture:
- Bass, Len, et al. Software Architecture in Practice, 4th Edition. Addison-Wesley, 2021. (Comprehensive architecture guide)
- Richards, Mark, and Neal Ford. Fundamentals of Software Architecture. O'Reilly, 2020. (Modern architecture patterns)
- Martin, Robert C. Clean Architecture. Prentice Hall, 2017. (Dependency rule and boundaries)
On Microservices:
- Newman, Sam. Building Microservices, 2nd Edition. O'Reilly, 2021. (Definitive guide to microservices)
- Richardson, Chris. Microservices Patterns. Manning, 2018. (Pattern catalog)
- "Microservices." Martin Fowler. https://martinfowler.com/articles/microservices.html (Foundational article)
On API Design:
- Masse, Mark. REST API Design Rulebook. O'Reilly, 2011.
- "API Design Guide." Google Cloud. https://cloud.google.com/apis/design (Google's API standards)
- OpenAPI Specification: https://swagger.io/specification/ (Standard for REST APIs)
- GraphQL Documentation: https://graphql.org/learn/ (Alternative to REST)
On Database Design:
- Karwin, Bill. SQL Antipatterns. Pragmatic Bookshelf, 2010. (Common database mistakes)
- Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly, 2017. (Modern data systems)
- Fowler, Martin. Patterns of Enterprise Application Architecture. Addison-Wesley, 2002. (Data patterns)
On Document Generation Engines:
- Apache POI: https://poi.apache.org/ (Java library for Office documents)
- Docxtemplater: https://docxtemplater.com/ (JavaScript Word template engine)
- python-docx: https://python-docx.readthedocs.io/ (Python Word library)
- Pandoc: https://pandoc.org/ (Universal document converter)
- Aspose.Words: https://products.aspose.com/words/ (Commercial document API)
On Template Engines:
- Jinja2: https://jinja.palletsprojects.com/ (Python template engine)
- Handlebars: https://handlebarsjs.com/ (JavaScript template engine)
- Liquid: https://shopify.github.io/liquid/ (Ruby template engine)
Related Patterns in This Trilogy:
- Volume 2, Pattern 27 (Event Sourcing): Architectural pattern for event-driven systems
- Volume 2, Pattern 28 (CQRS): Command Query Responsibility Segregation
- Volume 2, Pattern 30 (Scalability Patterns): Scaling architecture
- Volume 2, Pattern 32 (System Integration): Integration patterns
Cloud Platforms:
- AWS Lambda: https://aws.amazon.com/lambda/ (Serverless document generation)
- Azure Functions: https://azure.microsoft.com/en-us/services/functions/ (Serverless alternative)
- Google Cloud Functions: https://cloud.google.com/functions (Another serverless option)