Volume 1: Domain-Specific Document Automation

Chapter 8: Architecture of Domain-Specific Systems

Having established what to build (domain ontologies and document patterns), we now address how to build it. This chapter provides architectural guidance for implementing domain-specific document automation systems.

Key Architectural Principles:

  1. Separation of concerns (data, logic, presentation)
  2. Technology agnosticism (patterns outlive specific tools)
  3. Progressive enhancement (simple → sophisticated)
  4. Fail-fast validation (catch errors before generation)
  5. Idempotent operations (same inputs → same outputs)

While we'll reference specific technologies as examples, the architectural patterns apply regardless of implementation language or platform.

8.1 High-Level Architecture

8.1.1 The Five-Layer Architecture

Domain-specific document systems are organized into five logical layers:

┌─────────────────────────────────────────────────────────────┐
│  Layer 5: PRESENTATION (User Interface)                      │
│  • Document type selection                                   │
│  • Parameter configuration (which students, which semester)  │
│  • Preview and generation triggers                           │
│  • Download/distribution                                     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 4: APPLICATION LOGIC (Business Rules)                 │
│  • Workflow orchestration                                    │
│  • Authorization and access control                          │
│  • Audit logging                                             │
│  • Batch processing coordination                             │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 3: DOCUMENT GENERATION ENGINE                         │
│  • Template selection and loading                            │
│  • Data-template merging                                     │
│  • Conditional logic execution                               │
│  • Output format generation (DOCX, PDF, HTML)                │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 2: DATA LAYER (Ontology Implementation)               │
│  • Entity models and relationships                           │
│  • Validation rules                                          │
│  • Calculated fields and business logic                      │
│  • Query optimization                                        │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Layer 1: DATA SOURCES                                       │
│  • CSV files (user-provided)                                 │
│  • Databases (SQL, NoSQL)                                    │
│  • External APIs (MLS data, student information systems)     │
│  • File storage (images, PDFs)                               │
└─────────────────────────────────────────────────────────────┘

Layer Communication Rules:

  - Each layer depends only on the layer directly below it
  - Upper layers don't know the implementation details of lower layers
  - Changes to one layer don't ripple to the others (loose coupling)
  - Requests flow down; responses flow back up

Benefits:

  - Testability: Each layer can be tested independently
  - Maintainability: Changes isolated to specific layers
  - Scalability: Can optimize individual layers (cache Layer 2, parallelize Layer 3)
  - Flexibility: Can swap implementations (CSV → Database at Layer 1)
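The communication rules can be sketched with constructor injection, where each layer receives only the layer directly beneath it. This is a minimal illustration, not a prescribed design: the class names (InMemorySource, DataLayer, GenerationEngine, Application), their methods, and the second sample student are all hypothetical.

```javascript
// Layer 1: a data source. Swapping CSV for a database means replacing only this class.
class InMemorySource {
    constructor(rows) { this.rows = rows; }
    query(filters) {
        return this.rows.filter(row =>
            Object.entries(filters).every(([key, value]) => row[key] === value));
    }
}

// Layer 2: knows Layer 1 only through query()
class DataLayer {
    constructor(source) { this.source = source; }
    studentsInGrade(gradeLevel) {
        return this.source.query({ grade_level: gradeLevel })
            .map(row => ({ ...row, full_name: `${row.first_name} ${row.last_name}` }));
    }
}

// Layer 3: knows Layer 2 only
class GenerationEngine {
    constructor(dataLayer) { this.dataLayer = dataLayer; }
    renderRoster(gradeLevel) {
        return this.dataLayer.studentsInGrade(gradeLevel)
            .map(student => `Report card: ${student.full_name}`);
    }
}

// Layer 4: knows Layer 3 only, and records who asked for what
class Application {
    constructor(engine) { this.engine = engine; this.auditLog = []; }
    generate(user, gradeLevel) {
        this.auditLog.push({ user, gradeLevel });
        return this.engine.renderRoster(gradeLevel);
    }
}

// Wiring happens once, at the top
const source = new InMemorySource([
    { first_name: 'Emma', last_name: 'Anderson', grade_level: 5 },
    { first_name: 'Liam', last_name: 'Baker', grade_level: 4 }
]);
const app = new Application(new GenerationEngine(new DataLayer(source)));
const documents = app.generate('teacher1', 5);
```

Because Application never touches InMemorySource directly, replacing the source with a CSV or SQL implementation that offers the same query method would leave the upper three classes unchanged.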

8.1.2 Request Flow Example

User Request: "Generate report cards for all 5th grade students"

Layer 5 (Presentation):
  ↓ User selects "Report Card" document type
  ↓ Selects filter: grade_level = 5
  ↓ Clicks "Generate"

Layer 4 (Application Logic):
  ↓ Validates user permissions (can see 5th graders?)
  ↓ Logs request (who, what, when)
  ↓ Determines: Batch operation (multiple students)
  ↓ Calls Layer 3 for each student

Layer 3 (Document Generation):
  ↓ Loads template: reportCard.docx
  ↓ For each student, calls Layer 2 for data
  ↓ Merges data into template
  ↓ Generates PDF output
  ↓ Returns generated documents

Layer 2 (Data Layer):
  ↓ Receives request: buildReportCardData(student_id, semester_id)
  ↓ Queries Layer 1 for entities
  ↓ Joins: Student → Enrollment → Class → Grade
  ↓ Calculates: GPA, attendance percentage
  ↓ Returns structured data object

Layer 1 (Data Sources):
  ↓ Reads CSV files or queries database
  ↓ Validates data integrity
  ↓ Returns raw entity data

Result:

  - 15 report cards generated (one per 5th grade student)
  - Each as PDF file
  - Packaged in ZIP for download
  - Total time: 30 seconds

8.2 Data Layer Architecture

The Data Layer implements the domain ontology as executable code.

8.2.1 Entity Models

Each entity becomes a software object with attributes, relationships, and methods.

Example: Student Entity

class Student {
    // Core Attributes
    constructor(data) {
        this.student_id = data.student_id;
        this.first_name = data.first_name;
        this.last_name = data.last_name;
        this.grade_level = data.grade_level;
        this.birth_date = new Date(data.birth_date);
        this.status = data.status || 'Active';
        this.photo = data.photo;
        this.allergies = data.allergies;
    }

    // Computed Properties
    get full_name() {
        return `${this.first_name} ${this.last_name}`;
    }

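    get age() {
        const today = new Date();
        let age = today.getFullYear() - this.birth_date.getFullYear();
        // Subtract a year if this year's birthday has not happened yet
        const monthDiff = today.getMonth() - this.birth_date.getMonth();
        if (monthDiff < 0 || (monthDiff === 0 && today.getDate() < this.birth_date.getDate())) {
            age--;
        }
        return age;
    }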

    get is_active() {
        return this.status === 'Active';
    }

    // Relationships (loaded on demand)
    async getParents() {
        // Query parent_student table and parent table
        return await ParentStudentRepository.findParentsByStudent(this.student_id);
    }

    async getEnrollments(semester_id) {
        return await EnrollmentRepository.findByStudent(this.student_id, semester_id);
    }

    async getGrades(semester_id) {
        const enrollments = await this.getEnrollments(semester_id);
        const grades = [];
        for (const enrollment of enrollments) {
            const enrollmentGrades = await GradeRepository.findByEnrollment(enrollment.enrollment_id);
            grades.push(...enrollmentGrades);
        }
        return grades;
    }

    // Validation
    validate() {
        const errors = [];

        if (!this.student_id) errors.push("student_id is required");
        if (!this.first_name) errors.push("first_name is required");
        if (!this.last_name) errors.push("last_name is required");
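        // Number() accepts numeric strings from CSV; missing or non-numeric values fail the check
        const grade = Number(this.grade_level);
        if (this.grade_level == null || this.grade_level === '' ||
            !Number.isInteger(grade) || grade < 0 || grade > 12) {
            errors.push("grade_level must be between 0 and 12");
        }
        // An unparseable birth_date becomes an Invalid Date, which would slip past a plain comparison
        if (Number.isNaN(this.birth_date.getTime())) {
            errors.push("birth_date is missing or not a valid date");
        } else if (this.birth_date > new Date()) {
            errors.push("birth_date cannot be in the future");
        }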

        return errors;
    }
}

Key Principles:

  1. Rich Domain Models: Objects aren't just data bags - they have behavior
  2. Computed Properties: Derived values calculated on demand (age, full_name)
  3. Lazy Loading: Relationships loaded when needed, not upfront
  4. Validation Built-In: Each entity validates itself
  5. Encapsulation: Internal details hidden from callers
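
Computed properties that read the system clock, such as age, return different values on different days, which makes them awkward to test and sits uneasily with the idempotency principle. One possible approach, sketched here with a hypothetical helper named ageOn, is to pass the reference date in as a parameter.

```javascript
// Age in whole years as of a given date (defaults to today).
// ageOn is a hypothetical helper, not part of the Student class above.
function ageOn(birthDate, asOf = new Date()) {
    let age = asOf.getFullYear() - birthDate.getFullYear();
    const hadBirthdayThisYear =
        asOf.getMonth() > birthDate.getMonth() ||
        (asOf.getMonth() === birthDate.getMonth() && asOf.getDate() >= birthDate.getDate());
    if (!hadBirthdayThisYear) {
        age -= 1;
    }
    return age;
}

// Months are zero-based in the Date constructor: 5 is June
const birth = new Date(2014, 5, 15);
const dayBeforeBirthday = ageOn(birth, new Date(2025, 5, 14));
const onBirthday = ageOn(birth, new Date(2025, 5, 15));
```

With a fixed asOf date, regenerating a document for the same semester produces the same age regardless of when the job runs.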

8.2.2 Repository Pattern

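Repositories handle data access, keeping database/CSV logic separate from entities. Other examples in this chapter call repositories by name (for example, StudentRepository.findById); read those names as shared repository instances constructed with a data source, as shown here.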

class StudentRepository {
    constructor(dataSource) {
        this.dataSource = dataSource; // Could be CSV, SQL, API
    }

    async findById(student_id) {
        const data = await this.dataSource.query(
            'students',
            { student_id: student_id }
        );

        if (data.length === 0) return null;
        return new Student(data[0]);
    }

    async findByGradeLevel(grade_level, status = 'Active') {
        const data = await this.dataSource.query(
            'students',
            { grade_level: grade_level, status: status }
        );

        return data.map(row => new Student(row));
    }

    async findAll(filters = {}) {
        const data = await this.dataSource.query('students', filters);
        return data.map(row => new Student(row));
    }

    async save(student) {
        const errors = student.validate();
        if (errors.length > 0) {
            // Same (message, errors) signature as the schema validator in 8.4.1
            throw new ValidationError('Student validation failed', errors);
        }

        await this.dataSource.upsert('students', student);
        return student;
    }
}

Benefits:

  - Abstraction: Callers don't know if data comes from CSV, SQL, or API
  - Testability: Can mock repository for unit tests
  - Swappability: Change data source without changing business logic
  - Caching: Repository can cache frequently-accessed data
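
To make the abstraction concrete, here is one way a data source satisfying the repository's expectations might look. The repository above calls only query(table, filters) and upsert(table, record); everything else in this sketch (the InMemoryDataSource name, the keyFields option, the merge-on-upsert behavior) is an assumption made for illustration.

```javascript
// Hypothetical in-memory data source offering the query/upsert contract
// that the repository relies on. Useful as a stand-in for CSV or SQL in tests.
class InMemoryDataSource {
    constructor(tables = {}, keyFields = {}) {
        this.tables = tables;       // e.g. { students: [ {...}, {...} ] }
        this.keyFields = keyFields; // e.g. { students: 'student_id' } (assumed option)
    }

    // Return rows whose fields equal every value in filters
    async query(table, filters = {}) {
        const rows = this.tables[table] || [];
        return rows.filter(row =>
            Object.entries(filters).every(([field, value]) => row[field] === value));
    }

    // Update the row with a matching key, or append a new row
    async upsert(table, record) {
        const key = this.keyFields[table];
        const rows = this.tables[table] || (this.tables[table] = []);
        const index = rows.findIndex(row => row[key] === record[key]);
        if (index >= 0) {
            rows[index] = { ...rows[index], ...record };
        } else {
            rows.push({ ...record });
        }
    }
}
```

In a unit test, a repository constructed with new InMemoryDataSource(...) can be exercised without touching files or a database, which is the testability benefit listed above.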

8.2.3 Relationship Resolution

Joining related entities correctly is the most complex part of the data layer.

Strategy 1: Eager Loading (Load everything upfront)

async function buildReportCardData_Eager(student_id, semester_id) {
    // Load all related data in parallel
    const [student, enrollments, grades, attendance] = await Promise.all([
        StudentRepository.findById(student_id),
        EnrollmentRepository.findByStudent(student_id, semester_id),
        GradeRepository.findByStudent(student_id, semester_id),
        AttendanceRepository.findByStudent(student_id, semester_id)
    ]);

    // Fetch the class for every enrollment in parallel too,
    // rather than awaiting one lookup at a time
    const classInfos = await Promise.all(
        enrollments.map(enrollment => ClassRepository.findById(enrollment.class_id))
    );

    // Now join in memory
    const classes = enrollments.map((enrollment, index) => ({
        class: classInfos[index],
        enrollment: enrollment,
        grades: grades.filter(g => g.enrollment_id === enrollment.enrollment_id)
    }));

    return { student, classes, attendance };
}

Pros: Fast (parallel queries), complete data
Cons: May load data not needed, memory intensive

Strategy 2: Lazy Loading (Load on demand)

async function buildReportCardData_Lazy(student_id, semester_id) {
    const student = await StudentRepository.findById(student_id);

    // Enrollments loaded when accessed
    student.enrollments = async function() {
        return await EnrollmentRepository.findByStudent(student_id, semester_id);
    };

    return student; // Caller triggers additional loads as needed
}

Pros: Memory efficient, only load what's needed
Cons: Many small queries (N+1 problem), slower

Strategy 3: Graph Loading (Specify what to include - Recommended)

async function buildReportCardData_Graph(student_id, semester_id, includes) {
    const student = await StudentRepository.findById(student_id);
    const data = { student };

    if (includes.enrollments) {
        data.enrollments = await EnrollmentRepository.findByStudent(student_id, semester_id);

        if (includes.classes) {
            // Fetch classes concurrently instead of one query at a time
            await Promise.all(data.enrollments.map(async (enrollment) => {
                enrollment.class = await ClassRepository.findById(enrollment.class_id);
            }));
        }

        if (includes.grades) {
            await Promise.all(data.enrollments.map(async (enrollment) => {
                enrollment.grades = await GradeRepository.findByEnrollment(enrollment.enrollment_id);
            }));
        }
    }

    if (includes.attendance) {
        data.attendance = await AttendanceRepository.findByStudent(student_id, semester_id);
    }

    return data;
}

// Usage:
const data = await buildReportCardData_Graph(student_id, semester_id, {
    enrollments: true,
    classes: true,
    grades: true,
    attendance: true
});

Pros: Explicit control, efficient
Cons: Caller must know what to request

Recommendation: Use Strategy 3 (Graph Loading). It gives the caller explicit control while loading only what is needed.
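
Whichever strategy is chosen, looking up one class per enrollment still costs one query per row. A common mitigation is to collect the distinct ids and fetch them in a single call. The sketch below assumes a lookup function that accepts an array of ids; findClassesByIds and attachClasses are hypothetical names, and no such method appears on the repositories shown earlier.

```javascript
// Attach class records to enrollments using a single batched lookup.
// findClassesByIds is an assumed function: (ids: string[]) => Promise<class records[]>
async function attachClasses(enrollments, findClassesByIds) {
    // Several enrollments may point at the same class, so de-duplicate first
    const uniqueIds = [...new Set(enrollments.map(e => e.class_id))];
    const classes = await findClassesByIds(uniqueIds);

    // Index by id so the join below is a constant-time lookup per enrollment
    const classById = new Map(classes.map(c => [c.class_id, c]));

    return enrollments.map(e => ({
        ...e,
        class: classById.get(e.class_id) || null // null marks a missing class
    }));
}
```

The function returns new enrollment objects rather than mutating its input, and a null class gives the referential integrity checks in 8.4.2 something explicit to report.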

8.3 Document Generation Engine

The engine merges data with templates to produce documents.

8.3.1 Template Processing Pipeline

Template File (reportCard.docx)
         ↓
   [1. LOAD TEMPLATE]
   Parse template structure
   Identify merge fields {{field}}
   Identify loops {{#each}}...{{/each}}
   Identify conditionals {{#if}}...{{/if}}
         ↓
   [2. DATA PREPARATION]
   Fetch data from Data Layer
   Transform to template-friendly structure
   Pre-calculate computed values
         ↓
   [3. FIELD SUBSTITUTION]
   Replace {{student.name}} with "Emma Anderson"
   Format dates, numbers per locale
   Handle missing values (show empty or default)
         ↓
   [4. CONTROL FLOW]
   Execute loops (repeat for each class)
   Execute conditionals (show if honor roll)
   Handle nested structures
         ↓
   [5. FINALIZATION]
   Apply styling and formatting
   Generate output format (DOCX → PDF)
   Clean up temporary files
         ↓
   Output Document (reportCard_S001.pdf)
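
Steps 3 and 4 of the pipeline can be illustrated on plain strings. The sketch below is a toy renderer, not a DOCX engine: lookup and render are hypothetical names, blocks of the same kind cannot be nested, and it expands loops and conditionals in the same pass as field substitution so that names inside a loop resolve against the current item.

```javascript
// Resolve a dotted path such as "student.full_name" against a context object
function lookup(context, path) {
    return path.split('.').reduce(
        (value, key) => (value == null ? undefined : value[key]), context);
}

// Render {{field}}, {{#each list}}...{{/each}} and {{#if flag}}...{{/if}}
function render(template, context) {
    const token = new RegExp(
        '\\{\\{#each ([\\w.]+)\\}\\}([\\s\\S]*?)\\{\\{/each\\}\\}' + // loop
        '|\\{\\{#if ([\\w.]+)\\}\\}([\\s\\S]*?)\\{\\{/if\\}\\}' +    // conditional
        '|\\{\\{([\\w.]+)\\}\\}',                                    // merge field
        'g');

    return template.replace(token, (match, eachPath, eachBody, ifPath, ifBody, fieldPath) => {
        if (eachPath !== undefined) {
            const items = lookup(context, eachPath) || [];
            // Each item's fields shadow the outer context inside the loop body
            return items.map(item => render(eachBody, { ...context, ...item })).join('');
        }
        if (ifPath !== undefined) {
            return lookup(context, ifPath) ? render(ifBody, context) : '';
        }
        // Missing values render as an empty string
        const value = lookup(context, fieldPath);
        return value == null ? '' : String(value);
    });
}

const template =
    'Student: {{student.full_name}}\n' +
    '{{#each classes}}- {{title}}: {{final_grade}}\n{{/each}}' +
    '{{#if honor_roll}}HONOR ROLL{{/if}}';

const output = render(template, {
    student: { full_name: 'Emma Anderson' },
    classes: [
        { title: 'Math', final_grade: 'A' },
        { title: 'Science', final_grade: 'B+' }
    ],
    honor_roll: true
});
```

Substituted values are inserted literally and never rescanned, so a student name that happens to contain {{...}} cannot inject template logic.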

8.3.2 Template Syntax Design

Design Goals:

  - Readable: Non-programmers can understand
  - Powerful: Support loops, conditionals, formatting
  - Safe: Prevent code injection, limit complexity
  - Debuggable: Clear error messages

Common Syntax Styles:

1. Mustache-style ({{field}}), shown here with Handlebars-style {{#each}} blocks

Student: {{student.full_name}}
Grade: {{student.grade_level}}

Classes:
{{#each classes}}
  - {{title}}: {{final_grade}}
{{/each}}

Pros: Clean, widely supported
Cons: Limited functionality

2. Jinja-style ({% block %} {{ var }})

Student: {{ student.full_name }}
Grade: {{ student.grade_level }}

{% for class in classes %}
  - {{ class.title }}: {{ class.final_grade }}
{% endfor %}

{% if gpa >= 3.5 %}
HONOR ROLL
{% endif %}

Pros: More powerful, familiar to Python developers
Cons: More complex syntax

Recommendation: Start with a Mustache-style syntax for its simplicity, and move to a Jinja-style engine only when templates need expressions such as the gpa >= 3.5 comparison above.

8.4 Validation Architecture

Validation prevents bad output. Three validation layers, distinct from the architectural layers of 8.1, provide defense in depth:

8.4.1 Layer 1: Schema Validation

Validates raw input data structure.

const Joi = require('joi');

const studentSchema = Joi.object({
    student_id: Joi.string().required(),
    first_name: Joi.string().required(),
    last_name: Joi.string().required(),
    grade_level: Joi.number().integer().min(0).max(12).required(),
    birth_date: Joi.date().max('now').required(),
    status: Joi.string().valid('Active', 'Inactive', 'Alumni').default('Active'),
    photo: Joi.string().uri().optional(),
    allergies: Joi.string().optional()
});

function validateStudentData(data) {
    const { error, value } = studentSchema.validate(data, { abortEarly: false });

    if (error) {
        const errors = error.details.map(detail => ({
            field: detail.path.join('.'),
            message: detail.message,
            value: detail.context.value
        }));

        throw new ValidationError('Schema validation failed', errors);
    }

    return value; // Validated and coerced data
}

Benefits:

  - Catches data type errors (string where number expected)
  - Enforces required fields
  - Validates ranges and formats
  - Provides clear error messages

8.4.2 Layer 2: Referential Integrity

Validates relationships between entities.

class ReferentialIntegrityValidator {
    constructor(repositories) {
        this.repos = repositories;
    }

    async validateEnrollment(enrollment) {
        const errors = [];

        // Check student exists
        const student = await this.repos.student.findById(enrollment.student_id);
        if (!student) {
            errors.push({
                entity: 'Enrollment',
                id: enrollment.enrollment_id,
                field: 'student_id',
                message: `Student ${enrollment.student_id} not found`,
                suggestion: 'Verify student_id or add student to students.csv'
            });
        }

        // Check class exists
        const classInfo = await this.repos.class.findById(enrollment.class_id);
        if (!classInfo) {
            errors.push({
                entity: 'Enrollment',
                id: enrollment.enrollment_id,
                field: 'class_id',
                message: `Class ${enrollment.class_id} not found`,
                suggestion: 'Verify class_id or add class to classes.csv'
            });
        }

        return errors;
    }
}
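
The validator above checks one enrollment at a time, issuing two lookups per row. When all rows are already in memory, as with CSV input, the same check can be done in a single pass using sets. This is a sketch under that assumption; findOrphanedEnrollments is a hypothetical name, and the error shape simply mirrors the one used above.

```javascript
// Report enrollments whose student_id or class_id has no matching row.
// Assumes students, classes and enrollments are plain arrays already loaded.
function findOrphanedEnrollments(enrollments, students, classes) {
    const studentIds = new Set(students.map(s => s.student_id));
    const classIds = new Set(classes.map(c => c.class_id));
    const errors = [];

    for (const enrollment of enrollments) {
        if (!studentIds.has(enrollment.student_id)) {
            errors.push({
                entity: 'Enrollment',
                id: enrollment.enrollment_id,
                field: 'student_id',
                message: `Student ${enrollment.student_id} not found`
            });
        }
        if (!classIds.has(enrollment.class_id)) {
            errors.push({
                entity: 'Enrollment',
                id: enrollment.enrollment_id,
                field: 'class_id',
                message: `Class ${enrollment.class_id} not found`
            });
        }
    }

    return errors;
}
```

Running this once over the whole data set before any document is generated fits the fail-fast principle: every broken reference is reported together rather than discovered one document at a time.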

8.4.3 Layer 3: Business Rules

Validates domain-specific logic.

class BusinessRuleValidator {
    async validateReportCardGeneration(student_id, semester_id) {
        const errors = [];
        const warnings = [];

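        const student = await StudentRepository.findById(student_id);
        if (!student) {
            // findById returns null for an unknown id; no other rule can be evaluated
            errors.push({
                rule: 'student_exists',
                message: `Student ${student_id} not found`,
                severity: 'error'
            });
            return { errors, warnings };
        }

        const enrollments = await EnrollmentRepository.findByStudent(student_id, semester_id);

        // Rule: Student should be active (reported as a warning, not an error)
        if (student.status !== 'Active') {
            warnings.push({
                rule: 'active_student',
                message: `Student ${student.full_name} is ${student.status}`,
                severity: 'warning'
            });
        }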

        // Rule: Student must have at least one enrollment
        if (enrollments.length === 0) {
            errors.push({
                rule: 'has_enrollments',
                message: `Student ${student.full_name} has no classes this semester`,
                severity: 'error',
                suggestion: 'Add enrollments or skip this student'
            });
        }

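        // Rule: All enrollments must have final grades
        for (const enrollment of enrollments) {
            // Compare against null/empty so a numeric grade of 0 still counts as entered
            if (enrollment.final_grade == null || enrollment.final_grade === '') {
                // The class relationship may not be loaded; fall back to its id
                const className = enrollment.class ? enrollment.class.title : enrollment.class_id;
                errors.push({
                    rule: 'grades_complete',
                    message: `No final grade for ${student.full_name} in ${className}`,
                    severity: 'error',
                    suggestion: 'Enter final grade or mark as incomplete'
                });
            }
        }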

        return { errors, warnings };
    }
}
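
The three validation layers are most useful when run in order, stopping at the first layer that reports errors: there is little point checking business rules against rows that failed schema validation. A minimal sketch of such a runner follows. The runValidationStages name and the stage shape ({ name, validate }) are assumptions, and each validate function is assumed to resolve to an array of error objects.

```javascript
// Run validation stages in order and stop at the first one that reports errors.
// stages: [{ name: string, validate: async (input) => errorObject[] }]
async function runValidationStages(stages, input) {
    for (const { name, validate } of stages) {
        const errors = await validate(input);
        if (errors.length > 0) {
            // Fail fast: later stages would only add noise on top of these errors
            return { ok: false, failedStage: name, errors };
        }
    }
    return { ok: true, failedStage: null, errors: [] };
}
```

A caller might pass stages named schema, referential, and business that wrap the validators from 8.4.1 through 8.4.3, adapting each one to return a plain array of errors.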

8.5 Batch Processing Architecture

Generating hundreds of documents requires careful orchestration.

8.5.1 Batch Strategies

Strategy 1: Sequential Processing

async function generateBatch_Sequential(documentType, entityIds) {
    const results = [];

    for (const id of entityIds) {
        try {
            const document = await generateDocument(documentType, id);
            results.push({ id, status: 'success', document });
        } catch (error) {
            results.push({ id, status: 'error', error: error.message });
        }
    }

    return results;
}

Pros: Simple, easy to debug, predictable memory usage
Cons: Slow (no parallelism)

Strategy 2: Chunked Parallel (Recommended)

async function generateBatch_Chunked(documentType, entityIds, chunkSize = 10) {
    const results = [];

    for (let i = 0; i < entityIds.length; i += chunkSize) {
        const chunk = entityIds.slice(i, i + chunkSize);

        const chunkResults = await Promise.all(
            chunk.map(id =>
                generateDocument(documentType, id)
                    .then(doc => ({ id, status: 'success', document: doc }))
                    .catch(error => ({ id, status: 'error', error: error.message }))
            )
        );

        results.push(...chunkResults);

        // Report progress
        const progress = Math.min(i + chunkSize, entityIds.length);
        reportProgress(progress, entityIds.length);
    }

    return results;
}

Pros: Balanced (speed + memory), progress tracking
Cons: Slightly more complex
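
Both strategies record failures instead of aborting, so a batch can finish with a mix of successes and errors. One way to present that outcome is to condense the results array into a summary the user can act on. The sketch below assumes the { id, status, document | error } result shape produced above; summarizeBatch and the retryIds field are hypothetical.

```javascript
// Condense batch results into counts, a failure list, and ids worth retrying.
// Expects items shaped like { id, status: 'success' | 'error', document?, error? }
function summarizeBatch(results) {
    const failed = results.filter(r => r.status === 'error');
    const succeededCount = results.filter(r => r.status === 'success').length;

    return {
        total: results.length,
        succeeded: succeededCount,
        failed: failed.map(r => ({ id: r.id, error: r.error })),
        retryIds: failed.map(r => r.id) // can be fed back into a batch function
    };
}

// Sample results for illustration only
const summary = summarizeBatch([
    { id: 'S001', status: 'success', document: 'reportCard_S001.pdf' },
    { id: 'S002', status: 'error', error: 'No final grade' },
    { id: 'S003', status: 'success', document: 'reportCard_S003.pdf' }
]);
```

Passing summary.retryIds to the batch function again, after the underlying data has been corrected, regenerates only the documents that failed.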

8.5.2 Progress Tracking

Users need feedback during long operations:

class ProgressTracker {
    constructor(total) {
        this.total = total;
        this.completed = 0;
        this.startTime = Date.now();
        this.listeners = [];
    }

    increment() {
        this.completed++;
        this.notifyListeners();
    }

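    getProgress() {
        // Guard the divisions: an empty batch, or a call before the first
        // completion, would otherwise yield NaN or Infinity
        const percentage = this.total > 0 ? (this.completed / this.total) * 100 : 100;
        const elapsed = Date.now() - this.startTime;
        const rate = elapsed > 0 ? this.completed / (elapsed / 1000) : 0; // per second
        const remaining = rate > 0 ? (this.total - this.completed) / rate : null; // seconds, null = unknown

        return {
            completed: this.completed,
            total: this.total,
            percentage: percentage.toFixed(1), // string, e.g. "42.5"
            elapsed: Math.round(elapsed / 1000),
            estimatedRemaining: remaining === null ? null : Math.round(remaining)
        };
    }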

    onProgress(callback) {
        this.listeners.push(callback);
    }

    notifyListeners() {
        const progress = this.getProgress();
        this.listeners.forEach(listener => listener(progress));
    }
}

8.6 Chapter Summary

This chapter provided comprehensive architectural guidance for implementing domain-specific document automation systems:

Five-Layer Architecture: Clear separation of concerns (Presentation, Application, Generation, Data, Sources)

Data Layer: Rich domain models, repository pattern, relationship resolution strategies (eager, lazy, graph loading)

Document Generation: Template processing pipeline, syntax design (Mustache/Jinja), output format generation

Validation: Three layers (Schema, Referential Integrity, Business Rules) with user-friendly error reporting

Batch Processing: Sequential vs. chunked parallel strategies, progress tracking, partial success handling

Key Takeaways:

  - Architecture must be layered and modular
  - Validation is critical: fail fast with clear messages
  - Batch processing requires a deliberate strategy (chunked parallel recommended)
  - User experience matters: progress tracking and partial-success reporting
  - Plan for scale from the beginning


Further Reading

On Software Architecture:

  - Bass, Len, et al. Software Architecture in Practice, 4th Edition. Addison-Wesley, 2021. (Comprehensive architecture guide)
  - Richards, Mark, and Neal Ford. Fundamentals of Software Architecture. O'Reilly, 2020. (Modern architecture patterns)
  - Martin, Robert C. Clean Architecture. Prentice Hall, 2017. (Dependency rule and boundaries)

On Microservices:

  - Newman, Sam. Building Microservices, 2nd Edition. O'Reilly, 2021. (Definitive guide to microservices)
  - Richardson, Chris. Microservices Patterns. Manning, 2018. (Pattern catalog)
  - "Microservices." Martin Fowler. https://martinfowler.com/articles/microservices.html (Foundational article)

On API Design:

  - Masse, Mark. REST API Design Rulebook. O'Reilly, 2011.
  - "API Design Guide." Google Cloud. https://cloud.google.com/apis/design (Google's API standards)
  - OpenAPI Specification: https://swagger.io/specification/ (Standard for REST APIs)
  - GraphQL Documentation: https://graphql.org/learn/ (Alternative to REST)

On Database Design:

  - Karwin, Bill. SQL Antipatterns. Pragmatic Bookshelf, 2010. (Common database mistakes)
  - Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly, 2017. (Modern data systems)
  - Fowler, Martin. Patterns of Enterprise Application Architecture. Addison-Wesley, 2002. (Data patterns)

On Document Generation Engines:

  - Apache POI: https://poi.apache.org/ (Java library for Office documents)
  - Docxtemplater: https://docxtemplater.com/ (JavaScript Word template engine)
  - python-docx: https://python-docx.readthedocs.io/ (Python Word library)
  - Pandoc: https://pandoc.org/ (Universal document converter)
  - Aspose.Words: https://products.aspose.com/words/ (Commercial document API)

On Template Engines:

  - Jinja2: https://jinja.palletsprojects.com/ (Python template engine)
  - Handlebars: https://handlebarsjs.com/ (JavaScript template engine)
  - Liquid: https://shopify.github.io/liquid/ (Ruby template engine)

Related Patterns in This Trilogy:

  - Volume 2, Pattern 27 (Event Sourcing): Architectural pattern for event-driven systems
  - Volume 2, Pattern 28 (CQRS): Command Query Responsibility Segregation
  - Volume 2, Pattern 30 (Scalability Patterns): Scaling architecture
  - Volume 2, Pattern 32 (System Integration): Integration patterns

Cloud Platforms:

  - AWS Lambda: https://aws.amazon.com/lambda/ (Serverless document generation)
  - Azure Functions: https://azure.microsoft.com/en-us/services/functions/ (Serverless alternative)
  - Google Cloud Functions: https://cloud.google.com/functions (Another serverless option)