Volume 3: Human-System Collaboration

Chapter 4: Domain Knowledge Encoding


Opening: The Registration Form That Taught Nothing

Maria had been volunteering as homeschool co-op coordinator for three weeks when she decided to "improve" the registration process. The previous coordinator, Janet, had used a complex paper form with handwritten notes and frequent phone calls. Maria, who worked in tech, thought she could streamline it.

She created a Google Form:

Student Name: _______________
Grade Level: _______________
Parent Name: _______________
Email: _______________
Phone: _______________
Emergency Contact: _______________

Clean. Simple. Digital. She sent it out feeling accomplished.

The responses started coming in:

Student Name: "Emma"
Grade Level: "5"
Parent Name: "Johnson"
Email: "emma@gmail.com"
Phone: "555-1234"
Emergency Contact: "Dad"

Maria stared at the responses, confused. Was Emma in 5th grade or 5 years old? Was Johnson the first name or last name? Whose email was this—parent or student? What kind of phone number has only 7 digits? Who is "Dad" and how do you reach him?

She called Emma's parent. "I'm sorry, I need more information. Is Emma in 5th grade?"

"Oh no, she's five years old. PreK. I thought that's what you meant by grade level."

"And I need a complete phone number with area code..."

"Oh, sorry, I assumed you were local. It's 412-555-1234."

"And for emergency contact, I need a name and number..."

"Right, that would be her father, Mike Johnson, same number."

After 20 calls like this, Maria called Janet in frustration. "How did you get complete information with that paper form?"

Janet laughed. "The form wasn't doing the work, honey. I was. Let me show you what I actually need to know..."

Janet pulled out her worn binder, filled with annotations and sticky notes. "See, when a parent tells me their child is 'five,' I need to know: Is that chronological age or developmental age? Because we have five-year-olds in PreK and advanced five-year-olds in Kindergarten. And are they turning six this year? That affects placement."

"And phone numbers—I need to know: Is this a cell or landline? Because if there's an emergency and you're not answering your cell, I need to know if I should try your home or workplace. And which parent is primary contact? Some families want calls to go to mom, others to dad, others to both."

"And emergency contacts—I need someone who's not the parents and who lives close enough to actually pick up the kid. And I need their relationship documented because the school needs to know who has permission to collect children."

"And medical stuff—allergies, medications, conditions I need to watch for. And behavior stuff—does this kid have sensory issues? Anxiety? Do they need extra transitions time? Are there custody situations I need to know about?"

Maria felt overwhelmed. "How do you capture all that?"

Janet smiled. "The form isn't just asking questions. It's teaching parents what matters. Look..."

She showed Maria her paper form. Every field had examples. Every section had explanatory text. Checkboxes for common situations. Space for free-text notes. Questions grouped by purpose, not alphabetically.

But more than that—the form anticipated. It asked about birthday and grade level, then showed expected grade placement so parents could confirm or explain differences. It asked about allergies with specific categories because "just food allergies" misses bee stings and medication reactions. It explained why certain information was needed: "We ask about custody arrangements not to pry, but because we need to know who has permission to pick up your child and whether there are any safety concerns."

"Your digital form," Janet said gently, "asked for data. Mine asks for knowledge. There's a difference."

The Knowledge-Data Distinction

Maria's form collected data points. Janet's form captured domain knowledge. What's the difference?

Data is decontextualized facts: "Emma," "5," "Johnson," "555-1234"

Knowledge is facts with meaning: "Emma Johnson, age 5, birthdate 3/15/2020, should be in Kindergarten based on age but parents are placing her in PreK due to social-emotional readiness, will advance to Kindergarten next year, primary contact is mother Jessica Johnson (cell: 412-555-1234, work: 412-555-9876), father Mike Johnson is emergency contact (412-555-1234 same), child has peanut allergy (EpiPen in office), no other medical conditions, pickup authorized only for parents unless prior written notification provided."

Janet's form didn't just ask better questions. It encoded 15 years of experience about what a coordinator needs to know and why. Every field, every validation rule, every help text, every default value represented a lesson learned—often the hard way.

This is what we mean by domain knowledge encoding: The form becomes a vessel for expertise, teaching both the user filling it out and the system processing it what matters in this domain.

How Validation Rules Teach

Let's rebuild Maria's form with Janet's expertise encoded.

Grade Level Field: From Ambiguous to Instructive

Maria's version:

Grade Level: _______________

Parents entered: "5", "five", "5th", "kindergarten", "K", "pre-k", "preschool", "young 5s"

Janet's encoded version:

Child's Birth Date: [MM/DD/YYYY]
[Calendar picker]

Based on birthdate, typical grade placement: [Auto-calculated]
  "Kindergarten (age 5, birthdate after 9/1/2019)"

Are you enrolling in this grade?
  ( ) Yes, typical placement
  ( ) No, different grade (explain below)

[If No selected]
Which grade are you requesting? [Dropdown: PreK, K, 1st, 2nd...]
Reason for different placement: [Text area]
  "Helps us ensure appropriate class assignment"

Look what this does:

  1. Removes ambiguity: Birth date is unambiguous. "Grade level" meant different things to different people.

  2. Teaches domain rules: Parents learn that placement is normally age-based with a cutoff date. The form makes the standard visible.

  3. Allows exceptions gracefully: Doesn't force everyone into age-based boxes, but asks for explanation when deviating.

  4. Captures reasoning: When a parent places a 6-year-old in Kindergarten, the coordinator learns why—might be "late birthday, first-time schooling" or "developmental delays" or "language barrier." Each requires different support.

  5. Prevents downstream errors: The system now knows this is a conscious choice, not a data entry error.

The validation rule isn't just checking format. It's teaching the parent about placement norms while capturing their specific context.
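
The auto-calculated placement can be sketched in a few lines of Python. This is a minimal illustration, not Janet's actual system: the GRADES ladder, the mapping of age 5 to Kindergarten, the cutoff date passed in by the caller, and the function names are all assumptions made for the example.

```python
from datetime import date
from typing import Optional

# Hypothetical grade ladder; a real co-op would define its own.
GRADES = ["PreK", "Kindergarten", "1st", "2nd", "3rd", "4th", "5th"]


def age_on(birth_date: date, on: date) -> int:
    """Whole years completed by the date `on`."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def typical_grade(birth_date: date, cutoff: date) -> Optional[str]:
    """Typical grade for the child's age on the cutoff date.

    Assumed mapping: age 4 -> PreK, age 5 -> Kindergarten, and so on.
    Returns None when the age falls outside the ladder.
    """
    index = age_on(birth_date, cutoff) - 4
    return GRADES[index] if 0 <= index < len(GRADES) else None


def placement_check(birth_date: date, requested: str, cutoff: date,
                    reason: str = "") -> dict:
    """Compare the requested grade with the typical one.

    A reason is required only when the two differ, which is how the
    form separates a conscious choice from a data entry error.
    """
    expected = typical_grade(birth_date, cutoff)
    differs = requested != expected
    return {
        "expected": expected,
        "requested": requested,
        "needs_reason": differs and not reason.strip(),
    }
```

With Emma's birthdate of 3/15/2020 and an assumed cutoff of September 1, 2025, `typical_grade` returns Kindergarten, so a PreK request comes back with `needs_reason` set until the parent explains the choice.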

Phone Number: From Format to Function

Maria's version:

Phone: _______________

Parents entered everything from "555-1234" to "412.555.1234" to "(412) 555-1234 ext 234" to "cell: 555-1234, work: 555-5678"

Janet's encoded version:

Primary Parent Name: _______________
Best number to reach you: _______________
[Auto-formats as (XXX) XXX-XXXX as user types]
[Validates: US phone format, not toll-free, not test number]

Is this: ( ) Cell  ( ) Home  ( ) Work

If we can't reach you at this number, should we:
  [ ] Try this alternate number: _______________
  [ ] Text me at the number above
  [ ] Call my workplace: _______________ ext: ___
  [ ] Call emergency contact immediately

Now the system knows:

  1. Context: Cell vs landline affects communication strategy. Can we text? Will voicemail work?

  2. Backup plan: Not "here's a random second number" but "here's what to do if primary fails."

  3. Preferences: Some parents want workplace calls, others never. Form captures this explicitly.

  4. Emergency protocol: Establishes clear escalation path.

The validation ensures format, yes. But the structure teaches parents to think through communication scenarios: "What if there's an incident and we can't reach you?"
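
As a rough sketch of what sits behind this field, the Python below normalizes the formats parents actually typed and turns the checkboxes into an ordered escalation path. The function names, the set of toll-free area codes, and the ordering of the fallback steps are assumptions for illustration; the "not test number" check is left out because the chapter does not define it.

```python
import re
from typing import List, Optional

# Assumed list of US toll-free area codes to reject.
TOLL_FREE_AREA_CODES = {"800", "833", "844", "855", "866", "877", "888"}


def normalize_us_phone(raw: str) -> Optional[str]:
    """Return '(XXX) XXX-XXXX' for a 10-digit US number, else None."""
    # Drop an extension such as "ext 234" before counting digits.
    main = re.split(r"(?i)\b(?:ext|x)\.?\s*\d+", raw)[0]
    digits = re.sub(r"\D", "", main)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the US country code
    if len(digits) != 10:
        return None  # e.g. "555-1234" has no area code
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    if area in TOLL_FREE_AREA_CODES:
        return None
    return f"({area}) {exchange}-{line}"


def escalation_path(primary: str, text_ok: bool = False,
                    alternate: Optional[str] = None,
                    workplace: Optional[str] = None) -> List[str]:
    """Ordered contact attempts; always ends with the emergency contact."""
    steps = [f"call {primary}"]
    if text_ok:
        steps.append(f"text {primary}")
    if alternate:
        steps.append(f"call {alternate}")
    if workplace:
        steps.append(f"call workplace {workplace}")
    steps.append("call emergency contact")
    return steps
```

Here "412.555.1234" and "(412) 555-1234 ext 234" both normalize to the same value, while "555-1234" is rejected so the form can ask for the area code on the spot instead of in a follow-up call.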

Allergies: From Text Box to Safety Protocol

Maria's version:

Allergies: _______________

Parents entered: "none," "n/a," "peanuts," "food," "seasonal," "lactose," "bee stings but not serious," "environmental"

Janet's encoded version:

Does your child have any allergies? ( ) Yes  ( ) No

[If Yes]
Please check all that apply:
[ ] Food allergies (specify below)
[ ] Insect sting allergies (specify below)
[ ] Medication allergies (specify below)
[ ] Environmental allergies (seasonal, dust, etc.)
[ ] Latex
[ ] Other (specify below)

For each checked category:
Specific allergen: _______________
Reaction severity: ( ) Mild  ( ) Moderate  ( ) Severe/Anaphylactic
Treatment required: 
  [ ] Antihistamine (e.g., Benadryl)
  [ ] Inhaler
  [ ] EpiPen
  [ ] Other: _______________
  [ ] Call 911 immediately

Location of medication (if applicable):
  ( ) Student carries it
  ( ) Parent provides to office (labeled with student name)
  ( ) No medication needed

Action plan on file? ( ) Yes  ( ) No
[If No] "Please provide doctor's allergy action plan by first day"

This isn't paranoia. This is encoding hard-won knowledge:

  • "Peanuts" alone doesn't tell you severity. Is this "gets hives" or "throat closes immediately"?
  • "Food allergies" without specifics is useless when snacks are distributed.
  • Parents saying "lactose" often mean intolerance, not allergy—matters for emergency response.
  • "Environmental" allergies usually don't require EpiPens, but the form lets you distinguish.
  • Knowing WHERE the EpiPen is located can save critical seconds.

The form structure teaches parents: "These are the distinctions that matter for your child's safety."
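
One way to picture the encoded form is as a structured record plus a completeness check. The sketch below is illustrative only: the `Allergy` record, the `Severity` enum, and the specific rules in `missing_details` (for example, that a severe allergy needs at least one treatment selected) are assumptions, not rules stated by Janet.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe/anaphylactic"


@dataclass
class Allergy:
    category: str                  # e.g. "food", "insect sting", "medication"
    allergen: str                  # e.g. "peanuts"
    severity: Severity
    treatments: List[str] = field(default_factory=list)
    medication_location: str = ""  # "student carries", "office", or ""
    action_plan_on_file: bool = False


def missing_details(a: Allergy) -> List[str]:
    """List what the form should still ask for before accepting the entry."""
    problems = []
    if not a.allergen.strip():
        problems.append("name the specific allergen")
    if a.severity is Severity.SEVERE:
        # Assumed rules: severe reactions need a treatment, a known
        # medication location, and an action plan.
        if not a.treatments:
            problems.append("select at least one treatment")
        if "EpiPen" in a.treatments and not a.medication_location:
            problems.append("say where the EpiPen is kept")
        if not a.action_plan_on_file:
            problems.append("provide doctor's allergy action plan by first day")
    return problems
```

A bare "peanuts" entry marked severe with an EpiPen comes back with two follow-ups (location and action plan), whereas a mild seasonal allergy passes with nothing further to ask.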

Emergency Contact: From Name to Protocol

Maria's version:

Emergency Contact: _______________

Parents entered: "Dad," "Grandma," "555-5678," "Mike Johnson"

Janet's encoded version:

If we cannot reach you, who should we call?

Emergency Contact #1:
  Full name: _______________
  Relationship to child: _______________
  Phone: _______________ Type: ( ) Cell ( ) Home ( ) Work
  Can pick up child? ( ) Yes  ( ) No
  Lives within: ( ) 10 min  ( ) 30 min  ( ) 1+ hour

Emergency Contact #2:
  [Same fields]

Are there any adults who should NOT be allowed to pick up your child?
  ( ) No restrictions
  ( ) Yes (please explain and provide photo if possible)

Explanation: [Text area]
  "This information is confidential and used only for child safety."

Now the coordinator knows:

  1. Who to call and how: Full contact information, not guessing.

  2. Capability: Can they actually pick up the child if needed? Someone an hour away isn't useful for "kid threw up, come get them."

  3. Authority: Explicitly authorized for pickup, not just "a name."

  4. Restrictions: Custody issues, estranged relatives, safety concerns—captured explicitly, not discovered mid-crisis.

The form teaches parents that "emergency contact" isn't just "another phone number"—it's someone with authority, capability, and proximity to act on your child's behalf.
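
Once contacts are captured this way, "who do we call to pick up a sick child?" becomes a simple query. The sketch below assumes a hypothetical `Contact` record, a 30-minute default threshold, and restriction matching by exact name; a real system would need a sturdier way to identify restricted adults.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Contact:
    name: str
    relationship: str
    phone: str
    can_pick_up: bool
    minutes_away: int  # form buckets: 10, 30, or 60 for "1+ hour"


def pickup_candidates(contacts: Iterable[Contact],
                      restricted_names: Iterable[str],
                      max_minutes: int = 30) -> List[Contact]:
    """Contacts who are authorized, not restricted, and close enough.

    Results are ordered nearest first, so the coordinator calls the
    person most likely to arrive quickly.
    """
    blocked = {name.casefold() for name in restricted_names}
    eligible = [
        c for c in contacts
        if c.can_pick_up
        and c.name.casefold() not in blocked
        and c.minutes_away <= max_minutes
    ]
    return sorted(eligible, key=lambda c: c.minutes_away)
```

A grandmother an hour away stays on file as someone to notify, but she drops out of the pickup list, which is exactly the distinction the proximity question exists to make.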

Extracting the Principles

From these examples, we can extract general principles about how domain knowledge gets encoded into forms:

Principle 1: Anticipate Ambiguity

What seems obvious to you may be ambiguous to users. "Grade level" seems clear until you realize some mean age, some mean academic placement, some mean developmental stage.

Encoding strategy:

  • Use precise terminology
  • Provide examples
  • Show expected vs actual when they might differ
  • Ask for underlying facts (birthdate) rather than derived interpretations (grade)

Principle 2: Make Implicit Rules Explicit

Domain experts know rules that users don't. Age cutoffs for grade placement. Standard allergy categories. What constitutes an "emergency contact."

Encoding strategy:

  • Show the rule: "Typically, 5-year-olds are in Kindergarten"
  • Allow exceptions: "But you can choose differently"
  • Capture reasoning: "Tell us why, so we can support your child"

Principle 3: Distinguish Severity and Context

"Allergies" ranges from "sniffles around cats" to "will die if exposed to peanuts." The form must capture this range.

Encoding strategy:

  • Use severity scales where appropriate
  • Ask about consequences: "What happens if exposed?"
  • Ask about treatment: "What do we do?"
  • Ask about immediacy: "How urgent is this?"

Principle 4: Capture Relationships, Not Just Facts

An emergency contact isn't just a name and number. It's a person with a relationship, capabilities, limitations, and authority.

Encoding strategy:

  • Ask about connections: "Relationship to child?"
  • Ask about capability: "Can they pick up?"
  • Ask about proximity: "How far away?"
  • Ask about authority: "Are they authorized?"

Principle 5: Explain Why You're Asking

Users resist invasive questions unless they understand the reason.

Encoding strategy:

  • Add explanatory text: "We ask about custody situations to ensure child safety"
  • Show consequence: "This helps us know who can pick up your child"
  • Offer opt-out where appropriate: "If you prefer to discuss this by phone..."

Principle 6: Teach Through Structure

The way you organize questions teaches users how to think about the domain.

Encoding strategy:

  • Group related questions: All allergy info together
  • Use logical progression: Basic info → Details → Special circumstances
  • Show dependencies: "If you answered yes to X, we need to know Y"
  • Make workflows visible: "First we'll... then we'll... finally..."
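
Dependencies and explanations can live in the form definition itself rather than in a coordinator's head. The following is a minimal sketch under assumed names (`Field`, `FORM`, `visible_fields`): each field carries its own "why" text and an optional condition that decides whether it is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, object]


@dataclass
class Field:
    key: str
    label: str
    why: str = ""  # explanatory text shown next to the question
    show_if: Optional[Callable[[Answers], bool]] = None  # None = always shown


def has_allergies(answers: Answers) -> bool:
    return answers.get("has_allergies") == "yes"


# Hypothetical excerpt of the registration form, grouped by purpose.
FORM: List[Field] = [
    Field("has_allergies", "Does your child have any allergies?"),
    Field("allergen", "Specific allergen",
          why="Snack planning needs the exact allergen.",
          show_if=has_allergies),
    Field("severity", "Reaction severity",
          why="Severity determines the emergency response.",
          show_if=has_allergies),
    Field("pickup_restrictions",
          "Are there any adults who should NOT be allowed to pick up your child?",
          why="This information is confidential and used only for child safety."),
]


def visible_fields(form: List[Field], answers: Answers) -> List[str]:
    """Keys of the fields to display, given the answers so far."""
    return [f.key for f in form if f.show_if is None or f.show_if(answers)]
```

Because the condition and the rationale are stored with the field, the next coordinator inherits both the question and the reason it is asked.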


Now let's see how these principles apply across different domains...

Real Estate: Encoding Market Knowledge

Consider a property listing form. A novice might create:

Address: _______________
Price: _______________
Bedrooms: _______________
Bathrooms: _______________
Square Feet: _______________
Description: _______________

But an experienced real estate agent knows that buyers don't think in these terms. They think: "Can I afford this?" "Does it fit my family?" "Is it in a good school district?"

Encoded version:

Property Address: _______________
[Auto-looks up: Neighborhood, School District, Tax Rate, Recent Sales]

You entered: 123 Main St, Pittsburgh PA 15215
This is in: Northside neighborhood, Pittsburgh Public Schools District 1
Recent comparable sales: $180K-$220K for similar properties

Listing Price: $ _______________
[System shows: "This is [X%] above/below recent comparables"]

Property Type:
  ( ) Single Family  ( ) Townhouse  ( ) Condo  ( ) Multi-Family

[If Multi-Family] Number of units: ___

Interior Space:
  Bedrooms: ___ (Excluding office/den)
  Full Bathrooms: ___ (Tub or shower)
  Half Bathrooms: ___ (Toilet + sink only)
  Square Footage: _____ (finished living space)

[Validation: "This seems high for a 3-bedroom. Typical range is 1,200-1,800 sq ft. Did you include basement/garage?"]

Basement: ( ) None  ( ) Unfinished  ( ) Partially finished  ( ) Finished
[If finished] Additional sq ft: _____

Key Features (select all that apply):
  [ ] Updated kitchen (within 5 years)
  [ ] Updated bathrooms (within 5 years)
  [ ] New roof (within 10 years)
  [ ] New HVAC (within 10 years)
  [ ] Hardwood floors
  [ ] Fireplace
  [ ] Garage (attached/detached)
  [ ] Central A/C
  [ ] Fenced yard

Property Condition:
  ( ) Move-in ready
  ( ) Minor cosmetic updates needed
  ( ) Significant updates needed
  ( ) Investor/flip opportunity

[Based on selection, system suggests appropriate language for listing]

What makes this property special? [Text area]
"Focus on unique features, recent improvements, and lifestyle benefits"

Look what this encodes:

Market knowledge: System shows comparable sales, so agent can price competitively.

Buyer thinking: Questions organized around what buyers care about—not just specs but condition, features, updates.

Terminology precision: "Full bathroom" vs "half bathroom" is meaningful. "Finished" vs "unfinished" basement affects square footage calculations and pricing.

Quality signals: Recent updates (roof, HVAC, kitchen) are worth calling out separately because they affect buyer perception and financing.

Validation against norms: If someone enters 3,500 sq ft for a 3-bedroom, that's suspicious. Either it's a data entry error or this is an unusually large home that needs explanation.

Professional guidance: Prompts for description focus agent on what matters: uniqueness, improvements, lifestyle—not just repeating the specs.
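
The two system-generated messages in this form, the comparison against comparables and the square-footage check, might be computed roughly as follows. This is a sketch under stated assumptions: the price is compared against the midpoint of the comparable range, and only the 3-bedroom range (1,200-1,800 sq ft) comes from the example above; the table and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Typical finished-area ranges by bedroom count. Only the 3-bedroom
# entry is taken from the example form; a real table would come from
# local market data.
TYPICAL_SQFT: Dict[int, Tuple[int, int]] = {3: (1200, 1800)}


def price_vs_comparables(price: float, comp_low: float, comp_high: float) -> str:
    """Describe the listing price relative to the comparable range midpoint."""
    midpoint = (comp_low + comp_high) / 2
    pct = (price - midpoint) / midpoint * 100
    if round(pct) == 0:
        return "in line with recent comparables"
    direction = "above" if pct > 0 else "below"
    return f"{abs(pct):.0f}% {direction} recent comparables"


def sqft_warning(bedrooms: int, sqft: int) -> Optional[str]:
    """Return a prompt when square footage falls outside the typical range."""
    if bedrooms not in TYPICAL_SQFT:
        return None  # no norm on file, so nothing to compare against
    low, high = TYPICAL_SQFT[bedrooms]
    typical = f"Typical range is {low:,}-{high:,} sq ft."
    if sqft > high:
        return (f"This seems high for a {bedrooms}-bedroom. {typical} "
                "Did you include basement/garage?")
    if sqft < low:
        return f"This seems low for a {bedrooms}-bedroom. {typical}"
    return None
```

Note that the warning asks a question rather than rejecting the entry: an unusually large home is allowed, but the agent is prompted to confirm or explain it.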

Legal Services: Encoding Procedural Knowledge

Consider a law firm's client intake form. A novice might create:

Name: _______________
Legal Issue: _______________
Preferred Contact: _______________

An experienced attorney knows they need to capture:

  • Conflict checking information
  • Statute of limitations urgency
  • Proper case categorization
  • Jurisdiction issues
  • Opposing party details
  • Timeline of events

Encoded version:

Your Full Legal Name: _______________
Other names you've used: _______________
"Include maiden name, previous married names, aliases"
[Why: Conflict checking requires all names]

What brings you to our firm? [Text area]
"Brief description of your legal situation"

Based on your description, this seems like:
[System suggests: ( ) Family Law  ( ) Personal Injury  ( ) Business  ( ) Estate  ( ) Other]

Is this correct? ( ) Yes  ( ) No, it's actually: _______________

[If Family Law selected]
Type of case:
  ( ) Divorce/Separation
  ( ) Child Custody
  ( ) Adoption
  ( ) Protection from Abuse
  ( ) Support/Alimony
  ( ) Other: _______________

Who is the other party?
Full name: _______________
Address (if known): _______________
Attorney (if represented): _______________

IMPORTANT: Have you received any court papers?
  ( ) No
  ( ) Yes - when? _______________ [Date picker]

[If Yes] "URGENT: Please upload or bring all court papers to your consultation. Response deadlines may be very short."

[System calculates days remaining, flags urgent cases]

When did the key events occur?
Most recent: _______________ [Date]
Started: _______________ [Date]

[System checks statute of limitations, flags potential issues]

Have you previously worked with our firm?
  ( ) No
  ( ) Yes - Case name: _______________

Have any family members worked with our firm?
  ( ) No
  ( ) Yes - Who: _______________

Do you know if the other party has worked with our firm?
  ( ) No
  ( ) Yes
  ( ) Unsure

[All of this feeds conflict checking]

Best way to contact you:
  ( ) Phone  ( ) Email  ( ) Text  ( ) Mail only

Is it safe to:
  [ ] Leave voicemails
  [ ] Send mail to your home address
  [ ] Send emails

[Why: Domestic violence cases require discreet communication]

How did you hear about us?
  ( ) Referral from: _______________
  ( ) Online search
  ( ) Advertisement
  ( ) Previous client

This encodes:

Conflict checking protocol: All name variations, family connections, other party details—everything needed to identify potential conflicts before the first meeting.

Urgency assessment: Court papers trigger immediate timeline checking. System calculates response deadlines automatically.

Statute of limitations awareness: Event dates feed automatic checks against limitation periods for different case types.

Safety considerations: Clients in domestic violence situations may not be able to safely receive mail, voicemail, or email at home. The form captures communication restrictions explicitly.

Case categorization: System helps clients identify case type, which determines which attorney they're routed to and what initial documents to prepare.

Relationship mapping: Family connections, referral sources, previous engagements—all tracked for both business development and conflict checking.

The form doesn't just collect contact info. It encodes legal procedural knowledge, ethical obligations, and safety protocols.
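
Two of the automated checks described above, deadline flagging and conflict screening, can be sketched as follows. Everything here is illustrative: response windows vary by jurisdiction and filing type, so the window is a parameter the firm must supply; the 7-day urgency threshold and the exact-match name comparison are assumptions, and real conflict checking needs far more than name matching.

```python
from datetime import date, timedelta
from typing import Iterable, List


def days_remaining(papers_received: date, response_window_days: int,
                   today: date) -> int:
    """Days left before the response deadline (negative if already past)."""
    deadline = papers_received + timedelta(days=response_window_days)
    return (deadline - today).days


def is_urgent(days_left: int, threshold_days: int = 7) -> bool:
    """Flag cases whose deadline falls within the assumed threshold."""
    return days_left <= threshold_days


def normalize_name(name: str) -> str:
    """Case-insensitive form with collapsed whitespace."""
    return " ".join(name.casefold().split())


def conflict_hits(intake_names: Iterable[str],
                  known_parties: Iterable[str]) -> List[str]:
    """Intake names (client, aliases, other party) that match firm records."""
    known = {normalize_name(n) for n in known_parties}
    return [n for n in intake_names if normalize_name(n) in known]
```

This is why the form asks for every name the client has used: a match on a maiden name or alias is only possible if that name was collected at intake.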

Conclusion: Forms as Knowledge Artifacts

Maria's simple form asked for data. Janet's form encoded 15 years of hard-won expertise about what a homeschool co-op coordinator needs to know and why.

This is the fundamental insight: Forms are not just user interfaces. They are knowledge artifacts.

Every field is a claim about what matters. Every validation rule is a lesson about domain constraints. Every help text is a transfer of expertise. Every default value is institutional memory. Every conditional branch is domain logic made explicit.

When you design a form, you're not just asking questions. You're encoding your understanding of the domain into an executable structure. You're teaching users how experts think about these problems. You're preserving organizational knowledge in a form that outlives any individual expert.

This is why form design matters far more than most organizations realize. A bad form doesn't just frustrate users—it loses knowledge. A good form doesn't just collect data—it teaches, guides, and preserves expertise.

The patterns in Part II will show you how to encode domain knowledge effectively across different contexts and requirements.

But first, we need to understand how this volume connects to the previous two—how knowledge capture feeds intelligence and output.

