Chapter 4: Building the Foundation — Data, Configuration, and Structure
"The difference between a domain that works and a domain that sings is entirely in the quality of what you put in before anyone ever clicks Run."
Under the Hood
In Chapter 3 you designed your domain. You have a complete entity map, a validated list of 20 document templates, and a design document that tells you exactly what you are building. Now we open the engine compartment.
This chapter covers the three foundational components that every domain needs before a single template can be generated: the CSV sample data, the domain configuration file, and the folder structure that holds everything together. These are not glamorous. They are not the part of the story people tell at conferences. They are the part that determines whether your domain works flawlessly or frustrates every user who downloads it.
We will go deep. We will look at real file contents, real JSON structures, real CSV patterns. We will explain not just what to write but why each decision matters. And we will use the legal services domain from Chapter 3 alongside SchoolCoop so you can see how the same principles apply across two completely different industries.
By the end of this chapter you will have a complete, validated data foundation ready to receive the 20 templates you will generate in Chapter 5.
Setting Up Your Folder Structure
Before you create a single file, create the folders. The structure is not optional — the Data Publisher server expects exactly this layout, and deviations from it will cause discovery to fail silently.
# Create your domain folder structure
mkdir -p DataPublisher_DomainTemplates/your-domain/csv-data
mkdir -p DataPublisher_DomainTemplates/your-domain/word-templates
mkdir -p DataPublisher_DomainTemplates/your-domain/docs
Replace your-domain with your domain's machine-readable ID. This ID appears in API endpoints, in the domain-config.json, and in the marketplace URL. Choose it carefully:
- Lowercase letters only
- Hyphens to separate words, not underscores
- Short and descriptive: legal-services, property-management, nonprofit-ops
- No spaces, no special characters, no version numbers
Once chosen, never change it. The domain ID is the stable identifier that everything else references. Changing it after publication breaks every existing user's setup.
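The naming rules above are easy to check mechanically before you commit to an ID. The helper below is a minimal sketch: isValidDomainId is a hypothetical name, and the regular expression is our own encoding of the rules, not the check the Data Publisher server itself performs.

```javascript
// Hypothetical helper: encodes the naming rules as one regular expression.
// Lowercase words joined by single hyphens; digits, spaces, underscores,
// and uppercase letters are all rejected.
function isValidDomainId(id) {
  return /^[a-z]+(-[a-z]+)*$/.test(id);
}

console.log(isValidDomainId('legal-services'));    // true
console.log(isValidDomainId('Legal_Services'));    // false
console.log(isValidDomainId('legal-services-v2')); // false
```

Rejecting digits outright is the simplest way to keep version numbers out of the ID.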
Your folder now looks like this:
DataPublisher_DomainTemplates/
└── legal-services/
├── csv-data/ ← empty, ready for data
├── word-templates/ ← empty, ready for generated templates
└── docs/ ← empty, ready for reference documentation
Two more files belong directly in the domain root — domain-config.json and README.md. We will create both before this chapter is done.
Building Your CSV Sample Data
This is the most important work in the entire domain creation process. Read that sentence again.
Your CSV files are not test fixtures. They are not placeholder data. They are the living demonstration of your domain — the evidence that your architecture works, the raw material for every generated document, and the first thing a potential buyer will look at when evaluating whether your domain is worth purchasing.
Every hour you invest in realistic, thoughtful sample data returns tenfold in user confidence, template quality, and marketplace credibility.
The Golden Rules of Sample Data
Rule 1: Real names, real places, real scenarios
Not "Client 1," "Attorney A," "Matter 001." Real-sounding names drawn from the actual population of people who work in your domain. Real city names, real street patterns, real zip codes. Realistic matter descriptions, realistic billing amounts, realistic dates.
The test is simple: could someone who works in this industry look at your sample data and mistake it for real data from a real organization? If yes, you have done it right. If no, start over.
Rule 2: Cover the edge cases in your sample data, not just the happy path
The happy path is a single client with one matter, one attorney, a clean invoice with no outstanding balance, and a hearing that went smoothly. Reality is messier. Your sample data must include:
- Clients with multiple matters
- Matters with multiple hearings and multiple deadlines
- Invoices with partial payments and outstanding balances
- Attorneys with varied specialties and billing rates
- Matters that are active, closed, and on hold
- Deadlines that are upcoming, overdue, and completed
These scenarios are not edge cases in your users' real lives. They are Tuesday. Your domain must handle them, and the only way to verify that it does is to have them in your sample data.
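One way to confirm that a scenario is actually present is to list the values you expect in a column and ask which ones never appear. The sketch below assumes rows have already been loaded as plain objects. The function name missingValues and the status label 'On Hold' are illustrative assumptions; use whatever labels your own CSV files contain.

```javascript
// Minimal sketch: report which required values never occur in a column.
function missingValues(rows, field, required) {
  const present = new Set(rows.map((row) => row[field]));
  return required.filter((value) => !present.has(value));
}

// Illustrative rows only, not the full matters table.
const sampleMatters = [
  { MatterID: 1, Status: 'Active' },
  { MatterID: 2, Status: 'Closed' },
  { MatterID: 3, Status: 'Active' },
];

console.log(missingValues(sampleMatters, 'Status', ['Active', 'Closed', 'On Hold']));
// [ 'On Hold' ]
```

An empty result means every scenario you listed is represented at least once.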
Rule 3: Enough records to be meaningful, not so many you cannot maintain them
The target ranges by table type:
| Table Role | Record Count |
|---|---|
| Organizational anchor | 2–3 organizations |
| Primary entities (clients, matters, students) | 25–35 records |
| Secondary entities (attorneys, teachers) | 10–15 records |
| Junction tables (enrollments, billing entries) | 60–120 records |
| Supporting tables (hearings, deadlines) | 30–50 records |
| Financial tables (invoices, payments) | 20–40 records |
SchoolCoop's 350+ total records sit at the upper end of these ranges because it is a complex domain with 15 tables. Your domain may be leaner — 200 to 250 total records across 10 to 12 tables is perfectly sufficient.
Rule 4: Consistent IDs, correct foreign keys
Every record needs a primary key. Every foreign key reference must point to a real record in the referenced table. A student who references FamilyID 7 when only 6 families exist will cause silent failures in template generation that are maddening to debug.
The simplest approach: assign sequential integer IDs (1, 2, 3...) and keep a reference document as you build each table showing which IDs exist. Check every foreign key against this reference before saving the file.
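The manual cross-check can be backed up with a few lines of code. This is a minimal sketch that assumes each table is already in memory as an array of objects. findOrphans is a hypothetical helper, and the data mirrors the FamilyID example above rather than any real file.

```javascript
// Minimal sketch: find child rows whose foreign key matches no parent record.
function findOrphans(childRows, fkField, parentRows, pkField) {
  // Compare as strings so "7" read from a CSV matches the number 7.
  const known = new Set(parentRows.map((row) => String(row[pkField])));
  return childRows.filter((row) => !known.has(String(row[fkField])));
}

// Illustrative data: six families, one student pointing at a seventh.
const families = [1, 2, 3, 4, 5, 6].map((id) => ({ FamilyID: id }));
const students = [
  { StudentID: 1, FamilyID: 2 },
  { StudentID: 2, FamilyID: 7 },
];

const orphans = findOrphans(students, 'FamilyID', families, 'FamilyID');
console.log(orphans); // [ { StudentID: 2, FamilyID: 7 } ]
```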
Building the Legal Services CSV Files
Let us build the first three tables of the legal services domain together, in enough detail that you can see exactly what "production-quality sample data" means.
firms.csv
FirmID,FirmName,Address,City,State,ZipCode,Phone,Email,Website,ManagingPartner,Founded,BarNumber,MalpracticeInsurer
1,Harrington & Cole LLP,842 Meridian Avenue Suite 400,Chicago,IL,60611,(312) 555-0147,info@harringtoncole.com,www.harringtoncole.com,Margaret Harrington,1987,ARDC-4471823,CNA Surety
2,Delgado Voss & Partners,1200 Commerce Street 18th Floor,Dallas,TX,75201,(214) 555-0293,contact@delgadovoss.com,www.delgadovoss.com,Rafael Delgado,2003,TBAR-8830294,Travelers
3,Okonkwo Legal Group,55 Post Street Suite 1100,San Francisco,CA,94104,(415) 555-0381,info@okonkwolegal.com,www.okonkwolegal.com,Adaeze Okonkwo,2011,CALBAR-2941057,Hartford
Notice what this data does:
Three distinct firms in three different cities and states. Each has a believable name following real law firm naming conventions (founding partners' surnames). The addresses follow real address patterns for their cities — Suite numbers, floor designations. Phone numbers use the correct area codes. Bar numbers follow realistic patterns for each state's bar association. The founding years span a range that creates different seniority contexts. The malpractice insurers are real companies that insure law firms.
Anyone who works in legal services could look at this and mistake it for real firm data. That is the target.
clients.csv
ClientID,FirmID,ClientType,FirstName,LastName,CompanyName,Address,City,State,ZipCode,Phone,Email,DateOnboarded,Status,ReferralSource
1,1,Individual,Thomas,Brannigan,,2847 Lakeview Drive,Chicago,IL,60614,(312) 555-0412,t.brannigan@gmail.com,2019-03-14,Active,Referral
2,1,Entity,,,Meridian Industrial Corp,4400 Fulton Avenue,Chicago,IL,60622,(312) 555-0883,legal@meridianindustrial.com,2018-07-22,Active,Direct
3,1,Individual,Carolyn,Westbrook,,991 Elm Street Apt 4B,Evanston,IL,60201,(847) 555-0267,cwestbrook@outlook.com,2021-11-05,Active,Website
4,1,Entity,,,Lakefront Properties LLC,711 N Michigan Avenue Suite 200,Chicago,IL,60611,(312) 555-0594,contracts@lakefrontprops.com,2020-02-18,Active,Referral
5,1,Individual,David,Osei-Mensah,,1502 Maple Lane,Oak Park,IL,60302,(708) 555-0138,dosei@protonmail.com,2022-06-30,Active,Referral
6,2,Entity,,,Vega Construction Group,3300 Industrial Blvd,Dallas,TX,75247,(214) 555-0712,legal@vegaconstruction.com,2017-09-11,Active,Direct
7,2,Individual,Patricia,Fontaine,,8820 Preston Road,Dallas,TX,75225,(214) 555-0347,pfontaine@email.com,2020-04-27,Active,Referral
8,2,Entity,,,Southwest Medical Supplies Inc,2200 Commerce Street Suite 500,Dallas,TX,75201,(214) 555-0619,admin@swmedsupplies.com,2019-12-03,Active,Direct
9,3,Individual,James,Whitfield-Okafor,,440 Pacific Avenue,San Francisco,CA,94111,(415) 555-0824,jwo@icloud.com,2021-08-15,Active,Website
10,3,Entity,,,Presidio Technology Partners,1 Market Street Suite 800,San Francisco,CA,94105,(415) 555-0446,legal@presidiotech.com,2018-03-29,Active,Referral
11,1,Individual,Elena,Kowalczyk,,3318 North Clark Street,Chicago,IL,60657,(773) 555-0193,ekowalczyk@yahoo.com,2020-09-17,Inactive,Referral
12,2,Individual,Marcus,Thibodeau,,5540 Mockingbird Lane,Dallas,TX,75206,(214) 555-0278,mthibodeau@gmail.com,2023-01-08,Active,Website
Both individual and entity clients appear. The ClientType field controls which name fields are populated — a pattern that matters significantly in template design. Clients are distributed across all three firms. An inactive client appears so templates can correctly filter. Referral sources are varied.
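To make the ClientType pattern concrete, here is a minimal sketch of how a consumer of this data might choose the name to print. clientDisplayName is a hypothetical helper written for illustration. It is not part of Data Publisher, and real templates may express the same branch with their own conditional syntax.

```javascript
// Hypothetical helper: choose the printable name based on ClientType.
function clientDisplayName(client) {
  if (client.ClientType === 'Entity') {
    return client.CompanyName;
  }
  return `${client.FirstName} ${client.LastName}`;
}

const individual = {
  ClientType: 'Individual', FirstName: 'Thomas', LastName: 'Brannigan', CompanyName: '',
};
const entity = {
  ClientType: 'Entity', FirstName: '', LastName: '', CompanyName: 'Lakefront Properties LLC',
};

console.log(clientDisplayName(individual)); // Thomas Brannigan
console.log(clientDisplayName(entity));     // Lakefront Properties LLC
```

The branch only works if entity rows leave both FirstName and LastName empty and individual rows leave CompanyName empty, which is why the column alignment in the CSV matters.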
matters.csv
MatterID,FirmID,ClientID,LeadAttorneyID,PracticeAreaID,MatterNumber,Description,Status,OpenDate,CloseDate,EstimatedValue,BillingType,HourlyRate,FlatFee,RetainerAmount
1,1,2,3,2,HC-2018-0047,Commercial lease dispute - Meridian Industrial warehouse complex,Active,2018-09-15,,185000.00,Hourly,350.00,,
2,1,1,1,5,HC-2019-0083,Brannigan residential property closing - Lakeview Drive,Closed,2019-04-02,2019-06-28,12500.00,Flat,,4500.00,
3,1,3,2,3,HC-2021-0201,Westbrook employment discrimination claim - Wrongful termination,Active,2021-11-19,,95000.00,Hourly,285.00,,
4,1,4,1,2,HC-2020-0157,Lakefront Properties acquisition due diligence - Pilsen mixed-use,Closed,2020-03-01,2020-08-14,48000.00,Hourly,325.00,,
5,1,5,4,4,HC-2022-0274,Osei-Mensah estate planning - Trust and will preparation,Active,2022-07-14,,8500.00,Flat,,3500.00,
6,1,2,3,2,HC-2023-0318,Meridian Industrial vendor contract negotiations - Three-year supply agreement,Active,2023-02-27,,42000.00,Retainer,,,5000.00
7,2,6,6,2,DV-2017-0024,Vega Construction subcontractor dispute - Riverside project,Closed,2017-10-18,2019-03-07,220000.00,Hourly,395.00,,
8,2,7,5,3,DV-2020-0147,Fontaine divorce proceedings and asset division,Active,2020-05-11,,65000.00,Hourly,320.00,,
9,2,8,7,6,DV-2019-0193,Southwest Medical Supplies regulatory compliance review - FDA,Closed,2019-12-15,2021-06-30,138000.00,Hourly,425.00,,
10,2,12,5,3,DV-2023-0041,Thibodeau personal injury claim - Auto accident,Active,2023-01-22,,45000.00,Contingency,,,
11,3,9,9,5,OL-2021-0088,Whitfield-Okafor estate plan - Complex blended family trust,Active,2021-09-03,,22000.00,Flat,,8000.00,
12,3,10,8,2,OL-2018-0031,Presidio Technology acquisition of CloudBridge Analytics,Closed,2018-04-15,2018-11-22,380000.00,Hourly,450.00,,
13,3,10,10,7,OL-2022-0244,Presidio Technology employment agreement disputes - Four executives,Active,2022-03-18,,95000.00,Hourly,365.00,,
Matter numbers follow realistic law firm conventions. Four billing types appear: Hourly, Flat, Retainer, and Contingency. Matter descriptions are specific and professional, because they will appear verbatim in generated documents. Client 2 has two active matters, testing multi-matter template logic.
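Rows like these carry several empty trailing fields, and a single dropped comma shifts every later value into the wrong column. The sketch below counts fields per row against the header. It assumes simple CSV with no quoted fields and no commas inside values, which holds for the samples in this chapter but not for CSV in general. findRaggedRows is a hypothetical name.

```javascript
// Minimal sketch: flag rows whose field count differs from the header's.
// Assumes no quoted fields and no commas embedded in values.
function findRaggedRows(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const expected = lines[0].split(',').length;
  const ragged = [];
  lines.slice(1).forEach((line, index) => {
    const actual = line.split(',').length;
    // index 0 is the first data row, which is line 2 of the file.
    if (actual !== expected) ragged.push({ line: index + 2, expected, actual });
  });
  return ragged;
}

// Illustrative data: the second data row is missing its empty CloseDate field.
const sampleCsv = [
  'MatterID,Status,OpenDate,CloseDate',
  '1,Active,2018-09-15,',
  '2,Closed,2019-04-02',
].join('\n');

console.log(findRaggedRows(sampleCsv)); // [ { line: 3, expected: 4, actual: 3 } ]
```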
The Realistic Data Checklist
Before moving on from any CSV file, run through this checklist:
[ ] Names follow real naming conventions for this domain
[ ] Addresses are geographically consistent (city/state/zip match)
[ ] Phone numbers use correct area codes for their cities
[ ] Dates are logically consistent (close dates after open dates, etc.)
[ ] IDs are sequential and start at 1
[ ] Every foreign key references an ID that actually exists
[ ] Edge cases are represented (inactive records, partial payments, etc.)
[ ] Both "clean" and "complex" scenarios appear
[ ] No placeholder text remains ("TBD", "Enter description here", etc.)
[ ] The data would pass a domain expert's sniff test
Do not move to the next table until every box is checked.
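Two of the checklist items, sequential IDs and logically consistent dates, lend themselves to automation. The following is a minimal sketch with hypothetical helper names. It assumes rows are objects keyed by column name and that dates use the YYYY-MM-DD form shown in the samples. The third row is deliberately wrong to show what a failure looks like.

```javascript
// True when IDs run 1, 2, 3... in file order.
function checkSequentialIds(rows, pkField) {
  return rows.every((row, index) => Number(row[pkField]) === index + 1);
}

// Rows whose close date falls before their open date.
// ISO dates (YYYY-MM-DD) sort correctly as plain strings.
function findBadDateRanges(rows, openField, closeField) {
  return rows.filter((row) => row[closeField] && row[closeField] < row[openField]);
}

// Illustrative rows; MatterID 3 has an impossible close date on purpose.
const checkedMatters = [
  { MatterID: '1', OpenDate: '2018-09-15', CloseDate: '' },
  { MatterID: '2', OpenDate: '2019-04-02', CloseDate: '2019-06-28' },
  { MatterID: '3', OpenDate: '2021-11-19', CloseDate: '2021-01-05' },
];

console.log(checkSequentialIds(checkedMatters, 'MatterID')); // true
console.log(findBadDateRanges(checkedMatters, 'OpenDate', 'CloseDate').map((m) => m.MatterID));
// [ '3' ]
```

The remaining checklist items, such as whether names and addresses feel real, still need a human eye.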
Building domain-config.json
With your CSV files created, it is time to build the configuration file that introduces your domain to the platform. This is the most technically precise document you will write — every field name, every foreign key reference, every template definition must be exactly correct.
We will build it section by section.
Section 1: Domain Metadata
{
"domainId": "legal-services",
"domainName": "Legal Services",
"version": "1.0.0",
"description": "Complete document automation for law firms and legal practices. Generate engagement letters, matter status reports, invoices, court preparation documents, and more from your client and matter data.",
"category": "Professional Services",
"icon": "⚖️",
"targetUser": "Attorneys, paralegals, and legal administrators",
"complexity": "Moderate",
"tableCount": 10,
"templateCount": 20,
"sampleRecordCount": 322,
domainId must exactly match your folder name. One character off and the server cannot find your CSV files. version follows semantic versioning — start at 1.0.0, increment the middle number for new templates, the first number for breaking schema changes. description is your marketplace pitch: two sentences naming the target user, the most compelling document types, and the core value proposition.
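The versioning rule can be written down as a small function. This is a sketch under stated assumptions: bumpVersion and the change labels 'new-template' and 'breaking-schema' are names invented here, and the patch increment for any other change follows general semantic versioning practice rather than anything this platform mandates.

```javascript
// Hypothetical helper: apply the chapter's versioning rule to a version string.
function bumpVersion(version, change) {
  const [major, minor, patch] = version.split('.').map(Number);
  if (change === 'breaking-schema') return `${major + 1}.0.0`;
  if (change === 'new-template') return `${major}.${minor + 1}.0`;
  // Assumption: any other change (a typo fix, a data correction) bumps the last number.
  return `${major}.${minor}.${patch + 1}`;
}

console.log(bumpVersion('1.0.0', 'new-template'));    // 1.1.0
console.log(bumpVersion('1.4.2', 'breaking-schema')); // 2.0.0
```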
Section 2: Table Definitions
"tables": [
{
"name": "firms",
"displayName": "Law Firms",
"primaryKey": "FirmID",
"foreignKeys": [],
"recordCount": 3,
"description": "Law firm organizations. The anchor for all other data."
},
{
"name": "clients",
"displayName": "Clients",
"primaryKey": "ClientID",
"foreignKeys": [
{
"field": "FirmID",
"references": "firms.FirmID"
}
],
"recordCount": 12,
"description": "Client records for both individuals and business entities."
},
{
"name": "matters",
"displayName": "Matters",
"primaryKey": "MatterID",
"foreignKeys": [
{ "field": "FirmID", "references": "firms.FirmID" },
{ "field": "ClientID", "references": "clients.ClientID" },
{ "field": "LeadAttorneyID", "references": "attorneys.AttorneyID" },
{ "field": "PracticeAreaID", "references": "practice_areas.PracticeAreaID" }
],
"recordCount": 13,
"description": "Legal matters (cases). Central entity connecting clients, attorneys, and all matter activity."
},
{
"name": "billing_entries",
"displayName": "Billing Entries",
"primaryKey": "EntryID",
"foreignKeys": [
{ "field": "MatterID", "references": "matters.MatterID" },
{ "field": "AttorneyID", "references": "attorneys.AttorneyID" }
],
"recordCount": 124,
"description": "Time and expense entries for billing. The line items that appear on invoices."
}
],
Every foreign key in your CSV files must appear in this section. The matters table has four foreign keys — it is the most connected table in the domain and can pull fields from firms, clients, attorneys, and practice areas in a single template. That is what makes complex documents like the Engagement Letter possible with a single primary table selection.
Section 3: Document Type Definitions
"documentTypes": [
{
"id": 1,
"name": "Engagement Letter",
"description": "Formal letter confirming attorney-client relationship, scope of representation, and fee arrangement.",
"category": "Client Communications",
"complexity": "Moderate",
"primaryTable": "matters",
"relatedTables": ["clients", "attorneys", "firms"],
"filename": "Template-01-EngagementLetter.docx",
"tags": ["client-facing", "compliance", "new-matter"]
},
{
"id": 3,
"name": "Invoice",
"description": "Detailed billing statement showing time entries, expenses, and amounts due. Supports hourly, flat-fee, and retainer billing types.",
"category": "Financial",
"complexity": "Complex",
"primaryTable": "invoices",
"relatedTables": ["clients", "matters", "billing_entries", "attorneys"],
"filename": "Template-03-Invoice.docx",
"tags": ["client-facing", "financial", "billing"]
},
{
"id": 4,
"name": "Matter Status Report",
"description": "Internal report showing active matter details, upcoming deadlines, hearings, and billing status.",
"category": "Operational",
"complexity": "Complex",
"primaryTable": "matters",
"relatedTables": ["clients", "attorneys", "hearings", "deadlines", "invoices"],
"filename": "Template-04-MatterStatusReport.docx",
"tags": ["internal", "management"]
}
],
Complexity levels matter to users: Simple means one or two tables with no loops, Moderate means related tables and simple iteration, Complex means multiple joins or financial calculations. Tags make the marketplace browseable — choose them from the perspective of how users will search, not how you categorize.
Section 4: Closing the Configuration
"relationships": {
"primaryChains": [
"firms → clients → matters",
"firms → attorneys → matters",
"matters → hearings",
"matters → deadlines",
"matters → billing_entries → invoices"
]
},
"metadata": {
"author": "Your Name",
"authorEmail": "your@email.com",
"created": "2026-01-15",
"lastUpdated": "2026-01-15",
"license": "Commercial",
"supportUrl": "https://yourwebsite.com/support",
"changeLog": [
{
"version": "1.0.0",
"date": "2026-01-15",
"changes": ["Initial release"]
}
]
}
}
The relationships.primaryChains section is plain-language documentation that shows users how data flows through your domain. The metadata section records your authorship and version history. Fill it out completely, because it is what the marketplace shows on your domain's detail page.
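Once the file is closed, the summary numbers in the metadata block can be compared with the arrays they describe. The sketch below assumes that tableCount, templateCount, and sampleRecordCount are meant to equal the number of tables, the number of document types, and the sum of the per-table recordCount values. That reading is our assumption, and checkDeclaredCounts is a hypothetical helper.

```javascript
// Minimal sketch: compare declared summary counts with the arrays in the config.
function checkDeclaredCounts(config) {
  const problems = [];
  const recordTotal = config.tables.reduce((sum, table) => sum + table.recordCount, 0);
  if (config.tableCount !== config.tables.length) {
    problems.push(`tableCount is ${config.tableCount}, but ${config.tables.length} tables are defined`);
  }
  if (config.templateCount !== config.documentTypes.length) {
    problems.push(`templateCount is ${config.templateCount}, but ${config.documentTypes.length} document types are defined`);
  }
  if (config.sampleRecordCount !== recordTotal) {
    problems.push(`sampleRecordCount is ${config.sampleRecordCount}, but table recordCounts sum to ${recordTotal}`);
  }
  return problems;
}

// Illustrative, cut-down config: the record total is deliberately wrong.
const countsExample = {
  tableCount: 2,
  templateCount: 1,
  sampleRecordCount: 20,
  tables: [{ name: 'firms', recordCount: 3 }, { name: 'clients', recordCount: 12 }],
  documentTypes: [{ id: 1, name: 'Engagement Letter' }],
};

console.log(checkDeclaredCounts(countsExample));
// [ 'sampleRecordCount is 20, but table recordCounts sum to 15' ]
```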
Validating Your Configuration
Before writing a single template, validate with three checks:
Check 1: JSON syntax
node -e "JSON.parse(require('fs').readFileSync('domain-config.json', 'utf8')); console.log('Valid JSON')"
A single misplaced comma causes silent domain discovery failure. Run this before anything else.
Check 2: Foreign key cross-reference
For every foreign key declaration, verify the referenced table name exactly matches a table in your tables array, the referenced column matches that table's primary key, and the CSV file actually contains that column.
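Check 2 can be scripted once the config is parsed. The sketch below is one possible implementation, not a tool that ships with the platform. checkForeignKeys is a hypothetical helper, and csvHeaders is an assumed map from table name to the column names found in that table's CSV header row.

```javascript
// Minimal sketch of Check 2: every foreign key must resolve to a real table,
// that table's primary key, and a column present in the referenced CSV.
function checkForeignKeys(config, csvHeaders) {
  const tables = new Map(config.tables.map((table) => [table.name, table]));
  const problems = [];
  for (const table of config.tables) {
    for (const fk of table.foreignKeys) {
      const [refTable, refColumn] = fk.references.split('.');
      const label = `${table.name}.${fk.field}`;
      const target = tables.get(refTable);
      if (!target) {
        problems.push(`${label}: unknown table "${refTable}"`);
        continue;
      }
      if (target.primaryKey !== refColumn) {
        problems.push(`${label}: "${refColumn}" is not the primary key of ${refTable}`);
      }
      if (!(csvHeaders[refTable] || []).includes(refColumn)) {
        problems.push(`${label}: column "${refColumn}" not found in ${refTable}.csv`);
      }
    }
  }
  return problems;
}

// Illustrative config: attorneys is referenced but never defined.
const fkConfig = {
  tables: [
    { name: 'firms', primaryKey: 'FirmID', foreignKeys: [] },
    { name: 'clients', primaryKey: 'ClientID',
      foreignKeys: [{ field: 'FirmID', references: 'firms.FirmID' }] },
    { name: 'matters', primaryKey: 'MatterID',
      foreignKeys: [
        { field: 'ClientID', references: 'clients.ClientID' },
        { field: 'LeadAttorneyID', references: 'attorneys.AttorneyID' },
      ] },
  ],
};
const fkHeaders = {
  firms: ['FirmID', 'FirmName'],
  clients: ['ClientID', 'FirmID'],
  matters: ['MatterID', 'ClientID', 'LeadAttorneyID'],
};

console.log(checkForeignKeys(fkConfig, fkHeaders));
// [ 'matters.LeadAttorneyID: unknown table "attorneys"' ]
```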
Check 3: Template-to-table validation
For every template, verify the primaryTable and every entry in relatedTables exists in your tables array, and that the chain of foreign keys connecting them is complete.
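Check 3 can be sketched the same way. The code below treats declared foreign keys as undirected links and walks outward from each template's primary table. Treating any connected path as a complete chain is our simplifying assumption, and checkDocumentTypes is a hypothetical helper rather than a platform API.

```javascript
// Minimal sketch of Check 3: each template's tables must exist and be
// reachable from the primary table through declared foreign keys.
function checkDocumentTypes(config) {
  const names = new Set(config.tables.map((table) => table.name));
  // Undirected adjacency list built from every foreign key declaration.
  const links = new Map([...names].map((name) => [name, new Set()]));
  for (const table of config.tables) {
    for (const fk of table.foreignKeys) {
      const target = fk.references.split('.')[0];
      if (names.has(target)) {
        links.get(table.name).add(target);
        links.get(target).add(table.name);
      }
    }
  }
  const problems = [];
  for (const doc of config.documentTypes) {
    [doc.primaryTable, ...doc.relatedTables]
      .filter((name) => !names.has(name))
      .forEach((name) => problems.push(`${doc.name}: table "${name}" is not defined`));
    if (!names.has(doc.primaryTable)) continue;
    // Breadth-first walk from the primary table.
    const seen = new Set([doc.primaryTable]);
    const queue = [doc.primaryTable];
    while (queue.length > 0) {
      for (const next of links.get(queue.shift())) {
        if (!seen.has(next)) {
          seen.add(next);
          queue.push(next);
        }
      }
    }
    doc.relatedTables
      .filter((name) => names.has(name) && !seen.has(name))
      .forEach((name) => problems.push(`${doc.name}: no foreign key path from ${doc.primaryTable} to ${name}`));
  }
  return problems;
}

// Illustrative config: hearings declares no foreign keys, deadlines is undefined.
const docConfig = {
  tables: [
    { name: 'firms', foreignKeys: [] },
    { name: 'clients', foreignKeys: [{ field: 'FirmID', references: 'firms.FirmID' }] },
    { name: 'matters', foreignKeys: [
      { field: 'FirmID', references: 'firms.FirmID' },
      { field: 'ClientID', references: 'clients.ClientID' },
    ] },
    { name: 'hearings', foreignKeys: [] },
  ],
  documentTypes: [
    { name: 'Engagement Letter', primaryTable: 'matters', relatedTables: ['clients', 'firms'] },
    { name: 'Hearing Preparation Sheet', primaryTable: 'matters', relatedTables: ['hearings', 'deadlines'] },
  ],
};

console.log(checkDocumentTypes(docConfig));
```

For this illustrative config the Engagement Letter passes, while the Hearing Preparation Sheet produces two problems: deadlines is not defined, and hearings has no foreign key path back to matters.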
Testing Domain Discovery
# Start the backend server
cd server && npm run dev
# Test discovery — your domain should appear
curl http://localhost:3001/api/domains
# Test config loads cleanly
curl http://localhost:3001/api/domains/legal-services/config
Expect a template count of zero, since you have not generated any templates yet. What you need at this stage is for the domain to appear in the domains list and for the config endpoint to return your full configuration without errors. If the domain does not appear, check in this order: the folder name matches domainId, the JSON is valid, and the server has read permissions on DataPublisher_DomainTemplates.
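When a domain fails to appear, the same checks can be run from a short script. This is a rough sketch: diagnoseDomain is a hypothetical helper, and its readability test reflects the permissions of whoever runs the script, which may differ from the account the server runs under.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: look for the usual causes of a missing domain.
function diagnoseDomain(domainDir) {
  const configPath = path.join(domainDir, 'domain-config.json');
  try {
    fs.accessSync(configPath, fs.constants.R_OK);
  } catch {
    return 'domain-config.json is missing or not readable';
  }
  let config;
  try {
    config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  } catch (err) {
    return `invalid JSON: ${err.message}`;
  }
  const folder = path.basename(path.resolve(domainDir));
  if (config.domainId !== folder) {
    return `domainId "${config.domainId}" does not match folder "${folder}"`;
  }
  return 'ok';
}

console.log(diagnoseDomain('DataPublisher_DomainTemplates/legal-services'));
```

Run it from the directory that contains DataPublisher_DomainTemplates. A result of ok means the folder-level causes are ruled out and the next place to look is the server's own log output.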
Writing Your README.md
Write it now, while the architecture is fresh. The README serves as both a marketplace listing and a user guide. Its two most critical sections:
Quick Start — gets a new user from download to first generated document in under five minutes. Write it step by step. Test it yourself. If it takes more than five minutes, fix it.
Use Cases — tells users what the domain is for, not just what it can do. The difference: "Generate a hearing preparation sheet from matter and hearing data" is a feature description. "At 7am before a 9am motion hearing, generate a complete preparation brief showing case background, your client's position, all relevant deadlines, and your preparation checklist — in 30 seconds" is a use case. Write four to six of these. Write them for the person who is busy and stressed and needs something to work right now.
The Foundation Is Complete
At the end of this chapter, your domain folder looks like this:
DataPublisher_DomainTemplates/legal-services/
│
├── domain-config.json ✓ Complete, validated, server-discoverable
├── README.md ✓ Written for users, not developers
│
├── csv-data/
│ ├── firms.csv ✓ 3 realistic organizations
│ ├── clients.csv ✓ 12 records, individual and entity types
│ ├── attorneys.csv ✓ 10 records across 3 firms
│ ├── practice_areas.csv ✓ 8 practice area categories
│ ├── matters.csv ✓ 13 matters, all billing types represented
│ ├── hearings.csv ✓ 28 records with varied outcomes
│ ├── deadlines.csv ✓ 45 records, upcoming and overdue
│ ├── billing_entries.csv ✓ 124 time entries
│ ├── invoices.csv ✓ 32 invoices with varied payment status
│ └── documents_filed.csv ✓ 47 court filing records
│
├── word-templates/ ← empty, ready for Chapter 5
│
└── docs/ ← empty, ready after templates are built
The server discovers your domain. The config loads cleanly. The CSV files pass the realistic data checklist. The README serves both the marketplace and the new user.
This is what a properly tuned engine looks like before it runs. Every component is in the right place, correctly connected, ready to perform. In Chapter 5, you write the generator script that brings it to life.
Chapter Summary
The foundation components of a domain (the folder structure, the CSV sample data, domain-config.json, and README.md) must be built with the same care as the templates themselves. They are not setup steps to rush through. They are the infrastructure that everything else depends on.
Realistic sample data covering edge cases is the single highest-leverage investment in domain quality. It determines whether generated documents look real or look like prototypes.
The domain-config.json connects your data structure to the platform through precise foreign key declarations. Validation before template generation saves hours of debugging later.
The README serves as both a marketplace listing and a user guide. The Quick Start section and the Use Cases section are the two most important parts. Write them for the person who downloaded your domain at 7am and needs something done by 9am.
What's Next
Chapter 5 is where the engine roars to life. We write the JavaScript generator script that creates all 20 Word templates programmatically — reducing 40 to 80 hours of manual template creation to 30 minutes of focused code. Every placeholder, every loop, every conditional, every formatting decision — built once, generating perfectly every time.
Volume 5 — Building Intelligent Systems: Building Organizational Knowledge Systems on Data Publisher for Word Part of the Building Intelligent Systems Series