Chapter 4.7: OAuth Authentication for External Data Sources
"The moment someone connects their real data to your domain — their actual Excel workbook, their live Google Sheet, their practice management database — your domain stops being a demo and becomes part of their daily workflow."
The Integration Layer
In Chapter 4 you built the foundation: CSV files, domain configuration, and folder structure. Your domain works with sample data. Users can download it, load it, and generate documents from the pre-populated CSVs.
But sample data has a ceiling. A law firm coordinator who enters their first 50 client records into a CSV file feels productive. After 100 records, they feel tedious. After 500 records, they abandon the system. CSV files are not a workflow — they are a prototype.
Real domains scale through connected data sources: Google Sheets updated by the team collaboratively, Excel workbooks in SharePoint that reflect current business state, SQL databases that power existing practice management systems. When your domain can sync data from these sources with a single click, CSV maintenance disappears entirely.
This chapter covers the OAuth2 authentication system that makes external data connections secure, automatic, and transparent. By the end, you will understand exactly how users connect Microsoft Excel, Google Sheets, and other OAuth-protected data sources to your domain — how the sync-and-refresh model works, and how to document that process so your users can do it confidently.
OAuth2 in DataPublisher: The Big Picture
OAuth2 (Open Authorization 2.0) is the industry-standard protocol for secure API access. When a user "connects Google Sheets" or "connects Excel Online," what they are actually doing is completing an OAuth2 authorization flow:
- User initiates connection — Clicks "Connect Google Sheets" in DataPublisher
- DataPublisher requests authorization — Opens Google's authorization page
- User grants permission — Logs into Google, approves requested access
- Google issues tokens — Sends access token and refresh token back to DataPublisher
- Tokens stored encrypted — DataPublisher stores tokens securely in database
- Automatic refresh — When access token expires, refresh token gets a new one
The user never gives DataPublisher their password. Google (or Microsoft) validates the user's identity, and issues temporary credentials (tokens) that DataPublisher uses to read data on the user's behalf.
From your domain's perspective, this is beautifully simple: your domain doesn't know OAuth is happening. The platform handles authentication. Your domain just declares which tables it needs, and DataPublisher fills them — whether from CSV files, Google Sheets, Excel workbooks, or SQL databases.
But your users need to understand this flow to connect their data sources confidently. And you need to document it so they succeed on their first attempt.
Microsoft Excel / OneDrive Integration
Microsoft Excel Online (OneDrive Personal, OneDrive for Business, SharePoint) is one of the most common live data sources for DataPublisher domains. Law firms store client lists in Excel. Schools track enrollments in Excel. Real estate offices maintain property listings in Excel.
The OAuth2 flow for Excel looks like this from the user's perspective:
User's Experience (Step-by-Step)
Step 1: Navigate to Data Sources User opens DataPublisher in Word, goes to the Data tab, clicks Data Sources.
Step 2: Choose Microsoft Excel Clicks Connect Microsoft Excel.
DataPublisher makes an API call:
GET https://app.datapublisher.io/api/microsoft/auth/start
Authorization: Bearer <user_jwt_token>
The API responds with an authorization URL:
{
"success": true,
"authUrl": "https://login.microsoftonline.com/common/oauth2/v2.0/authorize?client_id=...&scope=Files.Read.All%20offline_access%20User.Read&redirect_uri=..."
}
Step 3: Authorization Dialog Opens
DataPublisher opens authUrl in an Office dialog box. User sees Microsoft's login page.
Step 4: User Logs In User enters their Microsoft account credentials (or clicks "Stay signed in" if already logged in).
Step 5: Permission Grant Microsoft asks: "DataPublisher for Word would like to: - Read all files you can access - Maintain access to data you've given it access to - Read your profile
Do you consent?"
User clicks Accept.
Step 6: Token Exchange (Background) Microsoft redirects to:
https://app.datapublisher.io/api/microsoft/auth/callback?code=ABC123...&state=xyz789
DataPublisher's server:
1. Validates state parameter (CSRF protection)
2. Exchanges authorization code for tokens
3. Receives access_token (valid 1 hour) and refresh_token (valid indefinitely)
4. Calls Microsoft Graph API to get user's email address
5. Encrypts tokens using AES-256
6. Stores in microsoft_auth database table:
sql
user_id: <user's UUID>
email: user@company.com
access_token: <encrypted>
refresh_token: <encrypted>
expires_at: 2026-02-27 15:30:00
Step 7: Success Dialog shows "✓ Connected successfully" and auto-closes. User sees "Microsoft Excel – Connected" in Data Sources panel.
What Scopes Are Requested
Microsoft OAuth2 requires explicit scopes — permissions the app requests:
| Scope | Purpose | Required For |
|---|---|---|
Files.Read.All |
Read all files user can access | Reading Excel workbooks from OneDrive/SharePoint |
offline_access |
Get refresh token | Auto-refresh when access token expires |
User.Read |
Get user's email and profile | Display connected account, store email for reference |
Why Files.Read.All and not Files.Read?
SharePoint files require Files.Read.All. Personal OneDrive works with Files.Read, but business users need the broader scope. Better to request it once than force business users through a second authorization.
Selecting an Excel Workbook
After connection, user selects a workbook to use as a data source:
Step 1: List Workbooks User clicks Select Workbook. DataPublisher calls:
GET https://app.datapublisher.io/api/microsoft/workbooks
Authorization: Bearer <user_jwt_token>
API returns:
{
"success": true,
"workbooks": [
{
"id": "01AZJL5PN6Y2GOVW7725BZO354PWSELRRZ",
"name": "Client Database.xlsx",
"webUrl": "https://companyname-my.sharepoint.com/...",
"lastModifiedDateTime": "2026-02-26T10:15:00Z"
},
{
"id": "01AZJL5PNUJRXZM33ZVBCJYP3BQYGZK5HQ",
"name": "Case Matters.xlsx",
"webUrl": "https://onedrive.live.com/...",
"lastModifiedDateTime": "2026-02-27T08:30:00Z"
}
]
}
Step 2: Choose Workbook User selects "Client Database.xlsx". DataPublisher stores the workbook ID.
Step 3: Map Worksheets to Tables DataPublisher reads the workbook structure:
GET /api/microsoft/workbooks/{workbookId}/worksheets
Returns:
{
"worksheets": [
{ "id": "...", "name": "Clients", "position": 0 },
{ "id": "...", "name": "Matters", "position": 1 },
{ "id": "...", "name": "Attorneys", "position": 2 }
]
}
User maps each worksheet to a domain table:
- Worksheet "Clients" → Domain table clients
- Worksheet "Matters" → Domain table matters
- Worksheet "Attorneys" → Domain table attorneys
DataPublisher stores this mapping and imports a snapshot of the data into its local database. From now on, when a template needs clients table data, DataPublisher reads from this imported snapshot. When the user updates their Excel file and wants fresh data, they click the Refresh button (🔄) on the data file card to re-import the latest version.
Token Refresh (Automatic)
Access tokens expire after 1 hour. Refresh tokens last indefinitely (until user revokes permission).
When DataPublisher needs to read Excel data and finds an expired access token:
- Checks
expires_attimestamp in database:2026-02-27 09:30:00(expired) - Calls Microsoft token refresh endpoint: ``` POST https://login.microsoftonline.com/common/oauth2/v2.0/token Content-Type: application/x-www-form-urlencoded
grant_type=refresh_token
&refresh_token=
User never sees this happen. From their perspective, Excel data "just works."
Google Sheets Integration
Google Sheets OAuth2 flow is architecturally identical to Microsoft Excel, with Google endpoints and scopes instead of Microsoft's.
OAuth2 Flow
Step 1: Initiate Connection
GET https://app.datapublisher.io/api/google/auth/start
Returns:
{
"success": true,
"authUrl": "https://accounts.google.com/o/oauth2/v2/auth?client_id=...&scope=https://www.googleapis.com/auth/spreadsheets.readonly%20...&redirect_uri=..."
}
Step 2: User Authorizes Dialog opens to Google's OAuth consent screen. User logs in, grants permissions.
Step 3: Token Exchange Google redirects to:
https://app.datapublisher.io/api/google/auth/callback?code=...&state=...
DataPublisher exchanges code for tokens, stores encrypted in google_auth table.
Google Scopes
| Scope | Purpose |
|---|---|
https://www.googleapis.com/auth/spreadsheets.readonly |
Read Google Sheets data |
https://www.googleapis.com/auth/userinfo.email |
Get user's Google email |
openid |
OpenID Connect for authentication |
Read-only access: DataPublisher never writes to Google Sheets. Users update their sheet in Google Sheets directly; DataPublisher imports a snapshot when the connection is created and refreshes that snapshot when the user clicks the Refresh button.
Selecting a Google Sheet
Step 1: List Sheets
GET /api/google/spreadsheets
Returns all spreadsheets from user's Google Drive:
{
"spreadsheets": [
{
"id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
"name": "School Co-op Student Database",
"webViewLink": "https://docs.google.com/spreadsheets/d/...",
"modifiedTime": "2026-02-27T10:00:00Z"
}
]
}
Step 2: Map Sheets to Tables User selects spreadsheet. DataPublisher reads sheet names:
GET /api/google/spreadsheets/{spreadsheetId}/sheets
Returns:
{
"sheets": [
{ "name": "Students", "index": 0 },
{ "name": "Classes", "index": 1 },
{ "name": "Families", "index": 2 }
]
}
User maps:
- Sheet "Students" → Domain table students
- Sheet "Classes" → Domain table classes
- Sheet "Families" → Domain table families
DataPublisher imports the data from each sheet into its local database. When templates need student data, DataPublisher reads from this imported snapshot (fast, reliable, works offline). Users click the Refresh button to pull the latest data from Google Sheets whenever they've made updates.
Data Format Requirements
Google Sheets must follow the same CSV structure as your domain:
Row 1 must be headers matching your domain's column names exactly:
StudentID | FirstName | LastName | BirthDate | ParentEmail
Row 2+ are data rows:
1 | Emma | Johnson | 2015-03-12 | emma.j@example.com
2 | Liam | Chen | 2016-07-08 | liam.chen@example.com
Empty cells are treated as NULL (same as CSV behavior).
Column order doesn't matter — DataPublisher maps by header name, not position.
Email Sending via Microsoft Graph
Domains can send emails through users' Microsoft 365 / Outlook accounts. Common use cases: - Law firms sending engagement letters via email - Schools sending enrollment confirmations - Real estate agents sending listing updates
Email sending uses a separate OAuth2 flow from Excel (different scopes).
Email OAuth2 Flow
Step 1: Navigate to Email Settings User goes to Actions tab → Email Campaigns → Connect Email Account.
Step 2: Initiate Microsoft Connection
GET /api/email/auth/microsoft/connect
Returns authorization URL with email-specific scopes:
{
"authUrl": "https://login.microsoftonline.com/common/oauth2/v2.0/authorize?scope=Mail.Send%20offline_access%20User.Read&..."
}
Step 3: User Authorizes Email Sending Dialog opens. Microsoft asks: "DataPublisher for Word would like to: - Send mail as you - Maintain access to data you've given it access to - Read your profile
Do you consent?"
User clicks Accept.
Step 4: Tokens Stored
DataPublisher stores encrypted tokens in email_providers table:
user_id: <UUID>
provider: 'microsoft'
email_address: user@company.com
access_token: <encrypted>
refresh_token: <encrypted>
expires_at: <timestamp>
Email Scopes
| Scope | Purpose |
|---|---|
Mail.Send |
Send emails as the authenticated user |
offline_access |
Refresh token for long-term access |
User.Read |
Get user's email address for "From" field |
Security note: DataPublisher can only send emails as the authenticated user. If user@company.com authorizes, DataPublisher can only send from that address — never from any other account.
Sending an Email
After connection, domains can send emails via Actions or Email Campaigns:
API Call:
POST /api/email/send
Authorization: Bearer <user_jwt_token>
{
"to": "client@example.com",
"subject": "Engagement Letter for Matter #2024-0047",
"body": "<html>...</html>",
"attachments": [
{
"filename": "Engagement_Letter.pdf",
"content": "<base64_encoded_pdf>"
}
]
}
Behind the scenes:
1. DataPublisher retrieves user's stored Microsoft tokens
2. Checks if access token expired → refreshes if needed
3. Calls Microsoft Graph API:
```
POST https://graph.microsoft.com/v1.0/me/sendMail
Authorization: Bearer
{ "message": { "subject": "Engagement Letter for Matter #2024-0047", "body": { "contentType": "HTML", "content": "..." }, "toRecipients": [{ "emailAddress": { "address": "client@example.com" }}], "attachments": [...] } } ``` 4. Microsoft sends email from user's account 5. Email appears in user's "Sent Items" folder (since it's sent as them)
SQL Server / Direct Database Connections
For enterprises using practice management systems (Clio, MyCase, Salesforce) or custom databases, SQL Server connections bypass OAuth entirely — they use traditional database authentication.
Connection Flow
Step 1: Navigate to Data Sources User goes to Data tab → Data Sources → Connect SQL Server.
Step 2: Enter Connection Details
User provides:
- Server Address: sql-server.company.com\INSTANCE
- Database Name: PracticeManagement
- Authentication Type: Windows Authentication or SQL Server Authentication
- Username/Password: (if SQL auth)
Step 3: Test Connection DataPublisher calls:
POST /api/sql/test-connection
{
"server": "sql-server.company.com\\INSTANCE",
"database": "PracticeManagement",
"authType": "sql",
"username": "datapublisher_reader",
"password": "<encrypted>"
}
If successful, stores encrypted connection string.
Step 4: Map Tables DataPublisher lists available tables in the database:
GET /api/sql/tables
Returns:
{
"tables": [
{ "name": "clients", "rowCount": 1247 },
{ "name": "matters", "rowCount": 3891 },
{ "name": "billing", "rowCount": 15632 }
]
}
User maps SQL tables to domain tables (often 1:1 if naming matches).
Security Considerations
Read-only access recommended: Create a SQL user with SELECT permissions only:
CREATE LOGIN datapublisher_reader WITH PASSWORD = 'SecurePassword123!';
CREATE USER datapublisher_reader FOR LOGIN datapublisher_reader;
GRANT SELECT ON DATABASE::PracticeManagement TO datapublisher_reader;
Network access required: SQL Server must be accessible from DataPublisher's servers (cloud-hosted). Most enterprises require VPN or IP whitelisting.
Connection string encryption: All SQL credentials stored AES-256 encrypted in database.
The Data Sync Model: Import, Store, Refresh
Now that you understand OAuth2 authorization, here's the crucial architectural detail: DataPublisher does not query external sources in real-time when generating documents.
Instead, it uses a sync-and-refresh model:
How Data Flows
Step 1: Initial Import (Automatic)
When a user connects an external data source and maps worksheets/sheets to domain tables, DataPublisher:
1. Uses OAuth2 credentials to fetch data via API (Microsoft Graph, Google Sheets API, etc.)
2. Imports up to 10,000 rows per table
3. Stores data in its local database (PostgreSQL)
4. Records lastSyncedAt timestamp
Step 2: Document Generation (Fast) When user clicks Run to generate a document: 1. Template reads from local database (not from Excel/Sheets) 2. No API calls to external sources 3. Sub-100ms query performance (local database) 4. Works even if external source is temporarily offline
Step 3: Refresh (User-Triggered)
When user updates their Excel file or Google Sheet and wants that data reflected:
1. User clicks Refresh button (🔄 icon) on the data file card
2. DataPublisher fetches fresh data from the external source
3. Updates local database with new snapshot
4. Updates lastSyncedAt timestamp
5. Next document generation uses the refreshed data
The Refresh Button
Every connected data file shows a refresh button in its card. The UI provides clear feedback about data freshness:
Data Freshness Indicator: - 🟢 "Synced 5 minutes ago" — Green badge, data is fresh - 🟡 "Synced 6 hours ago" — Yellow badge, consider refreshing - 🔴 "Synced 3 days ago" — Red badge, likely stale
Refresh Button States: - ✅ Enabled: File connected to OAuth source, has column headers, not currently processing - ❌ Disabled (gray): File not from connected source, needs headers, or currently processing - ⏳ Processing: Shows spinner during refresh operation
User Experience:
1. User updates "Client Database.xlsx" in OneDrive (adds 5 new clients)
2. Opens DataPublisher in Word
3. Sees "Client Database" card shows "Synced 2 hours ago" (yellow)
4. Clicks refresh button (🔄)
5. DataPublisher shows "Refreshing data..." message
6. 2-3 seconds later: "✓ Data refreshed successfully! 205 rows updated"
7. Card now shows "Synced just now" (green)
8. User clicks Run → document includes the 5 new clients
Why This Model is Brilliant
Performance: Local database queries are 50-100x faster than API calls to Google/Microsoft. Document generation takes seconds, not minutes.
Reliability: External API rate limits don't affect document generation. If Google Sheets is down for maintenance, users can still generate documents from their last sync.
Offline capability: Users on slow networks or temporary disconnection can still work with their most recent data snapshot.
Cost efficiency: No per-generation API costs. One API call per refresh (user-controlled), not per document.
Predictable behavior: Users know exactly when they're getting fresh data (after they click refresh). No confusion about "why doesn't my document show my latest changes?" — if they didn't refresh, they get the snapshot.
Scalability: 10,000-row limit prevents runaway API costs while handling 95% of small business use cases.
Future: Direct Connection Mode
Roadmap Note: A future version will offer Direct Connection Mode as an opt-in feature for enterprises needing absolute real-time accuracy (e.g., stock trading, emergency medical records). This mode will query external sources on each document generation, with the performance and reliability trade-offs that entails. For 99% of users, the sync-and-refresh model delivers better speed, reliability, and user experience.
Security & Token Management
OAuth2 and database credentials are security-critical. DataPublisher uses multiple layers of protection.
Token Storage
Encryption at rest: - All access tokens encrypted with AES-256 before database storage - Encryption key stored in server environment variables (not in code) - Separate encryption key per environment (dev/staging/production)
Database schema:
CREATE TABLE microsoft_auth (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
email VARCHAR(255),
access_token TEXT, -- AES-256 encrypted
refresh_token TEXT, -- AES-256 encrypted
expires_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Per-user isolation: Each user's tokens stored separately. User A cannot access User B's connected data sources.
Token Revocation
Users can revoke access at any time:
Within DataPublisher: 1. Go to Data Sources 2. Find connected source (e.g., "Microsoft Excel – Connected") 3. Click Disconnect 4. DataPublisher deletes stored tokens from database
Within Microsoft/Google: Users can revoke at the source: - Microsoft: https://account.microsoft.com/privacy/app-access - Google: https://myaccount.google.com/permissions
When revoked externally, DataPublisher's tokens become invalid. Next data access attempt fails, user sees "Reconnect required."
Scope Limitation
DataPublisher requests minimum necessary scopes:
Excel/OneDrive:
- ✅ Files.Read.All (read files)
- ❌ NOT Files.ReadWrite.All (would allow writing/deleting)
Email:
- ✅ Mail.Send (send as user)
- ❌ NOT Mail.ReadWrite (would allow reading inbox)
Google Sheets:
- ✅ spreadsheets.readonly (read sheets)
- ❌ NOT spreadsheets (would allow editing)
This follows principle of least privilege — request only what's needed for declared functionality.
HTTPS Everywhere
All OAuth callbacks use HTTPS:
✅ https://app.datapublisher.io/api/microsoft/auth/callback
❌ http://app.datapublisher.io/... (rejected by OAuth providers)
Microsoft and Google require HTTPS redirect URIs for production apps. HTTP only works for localhost development.
Troubleshooting OAuth Issues
Issue 1: "Authorization failed" After User Clicks Accept
Symptom: User completes Microsoft/Google login, clicks Accept, sees "Authorization failed" error.
Common Causes: 1. State mismatch (CSRF): Browser cookies disabled or third-party cookies blocked 2. Expired authorization code: User took >10 minutes to complete login 3. Redirect URI mismatch: DataPublisher's registered URI doesn't match callback
Solution: - Enable cookies in browser (required for OAuth state tracking) - Complete authorization flow within 5 minutes - Try different browser if issue persists (Edge/Chrome recommended)
Issue 2: "Access token expired" When Accessing Data
Symptom: User connected Excel weeks ago, now sees "Please reconnect Excel" error.
Common Causes: 1. Refresh token revoked: User changed Microsoft password 2. Refresh token expired: Microsoft refresh tokens last 90 days of inactivity 3. Permission scope removed: Admin removed app consent at tenant level
Solution: - Click Reconnect and re-authorize - If repeated failures, check Microsoft admin settings (enterprise accounts) - Personal accounts: Verify app hasn't been removed from account.microsoft.com
Issue 3: "File not found" When Reading Excel Workbook
Symptom: User connected Excel, mapped workbook, now DataPublisher can't find it.
Common Causes: 1. File moved: Workbook moved to different folder in OneDrive 2. File renamed: Workbook name changed 3. File deleted: Workbook moved to recycle bin 4. Permissions changed: User no longer has access (shared files)
Solution: - Go to Data Sources → Select Workbook → Re-select the file - DataPublisher stores workbook ID, not path — renaming shouldn't break it - If file deleted, restore from OneDrive recycle bin or select different file
Issue 4: Excel Worksheet Columns Don't Match Domain Tables
Symptom: User connects Excel, runs template, sees error: "Column 'ClientID' not found."
Cause: Excel worksheet headers don't exactly match domain's column names.
Example:
Domain expects: ClientID, FirstName, LastName
Excel has: Client ID, First Name, Last Name
Solution:
Headers must match exactly (case-sensitive, no spaces unless domain uses spaces):
- Rename Excel headers: Client ID → ClientID
- Or update domain's column names to match Excel
Best practice: Document exact column names in domain's README so users know what to rename.
Issue 5: Google Sheets Cell Formats Cause Data Errors
Symptom: Numbers from Google Sheets appear as text in generated documents, or dates show as serial numbers.
Cause: Google Sheets cell formatting doesn't match expected data type.
Solution: - Dates: Format cells as "Date" in Google Sheets (not "Plain text" or "Number") - Numbers: Format as "Number" (not "Plain text") - Currencies: Format as "Currency" or "Number" (DataPublisher reads numeric value)
DataPublisher respects Google Sheets cell formatting when reading via API.
Documenting OAuth in Your Domain
Your users need clear instructions to connect external data sources. Here's what to include in your domain's docs/DATA_SOURCES_INTEGRATION.md:
Essential Sections
1. Overview Explain what connected data sources are and why they're better than manual CSV updates. Clarify the sync-and-refresh model: data is imported and stored locally, users click Refresh to sync latest changes.
2. Prerequisites - Microsoft 365 / OneDrive account (for Excel) - Google account (for Google Sheets) - Excel/Sheets file already created with correct structure
3. Step-by-Step Connection Guide Walk through OAuth flow with screenshots: - Where to click "Connect Excel" - What Microsoft's permission dialog looks like - How to select the right workbook - How to map sheets to tables
4. Column Name Reference List exact column names your domain expects:
Required columns for 'clients' table:
- ClientID (integer, unique)
- FirstName (text, required)
- LastName (text, required)
- Email (text, optional)
- Phone (text, optional)
5. Sample Template
Provide a downloadable .xlsx or Google Sheets template with headers pre-filled and a few example rows.
6. Troubleshooting Include the 5 issues from this chapter plus any domain-specific edge cases.
Example Documentation Snippet
## Connecting Microsoft Excel
Follow these steps to connect your Excel workbook as a live data source:
### Step 1: Prepare Your Excel File
Create an Excel file with one worksheet per domain table. For the Legal Services domain, you'll need:
- **Clients** worksheet with columns: ClientID, FirstName, LastName, Email, Phone, CompanyName, ClientType
- **Matters** worksheet with columns: MatterID, ClientID, MatterNumber, Description, OpenDate, Status, BillingType
- **Attorneys** worksheet with columns: AttorneyID, FirstName, LastName, Email, BarNumber, Specialty
**Important:** Column names must match exactly (case-sensitive).
[Download Excel Template →](legal-services-excel-template.xlsx)
### Step 2: Save to OneDrive
Save your Excel file to OneDrive (Personal or Business) or SharePoint. DataPublisher needs the file to be in Microsoft's cloud to access it.
### Step 3: Connect in DataPublisher
1. Open Microsoft Word with DataPublisher add-in
2. Go to **Data** tab → **Data Sources**
3. Click **Connect Microsoft Excel**
4. Sign in to your Microsoft account when prompted
5. Click **Accept** when Microsoft asks for permissions
You'll see "✓ Microsoft Excel – Connected"
### Step 4: Select Your Workbook
1. Click **Select Workbook**
2. Choose your Excel file from the list
3. Map each worksheet to a domain table:
- Worksheet "Clients" → Table `clients`
- Worksheet "Matters" → Table `matters`
- Worksheet "Attorneys" → Table `attorneys`
4. Click **Save Mapping**
That's it! DataPublisher imports your Excel data into its local database.
**To refresh data after making changes:**
1. Update your Excel file in OneDrive
2. Go to the **Data** tab in DataPublisher
3. Find your "Client Database" card
4. Click the **Refresh** button (🔄 icon)
5. Wait 2-3 seconds for "✓ Data refreshed successfully!"
6. Generate documents with your latest data
### Troubleshooting
**"File not found" error:** Make sure the Excel file is saved in OneDrive, not your local computer.
**"Column not found" error:** Check that worksheet headers exactly match the column names listed in Step 1 (case-sensitive).
**"Reconnect required" message:** Your Excel connection expired. Click **Reconnect** and sign in again.
The OAuth Payoff
OAuth authentication adds complexity for you (the domain builder) in documentation. It adds a 60-second setup step for your users. But here's what it unlocks:
For users: - Data lives in their existing tools (Excel, Google Sheets) - Team members update data where they already work - One-click refresh to sync latest changes (no CSV exports/imports) - Fast, reliable document generation from local snapshots - Version control through Excel/Sheets (revision history)
For your domain: - Scales to thousands of records effortlessly - Works with enterprise data systems (Clio, Salesforce, QuickBooks via SQL) - Email sending through users' actual accounts (professional, trusted sender) - Users stay subscribed longer (sticky integration)
For DataPublisher platform: - Competitive feature vs. Zapier, Integromat, custom scripting - Enterprise-ready (Fortune 500 security standards) - Marketplace differentiation (most no-code tools don't do OAuth right)
The 15 minutes you spend writing a great OAuth integration guide pays dividends for years.
Chapter Summary
OAuth2 is the industry-standard protocol for secure API access without sharing passwords. DataPublisher implements OAuth2 flows for Microsoft Excel/OneDrive, Google Sheets, and Microsoft 365 email sending.
Microsoft Excel OAuth requests Files.Read.All, offline_access, and User.Read scopes. Users authorize once; tokens refresh automatically. Excel workbooks map to domain tables via worksheet names.
Google Sheets OAuth requests spreadsheets.readonly and userinfo.email scopes. Flow mirrors Excel — authorize once, import data snapshot, click Refresh to sync updates.
The Sync Model: DataPublisher imports data from connected sources and stores it locally for fast, reliable document generation. Users click the Refresh button (🔄) to sync the latest changes from their Excel files or Google Sheets. This model delivers better performance (50-100x faster than API calls), reliability (works when external sources are offline), and user control (explicit refresh = predictable behavior).
Email OAuth requests Mail.Send scope. Users send emails through their own Microsoft 365 accounts with DataPublisher handling the API calls securely.
Security: All tokens encrypted AES-256 at rest. Per-user isolation. Minimum necessary scopes. Automatic token refresh. HTTPS everywhere.
Documentation: Your domain's DATA_SOURCES_INTEGRATION.md should walk users through OAuth step-by-step with screenshots, column name requirements, and troubleshooting for the 5 most common OAuth issues.
OAuth moves your domain from "interesting demo" to "daily workflow tool." Users who connect real data stay subscribed.
What's Next
Chapter 5 covers the generator script — the JavaScript code that creates all 20 Word templates programmatically. Now that you understand how users connect their data (CSV files in Chapter 4, live sources in Chapter 4.7), Chapter 5 shows you how to build the templates that transform that data into professional documents.
Volume 5 — Building Intelligent Systems: Building Organizational Knowledge Systems on Data Publisher for Word Part of the Building Intelligent Systems Series