Data Publisher API - Complete Guide

Introduction

Welcome to the Data Publisher API documentation. This guide is written for developers building production integrations with the Data Publisher platform, whether that means a custom client interface, workflow automation, or a vertical-specific application.

The Data Publisher v2.0 API exposes 123 endpoints across 23 namespaces. More importantly, it exposes the complete operational logic of the platform in three discrete layers: data comes in, it moves through a document engine, and the output goes out through a distribution layer. Understanding that layering is the first step toward building integrations that actually work in production environments.

Base URL: https://app.datapublisher.io/api

Development URL: http://localhost:3001/api

Authentication: JWT Bearer tokens

Data Format: JSON (with multipart/form-data for file uploads)
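As a minimal sketch of how these four settings combine, the helper below builds (but does not send) an authenticated JSON request using only the Python standard library. The function name `build_request` and the token value are illustrative assumptions; only the base URL, the Bearer scheme, and the JSON content type come from this guide.

```python
import json
import urllib.request

BASE_URL = "https://app.datapublisher.io/api"  # production base URL from this guide


def build_request(method, path, token, body=None):
    """Build an authenticated request object without sending it."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON is the default data format; file uploads would use multipart instead.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Placeholder token for illustration; a real JWT comes from the auth endpoints.
req = build_request("GET", "/data-sources", token="example.jwt.token")
post = build_request("POST", "/domains/real-estate/clone", "example.jwt.token",
                     body={"includeSampleData": True})
```

Passing the result to `urllib.request.urlopen` would perform the call. Swapping `BASE_URL` for the development URL is the only change needed for local testing.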

The Three-Layer Architecture

Before writing a single line of integration code, spend time understanding how the platform is organized. The namespaces are not arbitrary — they reflect a deliberate separation of concerns that maps directly onto how document automation workflows actually operate in practice.

Layer 1: Data Layer

Every document automation workflow begins with data that lives somewhere: a spreadsheet maintained in Google Sheets, a customer database on a SQL Server instance, an Excel file on SharePoint, or a CSV exported from a legacy system. The data layer's job is to accept all of those sources and normalize them into a consistent internal format that the document engine can consume without needing to know where the data came from.

Key namespaces: /csv, /google, /microsoft, /sql-server, /data-sources, /sync, /data-sets

Layer 2: Document Engine

This is the core of what Data Publisher does: it takes a Word template with {{variable}} placeholders and a data file, and produces populated documents. The engine handles:

DOCX generation

PDF conversion

OOXML extraction for Office.js insertion

Image embedding

Multi-table joins for complex relational data structures

Everything in the document engine is stateless: you give it a template ID and a data file ID, and it gives you documents.

Key namespaces: /documents, /word-templates, /image-library, /sample-library, /domains

Layer 3: Distribution Layer

Generated documents need to get somewhere. In most enterprise workflows, that means email: individual emails with personalized attachments, bulk campaigns with engagement tracking, or packaged exports that feed into downstream systems. The distribution layer handles the full email lifecycle:

OAuth authentication against Microsoft's Graph API

Async campaign management with pause/resume controls

Pixel-based open tracking

Click tracking

Power Automate integration for reply capture

Key namespaces: /email-templates, /email-campaigns, /email/track, /email/auth, /email-publishing-exports

The AI Layer

A fourth element sits orthogonally across all three layers: the AI namespace (/claude-coop). Unlike most platforms that treat AI as a UI feature, Data Publisher exposes AI capabilities as API endpoints, meaning they can be invoked at any point in any workflow. This enables:

Natural language analysis of campaign performance

Template improvement suggestions

Contextual workflow assistance

The platform does not have one "mode" — it has three distinct stages, each with its own API surface. Understanding which stage your use case lives in is the prerequisite to knowing which endpoints to use.

Getting Data In: The Four Connectors

The data layer supports four distinct connector types. Each connector normalizes its source into the same internal format — a registered CSV file that can be addressed by ID. Once data enters the platform through any connector, the downstream generation and distribution layers operate identically regardless of source.

CSV Upload (`/csv`)

Direct file upload endpoint supporting multipart form-data (files up to 50MB) and JSON payloads (programmatic uploads). On upload, the platform automatically:

Parses file structure and identifies column types (string, number, date, email, URL, boolean)

Counts rows and detects relationships to other CSV files based on column name patterns

Returns rich metadata including columns array for template mapping
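To make the column-typing step concrete, here is a rough local approximation of how values could be classified into the six types listed above. The platform's actual detection rules are not documented here, so the patterns, the ISO-only date check, and the function names are all assumptions for illustration.

```python
import re
from datetime import date

# Deliberately loose email pattern; an assumption, not the platform's rule.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def infer_type(value):
    """Classify a single cell value into one of the six documented types."""
    v = value.strip()
    if v.lower() in ("true", "false"):
        return "boolean"
    try:
        float(v)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(v)  # only ISO dates such as 2024-01-15 in this sketch
        return "date"
    except ValueError:
        pass
    if EMAIL_RE.match(v):
        return "email"
    if v.startswith(("http://", "https://")):
        return "url"
    return "string"


def infer_column_type(values):
    """A column gets a specific type only if every non-empty value agrees."""
    kinds = {infer_type(v) for v in values if v.strip()}
    return kinds.pop() if len(kinds) == 1 else "string"
```

Falling back to "string" on any disagreement keeps the sketch conservative: a mixed column is never reported as a narrower type than its data supports.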

Paginated access via GET /api/csv/:id/data with limit/offset parameters plus server-side filtering (filterColumn/filterValue) keeps large datasets performant.
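The query parameters named above can be assembled as follows. This is a sketch: the helper names, the default page size of 100, and the example column are assumptions, while `limit`, `offset`, `filterColumn`, and `filterValue` are the parameter names given in this guide.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.datapublisher.io/api"


def csv_data_url(csv_id, limit=100, offset=0, filter_column=None, filter_value=None):
    """Build the URL for one page of rows from a registered CSV file."""
    params = {"limit": limit, "offset": offset}
    if filter_column is not None and filter_value is not None:
        params["filterColumn"] = filter_column
        params["filterValue"] = filter_value
    return f"{BASE_URL}/csv/{csv_id}/data?{urlencode(params)}"


def page_offsets(total_rows, limit):
    """Offsets needed to walk a file of total_rows rows, limit rows at a time."""
    return list(range(0, total_rows, limit))


# 250 rows at 100 per page need three requests.
urls = [csv_data_url("abc", limit=100, offset=o) for o in page_offsets(250, 100)]
```

`urlencode` handles escaping, so filter values containing spaces or ampersands stay intact in the query string.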

Google Sheets (`/google`)

Full OAuth2 connector for Google Sheets API v4. Three-step authorization flow:

GET /auth/start returns authorization URL

User completes OAuth consent in browser

Platform stores access/refresh tokens (AES-256 encrypted) and handles automatic refreshes

Key endpoints: List spreadsheets (GET /spreadsheets), read sheet data (GET /spreadsheets/:id/sheets/:name/data), import to CSV (POST /spreadsheets/:id/sheets/:name/import). First row always treated as headers; all data imported as strings.
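Because the sheet name is part of the path, names with spaces or slashes need escaping before they are placed in the URL. The sketch below assumes the endpoints live under the `/google` namespace and that percent-encoding is the right treatment for Google sheet names (this guide states it explicitly only for the Microsoft connector). The function name and IDs are hypothetical.

```python
from urllib.parse import quote


def sheet_path(spreadsheet_id, sheet_name, action="data"):
    """Return the API path for reading ('data') or importing ('import') one sheet."""
    if action not in ("data", "import"):
        raise ValueError("action must be 'data' or 'import'")
    # safe='' makes quote() escape '/' too, so a sheet named "A/B" stays one segment.
    return (
        f"/google/spreadsheets/{quote(spreadsheet_id, safe='')}"
        f"/sheets/{quote(sheet_name, safe='')}/{action}"
    )


read_path = sheet_path("abc123", "Q1 Listings")             # GET
import_path = sheet_path("abc123", "Q1 Listings", "import")  # POST
```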

Microsoft Excel (`/microsoft`)

Parallel OAuth2 connector for Microsoft Graph API (OneDrive/SharePoint). Requests three scopes: Files.Read.All, offline_access, User.Read. Architectural mirror of Google Sheets connector with transparent token refresh.

Uses OneDrive file IDs (not URLs) for workbook addressing. Returns both ID and webUrl for display. Supports .xlsx, .xlsm, .xlsb formats (not legacy .xls). Worksheet names are URL-encoded to handle spaces/special characters.

SQL Server (`/sql-server`)

Enterprise database connector for operational data sources. Supports connection strings or server/database/credentials configuration. Platform validates connections at setup time and enforces:

Security: AES-256 encrypted credentials, SELECT-only queries (INSERT/UPDATE/DELETE/CREATE/ALTER/DROP/EXEC rejected)

Validation: Query syntax checking before execution

Best Practice: Query views (not base tables) for data team control over exposed fields

Execution via POST /query (returns typed JSON) or POST /query/import (routes directly to CSV file).
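A client can mirror the SELECT-only rule with a pre-flight check so that obviously invalid queries never leave the application. This is only a sketch of a client-side guard: the server-side validation is authoritative and its exact logic is not documented here. The keyword list is taken from the rule above; the function name is an assumption.

```python
import re

# Statement keywords the platform is documented to reject.
FORBIDDEN = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP", "EXEC")
_FORBIDDEN_RE = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def is_select_only(query):
    """Conservative check: must start with SELECT and contain no forbidden keyword."""
    stripped = query.strip()
    if not stripped.upper().startswith("SELECT"):
        return False
    # Whole-word match, so a column like created_at does not trigger CREATE.
    return _FORBIDDEN_RE.search(stripped) is None
```

The check errs on the side of rejection: a forbidden word inside a string literal also fails, which is acceptable for a pre-flight filter but would be too strict as the only line of defense.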

Data Sync and Automation (`/sync`)

Scheduled sync layer for all four connectors. Configure frequency (hourly, daily, weekly) to automatically refresh target CSV files from source data. Sync history (GET /schedules/:id/history) records row counts, change deltas (added/updated/deleted), and error messages for debugging.

The Data Sources API (/api/data-sources) provides unified view across all connector types — single GET / returns all sources with type, status, last sync. Batch connection testing via POST /test-all for health monitoring dashboards.

Multi-Table Data (`/data-sets` and `/domains`)

Data Sets (/api/data-sets): Group multiple related CSV files with foreign key relationship definitions for complex documents requiring joins across tables.

Domains (/api/domains): Pre-configured vertical-specific Data Sets with relationships and sample data. Available domains: real estate (Properties, Agents, PropertyPhotos), e-commerce (Products, Categories, Specifications), healthcare, finance, HR, education. Clone to user account via POST /domains/:domainId/clone for instant setup.
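To illustrate what a foreign key relationship between two CSV files produces, the sketch below joins property rows to agent rows in plain Python. The platform performs its joins server-side. The column names (`AgentId`, `AgentName`, `Address`) and the left-join behavior are assumptions chosen to match the real estate domain described above.

```python
def join_rows(left, right, left_key, right_key):
    """Left join: attach the matching right-hand row to every left-hand row."""
    index = {row[right_key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[left_key], {})
        # Left-hand values win if both tables share a column name.
        joined.append({**match, **row})
    return joined


properties = [
    {"PropertyId": "P1", "Address": "12 Elm St", "AgentId": "A1"},
    {"PropertyId": "P2", "Address": "9 Oak Ave", "AgentId": "A9"},  # no such agent
]
agents = [{"AgentId": "A1", "AgentName": "Dana Lee"}]

rows = join_rows(properties, agents, "AgentId", "AgentId")
```

Each joined row carries fields from both tables, which is what lets a single template reference property and agent variables side by side.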

The Document Engine

Two complementary APIs manage templates depending on integration context:

Documents API (`/documents`)

Primary template management layer handling:

Upload: POST /upload with multipart form-data

Variable Extraction: Asynchronous processing extracts all {{variable}} placeholders; status field progresses from processing to completed

Template Contract: variables array in response defines exact field names expected from data file (basis for field mapping UIs)

Structure Parsing: GET /api/documents/:id/content returns paragraph structure with variable occurrence counts (useful for previews and detecting breaking changes)
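The template contract can be checked before any generation run by comparing the template's variables against the data file's columns. The sketch below assumes both lists have already been fetched from the API; the local regex extractor is included only to show the placeholder syntax and is not how the platform parses DOCX files.

```python
import re

# Matches {{Name}} and {{ Name }}; an illustrative pattern, not the platform's parser.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def extract_variables(text):
    """Return placeholder names in first-seen order, without duplicates."""
    seen = []
    for name in PLACEHOLDER_RE.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def missing_columns(template_variables, csv_columns):
    """Variables the template expects that the data file does not provide."""
    available = set(csv_columns)
    return [v for v in template_variables if v not in available]


variables = extract_variables("Dear {{FirstName}}, your home at {{ Address }}. Bye {{FirstName}}")
gaps = missing_columns(variables, ["FirstName", "City"])
```

An empty result from `missing_columns` means every placeholder has a matching column. A non-empty result is a good trigger for a field mapping prompt in the client interface.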

Word Templates API (`/word-templates`)

Adds OOXML capabilities for Office.js integration:

OOXML Extraction: GET /api/word-templates/:id/ooxml returns raw Office Open XML for insertion into open Word documents via Office.js (Word.run() + body.insertOoxml())

Save from Document: POST /from-document accepts OOXML from currently open Word document and stores as template

Use Case: Bridge between server template library and client-side task pane add-in experience (no file downloads or document replacements)

Image Library (`/image-library`)

Centralized visual asset management:

Group images by purpose (logos, product photos, charts, backgrounds)

Each upload receives permanent URL and dimension metadata

Reference by URL in templates; generation engine resolves at document creation

Authenticated endpoints for template management, direct serving for email distribution

Sample Library (`/sample-library`)

Onboarding acceleration through pre-built templates:

Browsable catalog of vertical-specific templates with paired sample data

POST /samples/:id/copy clones to user account

Demonstrates expected data structure and reduces time-to-first-document

The Distribution Layer

The distribution layer manages the complete email workflow from authentication through engagement analytics.

Email Authentication (`/email/auth`)

All email sending routes through Microsoft Graph API. OAuth2 flow requests three scopes:

Mail.Send: Required for sending

offline_access: Token refresh (90-day validity)

User.Read: Sending account identification

Token management is transparent — platform automatically refreshes expired access tokens (1-hour lifetime) before send operations. Developer code never touches token state.

Email Templates (`/email-templates`)

Accept Word documents and convert to email-optimized HTML:

Full formatting preservation (headings, tables, inline images, colors, alignment)

Multi-client compatibility (Outlook, Gmail, Apple Mail, mobile)

htmlContent field in response provides exact rendered output for previews

Email Campaigns (`/email-campaigns`)

11-endpoint lifecycle for bulk sending with personalization:

Draft Creation: POST / configures campaign (data file, email template, recipient field, subject line, attachment settings, tracking preferences, test mode)

Updates: Modify configuration while in draft status

Send Initiation: POST /:id/send starts asynchronous processing (returns jobId and estimated duration)

Progress Monitoring: GET /:id/status returns progress %, sent/failed/remaining counts, per-recipient errors (poll every 10-30 seconds; ~30 emails/min send rate due to Graph throttling)

Pause/Resume: POST /:id/pause and POST /:id/resume for operational safety (correct data mid-campaign without losing progress)

Cancellation: POST /:id/cancel stops execution permanently

Analytics: GET /:id/analytics returns aggregate engagement metrics post-completion

Personalized Attachments: When attachmentTemplateId + generateAttachments: true specified, platform generates custom PDF/DOCX for each recipient using their row data (500 recipients → 500 unique attached documents).
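The asynchronous send pattern usually reduces to two pieces of client logic: a rough duration estimate and a polling loop. The sketch below takes the status call as an injected function so it stays independent of any HTTP library. The `status` field name and the terminal state values are assumptions, since this guide does not list the response schema. The 30 emails per minute figure and the 10-30 second interval come from the text above.

```python
import math
import time


def estimated_minutes(recipients, per_minute=30):
    """Rough send duration at the documented ~30 emails per minute."""
    return math.ceil(recipients / per_minute)


def poll_until_finished(get_status, interval_seconds=15, sleep=time.sleep, max_polls=1000):
    """Call get_status() until it reports a terminal state (assumed values)."""
    for _ in range(max_polls):
        status = get_status()
        if status["status"] in ("completed", "failed", "cancelled"):
            return status
        sleep(interval_seconds)
    raise TimeoutError("campaign did not finish within max_polls")
```

At the documented rate, `estimated_minutes(500)` gives 17, so a 500-recipient campaign is a background job rather than something a user should wait on in a blocking dialog.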

Email Tracking (`/email/track`)

Four public endpoints (no authentication, <10ms response time):

Open Tracking: GET /open/:trackingId returns 1×1 transparent GIF (async DB update after response)

Click Tracking: GET /click/:trackingId?redirect=URL records click and redirects (async update)

Reply Webhook: POST /reply/:trackingId integrates with Power Automate flows monitoring Outlook inbox

Reply Attachments: POST /reply-attachment/:trackingId stores uploaded documents from respondents (contract collection workflows)

Authenticated tracking endpoints:

Individual Status: GET /status/:trackingId returns open count, click count, timestamps, IP addresses, user agents

Campaign Analytics: GET /campaign/:campaignId returns aggregate stats, top-links breakdown, engagement timeline
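The click endpoint carries the destination in a `redirect` query parameter, so the destination must be percent-encoded or its own query string will be misread. The sketch below shows the encoding. Whether integrations ever need to build these links by hand (rather than letting the platform rewrite them) is not stated in this guide, and the tracking ID and destination are made-up examples.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://app.datapublisher.io/api"


def click_tracking_url(tracking_id, destination):
    """Wrap a destination URL in the click-tracking redirect endpoint."""
    query = urlencode({"redirect": destination})
    return f"{BASE_URL}/email/track/click/{tracking_id}?{query}"


destination = "https://example.com/listing?id=7&ref=mail"
tracked = click_tracking_url("trk_1", destination)

# Decoding the query string recovers the original destination unchanged.
recovered = parse_qs(urlsplit(tracked).query)["redirect"][0]
```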

Email Publishing Exports (`/email-publishing-exports`)

Alternative distribution for non-email channels (client portals, print vendors, downstream systems):

Workflow: Create configuration (POST /), execute generation (POST /:id/execute), poll status (GET /:id/status), download ZIP (GET /:id/download).

ZIP Structure: Generated documents at root with consistent naming, static attachments in subdirectory (150 records → 150 individually named files in single archive).
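Once the archive is downloaded, the layout described above makes it straightforward to separate generated documents from static attachments. The sketch below relies only on that layout (documents at the root, attachments in a subdirectory). The file and folder names in the demo archive are invented, since the naming convention is not specified here.

```python
import io
import zipfile


def split_export_archive(zip_bytes):
    """Return (generated_documents, static_attachments) from an export ZIP."""
    generated, attachments = [], []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            # Root-level files are generated documents; nested files are attachments.
            (attachments if "/" in name else generated).append(name)
    return generated, attachments


# Build a tiny in-memory archive with hypothetical names to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("listing-001.pdf", b"placeholder")
    demo.writestr("listing-002.pdf", b"placeholder")
    demo.writestr("attachments/terms.pdf", b"placeholder")

documents, extras = split_export_archive(buffer.getvalue())
```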

The AI Layer (`/claude-coop`)

Unlike platforms that treat AI as UI-only features (chat windows, help buttons), Data Publisher exposes AI capabilities as API endpoints — invokable at any point in any workflow.

Two endpoints:

POST /analyze: Accepts campaign analytics payloads and returns natural language interpretation of open rates, click patterns, engagement trends

POST /suggest: Accepts template HTML content and returns actionable recommendations contextually aware of platform variable syntax and email client compatibility

Integration Opportunities:

"Analyze this campaign" button fetches the campaign's data from GET /:id/analytics and posts it to /analyze for human-readable insights

"Suggest improvements" feature posts template content to /suggest for specific optimization recommendations

Intake assistants that convert client descriptions into structured template outlines with recommended variables
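The "Analyze this campaign" idea amounts to chaining two calls. In the sketch below the transport is an injected function so the chaining logic can be read on its own. Passing the analytics response to /analyze unmodified is an assumption, as are the stub's field names and return values, which stand in for schemas this guide does not document.

```python
def analyze_campaign(http, campaign_id):
    """Fetch campaign analytics, then ask the AI endpoint to interpret them."""
    analytics = http("GET", f"/email-campaigns/{campaign_id}/analytics", None)
    return http("POST", "/claude-coop/analyze", analytics)


# Stub transport for illustration only; a real one would send HTTP requests.
calls = []


def fake_http(method, path, body):
    calls.append((method, path))
    if method == "GET":
        return {"sent": 500, "opened": 210}  # made-up analytics fields
    return {"summary": "placeholder text"}   # made-up response shape


result = analyze_campaign(fake_http, "cmp_123")
```

Keeping the transport injectable also makes it easy to add authentication or retries in one place without touching the workflow code.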

AI participates as a collaborative workflow layer, not a bolt-on feature — intelligence native to the document automation domain.

Practical Example: Property Listing Generator

The following demonstrates a complete real estate integration using five API namespaces to create a branded property document generator with automated data refresh and engagement tracking.

Step 1: Authentication and User Provisioning

Registration Flow (email-gated):

POST /api/auth/request-code → sends 6-digit code to email

POST /api/auth/register → exchanges code for JWT (7-day expiry)

Store JWT securely; refresh via POST /api/auth/login before expiry

Automated Provisioning: Call request-code and register programmatically for bulk user setup. 14-day trial activates automatically on registration.
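Because the JWT expires after 7 days, a client needs some rule for when to log in again. One simple approach, sketched below, is to record when the token was issued and refresh once it is within a safety margin of expiry. The one-day margin and the function name are assumptions; only the 7-day lifetime comes from this guide.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(days=7)  # JWT expiry stated in this guide
REFRESH_MARGIN = timedelta(days=1)  # assumed safety margin


def needs_refresh(issued_at, now=None):
    """True once the token is within REFRESH_MARGIN of its expiry."""
    now = now or datetime.now(timezone.utc)
    return now >= issued_at + TOKEN_LIFETIME - REFRESH_MARGIN


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
early = needs_refresh(issued, now=datetime(2024, 1, 3, tzinfo=timezone.utc))
late = needs_refresh(issued, now=datetime(2024, 1, 7, tzinfo=timezone.utc))
```

When `needs_refresh` returns true, the application would call POST /api/auth/login and store the new token along with a fresh issue time.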

Step 2: Domain Setup

Clone pre-configured real estate data structure:

POST /api/domains/real-estate/clone
{ "includeSampleData": true }

Result: Receives IDs for:

3 CSV files (Properties, Agents, PropertyPhotos) with sample data

2 Word templates (Property Brochure, Property Listing Sheet)

Auto-join configuration defining table relationships

Replace sample data with real property records to make templates operational.

Step 3: Connect Live Data Source

Connect daily-updated Google Sheet:

GET /api/google/auth/start → get OAuth URL

Complete OAuth flow in browser

POST /api/sync/schedules → configure daily sync at 6 AM targeting Properties CSV

Properties data auto-refreshes every morning without manual exports.

Step 4: Generate Property Brochures

Create export configuration for PDF generation:

POST /api/email-publishing-exports
{ "csvId": , "templateId": , "format": "pdf", "emailField": "AgentEmail" }

POST /:id/execute // Start generation

GET /:id/status // Poll until "completed"

GET /:id/download // Download ZIP

Result: ZIP contains one PDF per property with data from joined tables (Properties + Agents + PropertyPhotos).

Step 5: Run Email Campaign

Create and execute campaign with personalized attachments:

POST /api/email-campaigns
{ "dataFileId": , "templateId": , "emailField": "ProspectEmail", "attachmentTemplateId": , "trackOpens": true, "trackClicks": true }

POST /:id/send // Initiate (async)

GET /:id/status // Monitor progress

GET /:id/analytics // Post-completion engagement metrics

Reply tracking via Power Automate captures prospect responses and stores attachments (signed offers) retrievable through tracking API.
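The campaign body for this step can be assembled with a small builder that keeps the attachment fields together. The field names are the ones used in this guide (`dataFileId`, `templateId`, `emailField`, `attachmentTemplateId`, `generateAttachments`, `trackOpens`, `trackClicks`). The function name and the ID values are hypothetical placeholders for the IDs returned in Step 2.

```python
import json


def campaign_payload(data_file_id, template_id, email_field,
                     attachment_template_id=None, track_opens=True, track_clicks=True):
    """Build the JSON body for POST /api/email-campaigns."""
    payload = {
        "dataFileId": data_file_id,
        "templateId": template_id,
        "emailField": email_field,
        "trackOpens": track_opens,
        "trackClicks": track_clicks,
    }
    if attachment_template_id is not None:
        # Per-recipient attachments need both the template ID and the flag.
        payload["attachmentTemplateId"] = attachment_template_id
        payload["generateAttachments"] = True
    return payload


# Hypothetical IDs for illustration only.
body = campaign_payload("csv_props", "tpl_email", "ProspectEmail",
                        attachment_template_id="tpl_brochure")
encoded = json.dumps(body)
```

Setting the two attachment fields in one branch avoids a configuration where a template ID is supplied but attachment generation is silently left off.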

Complete Workflow

Data In (Google Sheets via sync) → Document Engine (Property Brochure + Data Set joins) → Distribution (email campaign with attachment generation and tracking)

Five API namespaces. One coherent workflow.

White-Label Integration Architecture

The platform's API layer supports complete white-labeling:

Multi-Tenancy and Data Isolation

Each user has isolated Data Publisher account (registered through your provisioning flow)

Authentication via JWTs managed by your application

Complete data isolation: templates, CSV files, campaigns, images, sync schedules

Users never see Data Publisher branding, dashboard, or pricing

Vertical-Specific Integration

Control what capabilities your interface exposes:

Selective API Surface: Homeschool coordinator? Omit SQL Server connector. Legal firm without logos? Skip image library.

Domain-Based Onboarding: Clone relevant domain on first login for pre-populated workspace

Sample Library: Pre-populate sample templates from library for instant demonstration

Time-to-First-Document: Minutes vs hours through structured onboarding

Competitive Differentiation

/claude-coop endpoints provide AI capabilities most vertical automation solutions can't match:

"Analyze my results" returns meaningful campaign performance interpretation

Template improvement assistant understands platform syntax and formatting model

Natural language features without building ML infrastructure

Getting Started

Recommended Approach

Understand Three-Layer Architecture: Data → Engine → Distribution (determines which namespaces matter)

Start with Minimum Viable API Surface: One data connector + one template type + one distribution method

Prioritize Early:

- Data Sources API (/data-sources) + Data Sync (/sync): Define how fresh data is

- Campaign Lifecycle Endpoints: Async send pattern requires interface design thinking different from synchronous APIs

- /claude-coop Prototyping: Demonstrating AI capabilities changes prospects' expectations

Next Steps

Authentication & Users: Review 01-Authentication.md for registration flows and JWT lifecycle

Data Integration: Start with 03-Data-Files.md (CSV) or connector-specific docs (20-Google-Sheets.md, 21-Microsoft-Excel.md, 22-SQL-Server.md)

Document Generation: See 04-Documents.md and 30-Word-Templates.md

Email Distribution: Begin with 10-Email-Templates.md and 11-Email-Campaigns.md

Quick Reference: Consult API-Quick-Reference.md for all 123 endpoints at a glance

---

API Quick Reference

| Layer | Key Namespaces | What It Does |
|-------|---------------|--------------|
| Data Layer | /csv, /google, /microsoft, /sql-server, /data-sources, /sync, /data-sets | Connects data from any source; normalizes to CSV for the engine |
| Document Engine | /documents, /word-templates, /image-library, /sample-library, /domains | Converts templates + data into generated documents |
| Distribution | /email-templates, /email-campaigns, /email/track, /email/auth, /email-publishing-exports | Sends, tracks, and exports generated documents |
| AI Layer | /claude-coop | Analyze results and suggest improvements via natural language |
| Platform | /auth, /users | Account management, JWT lifecycle, subscription status |