Data Publisher API - Complete Guide

Introduction

Welcome to the Data Publisher API documentation. This guide is written for developers who need to build serious integrations with the Data Publisher platform — whether you're creating a custom client interface, building workflow automation, or developing a vertical-specific application.

The Data Publisher v2.0 API exposes 123 endpoints across 23 namespaces. More importantly, it exposes the complete operational logic of the platform in three discrete layers: data comes in, it moves through a document engine, and the output goes out through a distribution layer. Understanding that layering is the first step toward building integrations that actually work in production environments.

Base URL: https://app.datapublisher.io/api

Development URL: http://localhost:3001/api

Authentication: JWT Bearer tokens

Data Format: JSON (with multipart/form-data for file uploads)
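As a minimal sketch of how these four settings fit together, the helper below builds (but does not send) a JSON request carrying a Bearer token. The function name `build_request` and the placeholder token are illustrative assumptions, not part of the platform; only the base URL, the Bearer scheme, and the `/data-sources` and `/domains/real-estate/clone` paths come from this guide.

```python
import json
import urllib.request

BASE_URL = "https://app.datapublisher.io/api"  # production base URL from this guide


def build_request(path, token, body=None, method="GET", base_url=BASE_URL):
    """Build an authenticated JSON request without sending it (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # JWT Bearer token
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# "YOUR_JWT" is a placeholder; a real token comes from the auth endpoints.
req = build_request("/data-sources", token="YOUR_JWT")
print(req.get_method(), req.full_url)
```

Passing `base_url="http://localhost:3001/api"` targets the development server instead. Sending is left to the caller (for example with `urllib.request.urlopen(req)`), which keeps the helper easy to test.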

The Three-Layer Architecture

Before writing a single line of integration code, spend time understanding how the platform is organized. The namespaces are not arbitrary — they reflect a deliberate separation of concerns that maps directly onto how document automation workflows actually operate in practice.

Layer 1: Data Layer

Every document automation workflow begins with data that lives somewhere: a spreadsheet maintained in Google Sheets, a customer database on a SQL Server instance, an Excel file on SharePoint, or a CSV exported from a legacy system. The data layer's job is to accept all of those sources and normalize them into a consistent internal format that the document engine can consume without needing to know where the data came from.

Key namespaces: /csv, /google, /microsoft, /sql-server, /data-sources, /sync, /data-sets

Layer 2: Document Engine

This is the core of what Data Publisher does: it takes a Word template with {{variable}} placeholders and a data file, and produces populated documents. The engine handles:

Everything in the document engine is stateless: you give it a template ID and a data file ID, and it gives you documents.
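To make the placeholder idea concrete, here is a small illustration of {{variable}} substitution on plain strings. It is not the platform's engine (which works on Word templates); the function names, the sample rows, and the choice to leave unknown placeholders untouched are all assumptions made for the example.

```python
import re

# Matches {{Name}} style placeholders; the exact syntax rules are an assumption.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(text, row):
    """Replace each {{column}} with the row's value; unknown names stay as-is."""
    def substitute(match):
        name = match.group(1)
        return str(row.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, text)


def generate(text, rows):
    """One output per data row, mirroring 'template + data file -> documents'."""
    return [fill_template(text, row) for row in rows]


# Hypothetical template text and rows.
docs = generate("Dear {{Name}}, your listing in {{City}} is live.",
                [{"Name": "Ana", "City": "Lisbon"}, {"Name": "Raj"}])
for doc in docs:
    print(doc)
```

Because each output depends only on the template text and one row, the function has no state to carry between calls, which is the same property the engine description above relies on.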

Key namespaces: /documents, /word-templates, /image-library, /sample-library, /domains

Layer 3: Distribution Layer

Generated documents need to get somewhere. In most enterprise workflows, that means email: individual emails with personalized attachments, bulk campaigns with engagement tracking, or packaged exports that feed into downstream systems. The distribution layer handles the full email lifecycle:

Key namespaces: /email-templates, /email-campaigns, /email/track, /email/auth, /email-publishing-exports

The AI Layer

A fourth element sits orthogonally across all three layers: the AI namespace (/claude-coop). Unlike most platforms that treat AI as a UI feature, Data Publisher exposes AI capabilities as API endpoints, meaning they can be invoked at any point in any workflow. This enables:

The platform does not have one "mode" — it has three distinct stages, each with its own API surface. Understanding which stage your use case lives in is the prerequisite to knowing which endpoints to use.

Getting Data In: The Four Connectors

The data layer supports four distinct connector types. Each connector normalizes its source into the same internal format — a registered CSV file that can be addressed by ID. Once data enters the platform through any connector, the downstream generation and distribution layers operate identically regardless of source.

CSV Upload (/csv)

Direct file upload endpoint supporting multipart/form-data (files up to 50MB) and JSON payloads (programmatic uploads). On upload, the platform automatically:

Paginated access via GET /api/csv/:id/data with limit/offset parameters plus server-side filtering (filterColumn/filterValue) keeps large datasets performant.
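The query parameters named above can be assembled as in the following sketch. The helper names and the page size of 100 are assumptions; only the path and the limit, offset, filterColumn, and filterValue parameter names come from this guide.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.datapublisher.io/api"


def csv_data_url(csv_id, limit=100, offset=0,
                 filter_column=None, filter_value=None, base_url=BASE_URL):
    """Build the URL for one page of GET /api/csv/:id/data (hypothetical helper)."""
    params = {"limit": limit, "offset": offset}
    if filter_column is not None and filter_value is not None:
        params["filterColumn"] = filter_column
        params["filterValue"] = filter_value
    return f"{base_url}/csv/{csv_id}/data?{urlencode(params)}"


def page_offsets(total_rows, limit):
    """Offsets needed to walk a dataset of total_rows in pages of size limit."""
    return list(range(0, total_rows, limit))


# Walk a hypothetical 250-row file, 100 rows at a time.
for offset in page_offsets(250, 100):
    print(csv_data_url(42, limit=100, offset=offset))
```

Filtering on the server (for example `filter_column="City"`) reduces how many pages a client has to fetch, which is usually cheaper than downloading everything and filtering locally.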

Google Sheets (/google)

Full OAuth2 connector for Google Sheets API v4. Three-step authorization flow:

  • GET /auth/start returns authorization URL
  • User completes OAuth consent in browser
  • Platform stores access/refresh tokens (AES-256 encrypted) and handles automatic refreshes

Key endpoints: List spreadsheets (GET /spreadsheets), read sheet data (GET /spreadsheets/:id/sheets/:name/data), import to CSV (POST /spreadsheets/:id/sheets/:name/import). First row always treated as headers; all data imported as strings.
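The header and string rules have a practical consequence: numeric columns arrive as text and need converting downstream. The sketch below mimics those two rules on a list of rows. The function name, the sample values, and the padding of short rows with empty strings are assumptions for illustration, not documented platform behavior.

```python
def rows_to_records(values):
    """Treat the first row as headers and return every cell as a string."""
    if not values:
        return []
    headers = [str(h) for h in values[0]]
    records = []
    for row in values[1:]:
        # Pad short rows so every record has every header (assumed behavior).
        padded = list(row) + [""] * (len(headers) - len(row))
        records.append({h: str(v) for h, v in zip(headers, padded)})
    return records


# Hypothetical sheet contents.
sheet = [["Address", "Price"], ["12 Oak St", 450000], ["9 Elm Rd"]]
records = rows_to_records(sheet)
print(records)

# Numbers come back as text, so convert explicitly before doing arithmetic.
prices = [int(r["Price"]) for r in records if r["Price"]]
print(prices)
```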

    Microsoft Excel (/microsoft)

    Parallel OAuth2 connector for Microsoft Graph API (OneDrive/SharePoint). Requests three scopes: Files.Read.All, offline_access, User.Read. Architectural mirror of Google Sheets connector with transparent token refresh.

    Uses OneDrive file IDs (not URLs) for workbook addressing. Returns both ID and webUrl for display. Supports .xlsx, .xlsm, .xlsb formats (not legacy .xls). Worksheet names are URL-encoded to handle spaces/special characters.
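A worksheet name containing spaces or symbols has to be percent-encoded before it is placed in a URL path. The sketch below shows one way to do that with the standard library; the function name and sample worksheet names are hypothetical, and this guide does not state the exact route the encoded name is placed in.

```python
from urllib.parse import quote


def encode_worksheet_name(name):
    """Percent-encode a worksheet name for use as a single URL path segment."""
    # safe="" also escapes "/" so the name cannot be split into two segments.
    return quote(name, safe="")


print(encode_worksheet_name("Q1 Sales & Leads"))
print(encode_worksheet_name("North/South"))
```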

    SQL Server (/sql-server)

    Enterprise database connector for operational data sources. Supports connection strings or server/database/credentials configuration. Platform validates connections at setup time and enforces:

    Execution via POST /query (returns typed JSON) or POST /query/import (routes directly to CSV file).

    Data Sync and Automation (/sync)

    Scheduled sync layer for all four connectors. Configure frequency (hourly, daily, weekly) to automatically refresh target CSV files from source data. Sync history (GET /schedules/:id/history) records row counts, change deltas (added/updated/deleted), and error messages for debugging.

    The Data Sources API (/api/data-sources) provides unified view across all connector types — single GET / returns all sources with type, status, last sync. Batch connection testing via POST /test-all for health monitoring dashboards.

    Multi-Table Data (/data-sets and /domains)

    Data Sets (/api/data-sets): Group multiple related CSV files with foreign key relationship definitions for complex documents requiring joins across tables.
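To illustrate what a foreign key relationship between two CSV files makes possible, the following sketch joins child rows to parent rows in plain Python. It is a conceptual illustration only: the function name, the column names (AgentId, Name), the `Agent_` prefix, and the sample rows are hypothetical, and the platform's own join behavior may differ.

```python
def join_on_foreign_key(child_rows, parent_rows, foreign_key, parent_key, prefix):
    """Attach the matching parent row's columns to each child row."""
    parents = {row[parent_key]: row for row in parent_rows}
    joined = []
    for child in child_rows:
        merged = dict(child)
        parent = parents.get(child.get(foreign_key))
        if parent is not None:
            # Prefix parent columns so they cannot overwrite child columns.
            merged.update({f"{prefix}{key}": value for key, value in parent.items()})
        joined.append(merged)
    return joined


# Hypothetical rows from a Properties file and an Agents file.
properties = [
    {"PropertyId": "1", "Address": "12 Oak St", "AgentId": "A7"},
    {"PropertyId": "2", "Address": "9 Elm Rd", "AgentId": "A9"},
]
agents = [{"AgentId": "A7", "Name": "Dana Cole"}]

joined = join_on_foreign_key(properties, agents, "AgentId", "AgentId", "Agent_")
print(joined[0]["Agent_Name"])
```

Rows without a matching parent are kept unchanged here (a left join), so a missing agent does not silently drop a property from the output.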

    Domains (/api/domains): Pre-configured vertical-specific Data Sets with relationships and sample data. Available domains: real estate (Properties, Agents, PropertyPhotos), e-commerce (Products, Categories, Specifications), healthcare, finance, HR, education. Clone to user account via POST /domains/:domainId/clone for instant setup.

    The Document Engine

    Two complementary APIs manage templates depending on integration context:

    Documents API (/documents)

    Primary template management layer handling:

    Word Templates API (/word-templates)

    Adds OOXML capabilities for Office.js integration:

    Image Library (/image-library)

    Centralized visual asset management:

    Sample Library (/sample-library)

    Onboarding acceleration through pre-built templates:

    The Distribution Layer

    The distribution layer manages the complete email workflow from authentication through engagement analytics.

    Email Authentication (/email/auth)

    All email sending routes through Microsoft Graph API. OAuth2 flow requests three scopes:

    Token management is transparent — platform automatically refreshes expired access tokens (1-hour lifetime) before send operations. Developer code never touches token state.

    Email Templates (/email-templates)

    Accept Word documents and convert to email-optimized HTML:

    Email Campaigns (/email-campaigns)

    11-endpoint lifecycle for bulk sending with personalization:

  • Draft Creation: POST / configures campaign (data file, email template, recipient field, subject line, attachment settings, tracking preferences, test mode)
  • Updates: Modify configuration while in draft status
  • Send Initiation: POST /:id/send starts asynchronous processing (returns jobId and estimated duration)
  • Progress Monitoring: GET /:id/status returns progress %, sent/failed/remaining counts, per-recipient errors (poll every 10-30 seconds; ~30 emails/min send rate due to Graph throttling)
  • Pause/Resume: POST /:id/pause and POST /:id/resume for operational safety (correct data mid-campaign without losing progress)
  • Cancellation: POST /:id/cancel stops execution permanently
  • Analytics: GET /:id/analytics returns aggregate engagement metrics post-completion

    Personalized Attachments: When attachmentTemplateId and generateAttachments: true are specified, the platform generates a custom PDF/DOCX for each recipient using their row data (500 recipients → 500 unique attached documents).
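Because sending is asynchronous, client code typically estimates the duration and then polls. The sketch below shows that pattern with the HTTP call injected as a function so it can be tested offline. The `status` field name and its terminal values are assumptions (this guide lists what the status endpoint reports, not its exact JSON shape); the 30 emails per minute figure and the 10 to 30 second polling interval are taken from the list above.

```python
import math
import time

# Assumed terminal values for a hypothetical "status" field.
TERMINAL_STATES = {"completed", "cancelled", "failed"}


def estimate_send_minutes(recipients, emails_per_minute=30):
    """Rough duration at the ~30 emails/min rate mentioned in this guide."""
    return math.ceil(recipients / emails_per_minute)


def wait_for_campaign(fetch_status, poll_seconds=15, max_polls=1000, sleep=time.sleep):
    """Poll until a terminal state is reported.

    fetch_status is any callable returning the parsed JSON of
    GET /:id/status as a dict; it is injected so no network is needed here.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("campaign did not reach a terminal state")


print(estimate_send_minutes(500))  # minutes for a 500-recipient campaign
```

In a real integration `fetch_status` would wrap an authenticated request to the status endpoint, and a paused campaign would need its own handling since it is neither progressing nor finished.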

    Email Tracking (/email/track)

    Three public endpoints (no authentication, <10ms response time):

    Authenticated tracking endpoints:

    Email Publishing Exports (/email-publishing-exports)

    Alternative distribution for non-email channels (client portals, print vendors, downstream systems):

    Workflow: Create configuration (POST /), execute generation (POST /:id/execute), poll status (GET /:id/status), download ZIP (GET /:id/download).

    ZIP Structure: Generated documents at root with consistent naming, static attachments in subdirectory (150 records → 150 individually named files in single archive).
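Given that layout, a downstream system can separate generated documents from static attachments by checking whether an entry sits at the archive root. The sketch below does this on an in-memory ZIP. The function name and the file and folder names are hypothetical; the guide does not specify the subdirectory's name, so the code only tests for a path separator.

```python
import io
import zipfile


def split_export_entries(zip_bytes):
    """Return (root_documents, attachments) from an export archive's bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Skip explicit directory entries, which end with "/".
        names = [name for name in archive.namelist() if not name.endswith("/")]
    documents = sorted(name for name in names if "/" not in name)
    attachments = sorted(name for name in names if "/" in name)
    return documents, attachments


# Build a small stand-in archive instead of downloading a real export.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("listing_001.pdf", b"doc one")
    archive.writestr("listing_002.pdf", b"doc two")
    archive.writestr("attachments/terms.pdf", b"static file")

documents, attachments = split_export_entries(buffer.getvalue())
print(documents)
print(attachments)
```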

    The AI Layer (/claude-coop)

    Unlike platforms that treat AI as UI-only features (chat windows, help buttons), Data Publisher exposes AI capabilities as API endpoints — invokable at any point in any workflow.

    Two endpoints:

    Integration Opportunities:

    AI participates as a collaborative workflow layer, not a bolt-on feature — intelligence native to the document automation domain.

    Practical Example: Property Listing Generator

    The following demonstrates a complete real estate integration using five API namespaces to create a branded property document generator with automated data refresh and engagement tracking.

    Step 1: Authentication and User Provisioning

    Registration Flow (email-gated):

  • POST /api/auth/request-code → sends 6-digit code to email
  • POST /api/auth/register → exchanges code for JWT (7-day expiry)
  • Store JWT securely; refresh via POST /api/auth/login before expiry

    Automated Provisioning: Call request-code and register programmatically for bulk user setup. A 14-day trial activates automatically on registration.

    Step 2: Domain Setup

    Clone pre-configured real estate data structure:

    POST /api/domains/real-estate/clone

    { "includeSampleData": true }

    Result: Receives IDs for:

    Replace sample data with real property records to make templates operational.

    Step 3: Connect Live Data Source

    Connect daily-updated Google Sheet:

  • GET /api/google/auth/start → get OAuth URL
  • Complete OAuth flow in browser
  • POST /api/sync/schedules → configure daily sync at 6 AM targeting Properties CSV

    Properties data then auto-refreshes every morning without manual exports.

    Step 4: Generate Property Brochures

    Create export configuration for PDF generation:

    POST /api/email-publishing-exports

    {
      "csvId": ,
      "templateId": ,
      "format": "pdf",
      "emailField": "AgentEmail"
    }

    POST /:id/execute    // Start generation
    GET /:id/status      // Poll until "completed"
    GET /:id/download    // Download ZIP

    Result: ZIP contains one PDF per property with data from joined tables (Properties + Agents + PropertyPhotos).

    Step 5: Run Email Campaign

    Create and execute campaign with personalized attachments:

    POST /api/email-campaigns

    {
      "dataFileId": ,
      "templateId": ,
      "emailField": "ProspectEmail",
      "attachmentTemplateId": ,
      "trackOpens": true,
      "trackClicks": true
    }

    POST /:id/send       // Initiate (async)
    GET /:id/status      // Monitor progress
    GET /:id/analytics   // Post-completion engagement metrics

    Reply tracking via Power Automate captures prospect responses and stores attachments (signed offers) retrievable through tracking API.

    Complete Workflow

    Data In (Google Sheets via sync) → Document Engine (Property Brochure + Data Set joins) → Distribution (email campaign with attachment generation and tracking)

    Five API namespaces. One coherent workflow.

    White-Label Integration Architecture

    The platform's API layer supports complete white-labeling:

    Multi-Tenancy and Data Isolation

    Vertical-Specific Integration

    Control what capabilities your interface exposes:

    Competitive Differentiation

    /claude-coop endpoints provide AI capabilities most vertical automation solutions can't match:

    Getting Started

    Recommended Approach

  • Understand Three-Layer Architecture: Data → Engine → Distribution (determines which namespaces matter)
  • Start with Minimum Viable API Surface: One data connector + one template type + one distribution method
  • Prioritize Early:
      - Data Sources API (/data-sources) + Data Sync (/sync): Define how fresh data is
      - Campaign Lifecycle Endpoints: Async send pattern requires interface design thinking different from synchronous APIs
      - /claude-coop Prototyping: Demonstrating AI capabilities changes prospects' expectations

    Next Steps

    ---

API Quick Reference

| Layer | Key Namespaces | What It Does |
|-------|---------------|--------------|
| Data Layer | /csv, /google, /microsoft, /sql-server, /data-sources, /sync, /data-sets | Connects data from any source; normalizes to CSV for the engine |
| Document Engine | /documents, /word-templates, /image-library, /sample-library, /domains | Converts templates + data into generated documents |
| Distribution | /email-templates, /email-campaigns, /email/track, /email/auth, /email-publishing-exports | Sends, tracks, and exports generated documents |
| AI Layer | /claude-coop | Analyze results and suggest improvements via natural language |
| Platform | /auth, /users | Account management, JWT lifecycle, subscription status |