Orchestrating Complex Enterprise Data Syncs with n8n

Orchestrating Complex Enterprise Data Syncs with n8n

August 8, 2025 · 11 min read · by Muhammad Amal programming

TL;DR — Enterprise data syncs in n8n need three patterns most demos skip. Idempotency keys to survive retries. Watermarks to bound the pull. A dead letter table to keep poison messages out of the main flow. Get those right and the rest is just node wiring.

I’ve built data syncs in n8n connecting Salesforce to a billing system, Jira to a data warehouse, and a half-dozen variations on “the marketing tool needs the customer data ten minutes ago.” The first version always looks clean. The third version, after the first production incident and the first poison message, looks like real engineering. This article skips the first two versions.

We’ll build a single canonical sync, pulling records from a source system, transforming, and upserting to a destination. We’ll add the three patterns that make it survive. Then we’ll talk about fan-out to multiple destinations, the dead letter queue, and the observability hooks you need. The architecture assumes a queue-mode cluster like the one in the advanced n8n architecture article.

Opinions stated plainly. Cron-triggered full-table syncs are the wrong default. Watermarks are the right default. Idempotency is non-negotiable, because retries happen whether you plan for them or not. And if you don’t have a dead letter table, you have a system that pages on-call every time a single row has bad data.

1. The Canonical Sync Shape

Every sync I write has the same skeleton. Trigger, fetch since watermark, transform, upsert, advance watermark, emit metrics. The differences are in what each step does, not the order.

+----------+   +----------+   +-----------+   +----------+   +-----------+
| Schedule |-->|  Fetch   |-->| Transform |-->|  Upsert  |-->|  Advance  |
|  Cron    |   |  since   |   |  to dest  |   |  to dest |   | Watermark |
+----------+   |  WM      |   |  schema   |   |  by key  |   +-----------+
               +----------+   +-----------+   +----+-----+         |
                                                   |               |
                                            on error               |
                                                   v               v
                                            +-------------+   +---------+
                                            | Dead letter |   | Metrics |
                                            +-------------+   +---------+

The watermark is the timestamp (or monotonic ID) of the last record you successfully processed. You store it durably, you read it at the start of each run, you advance it only after the destination has accepted the batch. This is the difference between “sync that catches up” and “sync that loses data when it crashes.”

I store watermarks in a small Postgres table dedicated to the sync system, separate from n8n’s own state.

CREATE TABLE sync_state (
  sync_name   TEXT PRIMARY KEY,
  watermark   TIMESTAMPTZ NOT NULL,
  last_run_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  records_in  BIGINT NOT NULL DEFAULT 0,
  records_out BIGINT NOT NULL DEFAULT 0,
  errors      BIGINT NOT NULL DEFAULT 0
);

2. Watermark-Based Pull

The watermark pattern looks trivial. It isn’t, because of two edge cases. Clock skew between systems, and rows that arrive out of order relative to their updated_at. The fix for both is to subtract a safety window from the watermark on each read.

The workflow has a Postgres node to read the watermark, an HTTP Request node to fetch from the source, and a Set node to compute the safety-adjusted window.

{
  "name": "Read watermark",
  "type": "n8n-nodes-base.postgres",
  "parameters": {
    "operation": "executeQuery",
    "query": "SELECT watermark FROM sync_state WHERE sync_name = 'salesforce_to_warehouse'"
  },
  "credentials": { "postgres": { "id": "1", "name": "warehouse-pg" } }
}

{
  "name": "Compute window",
  "type": "n8n-nodes-base.set",
  "parameters": {
    "values": {
      "string": [
        {
          "name": "since",
          "value": "={{ $json.watermark ? new Date(new Date($json.watermark).getTime() - 5 * 60 * 1000).toISOString() : '1970-01-01T00:00:00Z' }}"
        },
        {
          "name": "until",
          "value": "={{ new Date(Date.now() - 60 * 1000).toISOString() }}"
        }
      ]
    }
  }
}

since is the watermark minus five minutes, the safety window. until is now minus one minute, to give the source system a moment to write any records that are in flight. Both choices are workload-specific. For Salesforce I use a five-minute safety window. For a database CDC source, thirty seconds is plenty.

The fetch node uses both values.

{
  "name": "Fetch from source",
  "type": "n8n-nodes-base.httpRequest",
  "parameters": {
    "url": "https://source.acme.internal/v1/accounts",
    "method": "GET",
    "qs": {
      "updated_after": "={{ $('Compute window').item.json.since }}",
      "updated_before": "={{ $('Compute window').item.json.until }}",
      "limit": 500
    },
    "pagination": {
      "paginationMode": "responseContainsNextURL",
      "nextURL": "={{ $response.body.next_url }}",
      "maxRequests": 200
    }
  }
}

The 200-page cap on pagination is a circuit breaker. If your source returns more than 100K records in a sync run, you have a different problem (backfill, not sync), and you want the workflow to fail fast rather than run for an hour.

3. Idempotency at the Destination

The destination upsert must be idempotent on the source’s primary key. n8n retries failed jobs, and you cannot assume a row was upserted just because the worker started writing it. The right pattern is INSERT ... ON CONFLICT ... DO UPDATE with a strict version check.

{
  "name": "Upsert account",
  "type": "n8n-nodes-base.postgres",
  "parameters": {
    "operation": "executeQuery",
    "query": "INSERT INTO accounts (source_id, name, updated_at, source_version) VALUES ($1, $2, $3, $4) ON CONFLICT (source_id) DO UPDATE SET name = EXCLUDED.name, updated_at = EXCLUDED.updated_at, source_version = EXCLUDED.source_version WHERE accounts.source_version < EXCLUDED.source_version",
    "additionalFields": {
      "queryParameters": "={{ [$json.id, $json.name, $json.updated_at, $json.version] }}"
    }
  }
}

Two details matter. The WHERE accounts.source_version < EXCLUDED.source_version clause prevents stale data from overwriting newer data when the same record gets reprocessed out of order. The query parameter binding uses $1, $2, etc., not string interpolation, so a ' in a name doesn’t break the SQL.

For destinations without conditional upserts (most REST APIs), use an Idempotency-Key header derived from a hash of the source record.

// in a Code node
const crypto = require('crypto');
const items = $input.all();

return items.map(item => {
  const key = crypto
    .createHash('sha256')
    .update(`${item.json.id}:${item.json.version}`)
    .digest('hex')
    .slice(0, 32);
  return { json: { ...item.json, idempotencyKey: key } };
});

The HTTP Request node then sends Idempotency-Key: {{ $json.idempotencyKey }}. Most modern APIs (Stripe, Square, internal platforms built in the last few years) honor it.

4. Advancing the Watermark Safely

This is the step everyone gets wrong on the first try. The naive approach is to advance the watermark to “now” at the end of the run. That’s wrong because if the destination upsert fails halfway through the batch, you’ve skipped records.

The correct approach is to advance to the updated_at of the last successfully processed record, only after that record’s upsert has succeeded.

{
  "name": "Advance watermark",
  "type": "n8n-nodes-base.postgres",
  "parameters": {
    "operation": "executeQuery",
    "query": "UPDATE sync_state SET watermark = $1, last_run_at = NOW(), records_in = records_in + $2, records_out = records_out + $3 WHERE sync_name = 'salesforce_to_warehouse'",
    "additionalFields": {
      "queryParameters": "={{ [$json.last_updated_at, $json.records_in, $json.records_out] }}"
    }
  }
}

If you batch upserts, advance the watermark to the highest updated_at of the rows that succeeded in the current batch. If the batch fails partway, do not advance, and let the next run pick up from the same point. The safety window means a few rows will be reprocessed, and the idempotent upsert means that’s harmless.

Why not just track a list of processed IDs

A few teams do this. It works at low volume. At high volume the “processed IDs” set becomes a million-row table that gets queried on every upsert. The watermark-plus-idempotency pattern scales to millions of records per day with no extra storage.

5. Fan-Out, Fan-In

Real syncs often write to multiple destinations. The warehouse, the CRM, and a search index, all from the same source pull. n8n handles this with multiple output connections from a Split In Batches node, but the gotcha is error handling. If the warehouse write succeeds and the CRM write fails, what’s the state of the system?

The pattern I use is to write to a staging table first, then fan out to destinations, then mark the staging row done only when all destinations have acked.

                              +---------------+
                              |   Source API  |
                              +-------+-------+
                                      |
                              +-------v-------+
                              |   Staging     |
                              |   table       |
                              +---+---+---+---+
                                  |   |   |
                          +-------+   |   +-------+
                          |           |           |
                  +-------v---+ +-----v----+ +----v------+
                  | Warehouse | |   CRM    | |  Search   |
                  +-------+---+ +-----+----+ +----+------+
                          |           |           |
                          +-------+   |   +-------+
                                  |   |   |
                              +---v---v---v---+
                              |   Mark done   |
                              +---------------+

Each destination’s success writes to a sync_dest_state table.

CREATE TABLE sync_dest_state (
  source_id   TEXT NOT NULL,
  destination TEXT NOT NULL,
  status      TEXT NOT NULL CHECK (status IN ('pending','done','failed')),
  attempts    INT NOT NULL DEFAULT 0,
  last_error  TEXT,
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (source_id, destination)
);

A separate workflow (or a final step in the main flow) reaps pending rows and retries them. Failed destinations don’t block successful ones. The system is partial-failure-tolerant by construction.

6. Dead Letter Queue

Some records will never sync. Bad data, schema mismatches, business rules the source didn’t apply. They must not block the rest of the sync. Route them to a dead letter table.

The pattern is an error trigger workflow. n8n’s “Error Trigger” node fires on any execution that fails. It has access to the failed execution’s input data, the node that failed, and the error message.

{
  "name": "Error Trigger",
  "type": "n8n-nodes-base.errorTrigger"
}

{
  "name": "Write to DLQ",
  "type": "n8n-nodes-base.postgres",
  "parameters": {
    "operation": "executeQuery",
    "query": "INSERT INTO sync_dead_letter (workflow, execution_id, node_name, error_message, payload, created_at) VALUES ($1, $2, $3, $4, $5::jsonb, NOW())",
    "additionalFields": {
      "queryParameters": "={{ [$json.workflow.name, $json.execution.id, $json.execution.lastNodeExecuted, $json.execution.error.message, JSON.stringify($json.execution.error.itemData || {})] }}"
    }
  }
}

The dead letter table is then surfaced in a dashboard, reviewed weekly, and either fixed at the source or skipped explicitly. The principle: any sync failure is either a known dead-letter case or a real incident. There’s no “transient flap” middle ground after retries are exhausted.

For richer error handling patterns including retry strategies, see the error handling and retries article.

7. Observability

The metrics that matter for a data sync are not generic n8n metrics. They’re workflow-specific.

Records pulled per run.
Records upserted per run.
Watermark lag, the difference between now and the current watermark.
Dead letter row count.
Per-destination success rate.

Emit these from the workflow itself to Prometheus via the n8n metrics endpoint, or to a metrics gateway via HTTP. I use a custom node that wraps prom-client and exposes a single Emit Metric operation, but a plain HTTP Request to a pushgateway works.

The watermark lag is the single most useful metric. If it’s growing, the sync isn’t keeping up. Alert on lag exceeding 30 minutes for a five-minute schedule. The official n8n docs on schedule triggers cover the cron syntax options if you need finer control than */5 * * * *.

Common Pitfalls

Four traps I’ve seen ruin syncs.

Using now() as the watermark instead of the record’s updated_at. The sync starts dropping records as soon as the source has a write spike. By the time anyone notices, you have an unknown number of missing rows. Always use the source record’s own timestamp.

No safety window on the watermark read. Clock skew between the source system and your sync host eventually causes a record to be written with an updated_at slightly in the past relative to the watermark, and the next pull misses it. Subtract at least one source-system clock tick (usually a minute) from every watermark read.

Putting credentials in the workflow JSON. I’ve seen API keys committed to git in exported workflows. Use n8n credentials, reference them by name, and export workflows with n8n export:workflow --all which strips credentials.

Catch-all error handling that swallows real bugs. A try/catch around the entire workflow that writes everything to the DLQ means a code bug looks like a data problem. Catch narrowly. Let unknown errors surface.

Troubleshooting

Three failure modes that show up in practice.

Sync is running but no data is moving. The watermark hasn’t advanced. Check SELECT * FROM sync_state to confirm. If the watermark is current but the source has new data, the source filter is wrong. Log the actual since and until values being sent to the source and compare to records you know exist.

Records appear twice in the destination. The upsert isn’t actually idempotent. Either the conflict target isn’t the right column, or the source ID isn’t unique. Run SELECT source_id, COUNT(*) FROM accounts GROUP BY source_id HAVING COUNT(*) > 1 to confirm. Add a unique index on the conflict target column.

Workflow hits the n8n execution timeout. Default is unbounded but some hosts cap it at one hour. Break the sync into smaller batches with a Split In Batches node and a sub-workflow per batch. Set EXECUTIONS_TIMEOUT=3600 explicitly so you know what the limit is.

Wrapping Up

Enterprise data syncs in n8n are a solved problem if you commit to the three patterns. Watermarks bound the work. Idempotency makes retries safe. Dead letter queues keep poison messages from blocking the line. Layer fan-out staging on top when you have multiple destinations, and emit workflow-specific metrics from day one.

The next article applies these patterns to a Jira Cloud integration with an internal developer platform, including custom node code and the auth dance. After that we’ll get into the queue mode setup that makes all of this performant at production scale.

1. The Canonical Sync Shape

2. Watermark-Based Pull

3. Idempotency at the Destination

4. Advancing the Watermark Safely

Why not just track a list of processed IDs

5. Fan-Out, Fan-In

6. Dead Letter Queue

7. Observability

Common Pitfalls

Troubleshooting

Wrapping Up

Related posts

Observability for n8n in 2025, Metrics, Logs, and Traces

Self Hosted n8n on Kubernetes, A Production Setup

Error Handling and Retries for Production n8n Workflows

Managing Secrets and Credentials in n8n for Enterprise

n8n Queue Mode with Redis at Scale, A Production Walkthrough

Connecting Jira Cloud to Internal Platforms with n8n

Writing Custom n8n Nodes in TypeScript, A Step by Step Tutorial

Advanced n8n Architecture in 2025, Queue Mode and Worker Scaling

Let’s Start a Project