Shopify at scale: sync, reconciliation and catalog operations without drift

by Federico March 12, 2026 5 min read

shopify
automation
ecommerce

The more you grow, the more Shopify stops being “the store” and becomes “one of the systems”. Alongside the store there’s an ERP, a PIM, a warehouse on an Excel sheet, a paper note moving between production and logistics, maybe a second Shopify for the international market. Every system wants to be the source of truth for something: prices, stock, orders, publication status. When they disagree, somebody between customer care, accounting and logistics spends the afternoon reconciling by hand.

This post collects the patterns we use to keep a Shopify catalog of 600+ SKUs aligned with a custom ERP, in a B2C retail context (luxury resale, but the patterns apply equally to electronics, home & living, sporting goods). Three themes: 2-way sync with backoff, periodic reconciliation, safe bulk operations.

2-way sync: 7 entities, one golden rule

The starting point is deciding which system is the source of truth for each entity. In our case:

Products: ERP is truth (because that’s where pricing, sourcing and consignment logic lives), Shopify receives.
Inventory: ERP is truth, but Shopify carries a “delta” from live orders, which is why inventory needs periodic reconciliation.
Customers: Shopify is truth for those arriving via storefront, ERP for B2B.
Orders: Shopify is truth (it’s the system where the order is born), ERP receives.
Transactions / payouts / balance: Shopify Payments is truth, ERP receives for accounting.

Seven edge functions run every 15 minutes, each on a single entity (transactions, payouts and balance count as three separate entities). Common pattern:

Cursor-based pull: we track the last processed updated_at to avoid pointless full scans.
Rate limit awareness: exponential backoff on Shopify 429s, max 3 retries, dead-letter on exhaustion.
Idempotency key: we store a hash of every processed record; if the same record comes in again with an identical hash, we skip it.
Webhook integration: parallel to polling, a webhook handler ingests real-time events for orders/create, inventory_levels/update, products/update.
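
The first three points can be sketched together as one polling run. This is a minimal illustration, not our production edge function: `RateLimited`, `fetch`, `apply` and the in-memory `state`, `seen_hashes` and `dead_letters` containers are hypothetical stand-ins for the Shopify client and the database tables, and `updated_at` is assumed to be an ISO 8601 string in a single timezone, so that string comparison matches time order.

```python
import hashlib
import json
import time


class RateLimited(Exception):
    """Raised by the hypothetical fetch function when Shopify answers 429."""


def record_hash(record):
    # Serialize with sorted keys so the same content always hashes the same.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fetch_with_backoff(fetch, cursor, max_retries=3, base_delay=1.0, sleep=time.sleep):
    # One initial attempt plus up to max_retries retries, waiting 1s, 2s, 4s.
    for attempt in range(max_retries + 1):
        try:
            return fetch(cursor)
        except RateLimited:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


def sync_entity(fetch, state, seen_hashes, dead_letters, apply, sleep=time.sleep):
    """Run one polling pass for one entity; return how many records were applied."""
    try:
        records = fetch_with_backoff(fetch, state["cursor"], sleep=sleep)
    except RateLimited:
        # Retries exhausted: park the failed pull instead of crashing the run.
        dead_letters.append({"cursor": state["cursor"], "reason": "429 retries exhausted"})
        return 0
    applied = 0
    for record in sorted(records, key=lambda r: r["updated_at"]):
        digest = record_hash(record)
        if seen_hashes.get(record["id"]) != digest:
            apply(record)
            seen_hashes[record["id"]] = digest
            applied += 1
        # Advance the cursor so the next run starts from the newest record seen.
        if record["updated_at"] > state["cursor"]:
            state["cursor"] = record["updated_at"]
    return applied
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; in a real edge function the dead-letter list would be a table that somebody actually looks at.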

Webhook + polling sounds redundant, but it’s the pattern that delivered the most reliability. Webhooks sometimes drop (Shopify maintenance, network blip, retries exhausted); polling fills the gaps without intervention.

Webhook handler with dynamic register

A technical note that makes the difference over time: webhook registration is handled by a dedicated edge function, not the admin UI. Every time we change domain (test → staging → prod) or add a topic, the register-webhooks script realigns the Shopify configuration with the list that lives in code. No more “the webhook in prod was still pointing to the old domain”.
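
The core of such a register script is a diff between the topic list in code and what the store currently has registered. The sketch below shows only that planning step under assumptions: `Subscription` and `plan_webhook_changes` are hypothetical names, and the actual calls that list, create and delete subscriptions through the Shopify Admin API are left out.

```python
from typing import NamedTuple


class Subscription(NamedTuple):
    """A webhook subscription as currently registered on the store."""
    id: str
    topic: str
    callback_url: str


def plan_webhook_changes(desired_topics, callback_url, existing):
    """Compare the topic list in code with the store's subscriptions."""
    keep, delete, covered = [], [], set()
    for sub in existing:
        wanted = sub.topic in desired_topics and sub.callback_url == callback_url
        if wanted and sub.topic not in covered:
            covered.add(sub.topic)
            keep.append(sub.id)
        else:
            # Wrong domain, retired topic, or duplicate of one we already keep.
            delete.append(sub.id)
    create = [topic for topic in desired_topics if topic not in covered]
    return {"create": create, "delete": delete, "keep": keep}
```

A subscription still pointing at the staging domain ends up in `delete`, and its topic ends up in `create` with the current callback URL, which is exactly the “old domain” case described above.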

On the handler side, every webhook gets verified (HMAC), persisted in a webhook_inbox table, and processed by a separate worker. This separates “received in time” (important for Shopify) from “processed correctly” (which can retry).
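
A minimal sketch of that split, using SQLite as a stand-in for the real database. The HMAC check follows Shopify's documented scheme (base64-encoded HMAC-SHA256 of the raw request body, sent in the X-Shopify-Hmac-Sha256 header). Everything else is an assumption for illustration: the column layout of `webhook_inbox`, the function names, and the use of a per-delivery `webhook_id` as the primary key for deduplication.

```python
import base64
import hashlib
import hmac
import sqlite3


def verify_shopify_hmac(raw_body, header_hmac, secret):
    # Compute base64(HMAC-SHA256(secret, raw body)) and compare in constant time.
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    return hmac.compare_digest(expected, header_hmac.encode("utf-8"))


def open_inbox(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS webhook_inbox (
               webhook_id TEXT PRIMARY KEY,
               topic      TEXT NOT NULL,
               body       BLOB NOT NULL,
               status     TEXT NOT NULL DEFAULT 'pending',
               attempts   INTEGER NOT NULL DEFAULT 0)"""
    )
    return db


def receive(db, secret, webhook_id, topic, raw_body, header_hmac):
    """Fast path: verify, persist, answer. No business logic runs here."""
    if not verify_shopify_hmac(raw_body, header_hmac, secret):
        return 401
    # A redelivery of the same webhook_id is ignored instead of duplicated.
    db.execute(
        "INSERT OR IGNORE INTO webhook_inbox (webhook_id, topic, body) VALUES (?, ?, ?)",
        (webhook_id, topic, raw_body),
    )
    db.commit()
    return 200


def process_pending(db, handler):
    """Slow path: a separate worker drains the inbox and can retry failures."""
    rows = db.execute(
        "SELECT webhook_id, topic, body FROM webhook_inbox WHERE status = 'pending'"
    ).fetchall()
    done = 0
    for webhook_id, topic, body in rows:
        try:
            handler(topic, body)
            db.execute("UPDATE webhook_inbox SET status = 'done' WHERE webhook_id = ?", (webhook_id,))
            done += 1
        except Exception:
            # Leave the row pending and count the attempt; the next run retries it.
            db.execute("UPDATE webhook_inbox SET attempts = attempts + 1 WHERE webhook_id = ?", (webhook_id,))
    db.commit()
    return done
```

The HMAC has to be computed over the raw bytes of the request, before any JSON parsing, otherwise the signature will not match.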

Monthly reconciliation: the silent netting

Even with 15-minute sync, after 30 days there’s drift. Reasons vary: orders cancelled after sync, manual inventory adjustments done in Shopify, products unpublished and re-published with different SKUs. The monthly reconciliation is when we settle the books.

The script runs on the first night of the month and does three things:

Inventory reconciliation: for each SKU, compares the Shopify quantity with the ERP quantity. If the difference exceeds the threshold (5 units or 10%), it writes an entry to the review log. If the difference can be reconstructed from known orders, it auto-fixes.
Status reconciliation: for each listing, it checks published_at, status, publication scope. Common drifts (product active in ERP but archived in Shopify) land in a review queue.
Refund/payout backfill: scans the last month’s orders and verifies that payouts and refunds are correctly matched. 100%-discount orders (gift card, comp) have a dedicated handler for matching with Shopify payouts.
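
The inventory step boils down to a small classification function. The sketch below is one possible reading of the rules, not the script itself: it assumes a difference is over threshold when either limit (5 units or 10% of the ERP quantity) is exceeded, that a difference fully explained by known orders is always auto-fixed, and that small unexplained differences are auto-fixed too. `classify_drift` and `explained_delta` are hypothetical names.

```python
def classify_drift(shopify_qty, erp_qty, explained_delta=0, abs_threshold=5, rel_pct=10):
    """Return 'ok', 'auto_fix' or 'review' for one SKU.

    explained_delta is the difference we expect from orders that are known
    but not yet synced (negative when Shopify has sold units the ERP has
    not seen yet).
    """
    diff = shopify_qty - erp_qty
    if diff == 0:
        return "ok"
    if diff == explained_delta:
        # Fully reconstructible from known orders.
        return "auto_fix"
    over_abs = abs(diff) > abs_threshold
    # Integer arithmetic avoids float rounding at the 10% boundary.
    over_rel = abs(diff) * 100 > rel_pct * erp_qty
    if over_abs or over_rel:
        return "review"
    return "auto_fix"
```

For example, `classify_drift(8, 10, explained_delta=-2)` yields `auto_fix`, while `classify_drift(3, 5)` yields `review` because an unexplained 2-unit gap is 40% of the ERP quantity.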

Concrete example: in a typical month on 600 SKUs the job detects drift on 15-20 listings, of which 12-15 are auto-fixed and 3-5 go into manual review. The time to close the review is about 30 minutes, against the 4-6 hours the same check used to take manually.

Bulk operations: compare-at, product merges, catalog push

One-shot operations on the catalog are the highest-risk moment. Changing compareAtPrice on 600 SKUs with a manual CSV upload is the best way to end up with 30 wrong products and no log of what happened.

Pattern for bulk operations:

Dry-run by default: the script always runs in preview mode first. Output: list of rows that will change, with before/after values.
Explicit confirmation: only after human review of the CSV diff does the script execute.
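Batch + delay: max 50 updates per batch, 200ms delay between batches. On 600 SKUs that’s 12 batches and 4-5 minutes total, well within Shopify rate limits. The pauses account for only about 2 seconds of that; the rest is the update calls themselves.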
Backfill log: each modification is written to a bulk_ops_log table with entity_id, field, before, after, actor, timestamp. Full traceability.
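
The four steps can be sketched as two functions: one that only computes the diff, and one that applies a reviewed plan. This is an illustration under assumptions: `plan_changes`, `execute` and the `update` callback are hypothetical names, the log is a plain list standing in for the bulk_ops_log table, and the human confirmation between the two calls is left to the caller.

```python
import time
from datetime import datetime, timezone


def plan_changes(current, desired, field):
    """Dry run: list the rows that would change, with before/after values."""
    return [
        {"entity_id": entity_id, "field": field, "before": current.get(entity_id), "after": value}
        for entity_id, value in desired.items()
        if current.get(entity_id) != value
    ]


def execute(plan, update, log, actor, batch_size=50, delay_s=0.2, sleep=time.sleep):
    """Apply a reviewed plan in batches, logging every modification."""
    for start in range(0, len(plan), batch_size):
        if start > 0:
            sleep(delay_s)  # pause between batches, not after the last one
        for row in plan[start:start + batch_size]:
            update(row["entity_id"], row["field"], row["after"])
            log.append({**row, "actor": actor, "timestamp": datetime.now(timezone.utc).isoformat()})
    return len(plan)
```

Because `plan_changes` has no side effects, its output can be exported as the CSV diff for review, and `execute` then receives exactly the rows that were approved.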

Product merge is a special case. When a catalog grows through multiple imports (photos, listings, different suppliers), duplicates appear. The merge script identifies candidates (match on title + vendor + SKU pattern), queues them, and applies the merge after confirmation. The “winner” product inherits variants, images and metafields; the “loser” gets archived with a 301 redirect from the old URL handle. No broken links, no loss of SEO equity.
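
Candidate detection can be sketched as grouping products by a normalized key. The exact matching rules are not spelled out here, so the sketch makes assumptions that should be adapted: the “SKU pattern” is taken to be the part of the SKU before the first hyphen, the oldest product in each group is proposed as the winner, and `merge_key` and `find_merge_candidates` are hypothetical names. It only proposes candidates; nothing is merged without confirmation.

```python
import re
from collections import defaultdict


def merge_key(product):
    # Normalize whitespace and case so cosmetic differences do not hide duplicates.
    title = re.sub(r"\s+", " ", product["title"]).strip().lower()
    vendor = product["vendor"].strip().lower()
    # Assumption: the SKU stem is everything before the first hyphen.
    sku_stem = product["sku"].strip().upper().split("-")[0]
    return (title, vendor, sku_stem)


def find_merge_candidates(products):
    groups = defaultdict(list)
    for product in products:
        groups[merge_key(product)].append(product)
    candidates = []
    for members in groups.values():
        if len(members) > 1:
            # Assumption: the oldest listing wins and the others are archived.
            ordered = sorted(members, key=lambda p: p["created_at"])
            candidates.append({
                "winner": ordered[0]["id"],
                "losers": [p["id"] for p in ordered[1:]],
            })
    return candidates
```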

Markets and publications: the per-channel sync

For those selling across multiple markets (Shopify Markets) or channels (storefront + Shop App + B2B), the “where is this product published?” problem becomes serious. Our Markets/Publications pull is an edge function that, for each product, hits publication APIs and writes a product × market × status matrix to the database. From there the internal UI shows: this product is live on IT, draft on UK, hidden on B2B.
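
The storage side of that matrix is simple. Below is a minimal sketch using SQLite as a stand-in for the real database; the table name `product_market_status`, the function names and the status labels are hypothetical, and the pull from the publication APIs is represented only by the `rows` argument.

```python
import sqlite3


def open_matrix(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS product_market_status (
               product_id TEXT NOT NULL,
               market     TEXT NOT NULL,
               status     TEXT NOT NULL,
               PRIMARY KEY (product_id, market))"""
    )
    return db


def store_snapshot(db, rows):
    """rows: iterable of (product_id, market, status) from the latest pull."""
    # One row per product and market; a newer pull overwrites the old status.
    db.executemany(
        "INSERT OR REPLACE INTO product_market_status (product_id, market, status) VALUES (?, ?, ?)",
        rows,
    )
    db.commit()


def describe(db, product_id):
    """Build the one-line summary shown in the internal UI."""
    rows = db.execute(
        "SELECT market, status FROM product_market_status WHERE product_id = ? ORDER BY market",
        (product_id,),
    ).fetchall()
    return ", ".join(f"{status} on {market}" for market, status in rows)
```

Keeping the matrix in a table means the UI reads one indexed query per product instead of calling Shopify once per market on every page load.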

It’s the kind of view Shopify Admin doesn’t offer aggregated natively, but that planners and marketing need daily to decide where to push.

Replicable pattern

If you manage a Shopify catalog of more than 200 SKUs synced to an ERP or another system:

Map source of truth per entity before writing any code. Disagreement here = guaranteed drift.
Combine real-time webhooks and cursor-based polling. One alone never suffices.
HMAC + inbox table + separate worker for webhook reliability.
Monthly reconciliation with thresholds: auto-fix below, manual review above.
Bulk ops always with dry-run + log. Without it, sooner or later something breaks.
Markets/Publications in the database, not from live queries.

The initial investment to put this setup in place is 2-4 weeks on a medium catalog. The benefit isn’t “saving X hours per week”: it’s eliminating the class of errors where you notice the drift only after the customer complains. For a store with 600+ SKUs at real volume, it’s the difference between growing without fear and having to freeze the catalog every time you open a new channel.