API Ingestion at Scale: Production Reality
The difference between a prototype and production API ingestion is handling scale, rate limits, and downstream SLAs across dozens or hundreds of sources.
Rate Limits Are Your Hard Constraint:
Every API has limits. Salesforce might allow 200 requests per minute per tenant. Shopify enforces bucket-based rate limiting with burst allowances. If you scale horizontally by adding more workers without coordination, you just hit limits faster and increase error rates.
At 200 requests per minute with 1000 records per page, you need 1000 requests to fetch a million records. That is 5 minutes minimum, assuming perfect efficiency. In practice, add retry overhead and you are looking at hours for large backfills.
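As a rough illustration of how that budget gets consumed, here is a minimal sketch of a paginated backfill behind a token-bucket limiter. The endpoint, page size, cursor field, and 429 handling are assumptions chosen for illustration, not any particular vendor's API.

```python
import time
import requests

API_URL = "https://api.example.com/records"   # hypothetical paginated endpoint
PAGE_SIZE = 1000                              # records per page
REQUESTS_PER_MINUTE = 200                     # vendor rate limit


class TokenBucket:
    """Simple token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def acquire(self):
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # wait for the next token


def backfill(token):
    """Fetch all pages, respecting the rate limit and backing off on 429s."""
    bucket = TokenBucket(rate=REQUESTS_PER_MINUTE / 60, capacity=REQUESTS_PER_MINUTE)
    cursor, backoff = None, 1.0
    while True:
        bucket.acquire()
        resp = requests.get(
            API_URL,
            headers={"Authorization": f"Bearer {token}"},
            params={"limit": PAGE_SIZE, "cursor": cursor},
            timeout=30,
        )
        if resp.status_code == 429:          # rate limited: exponential backoff
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)
            continue
        resp.raise_for_status()
        backoff = 1.0
        body = resp.json()
        yield from body["records"]
        cursor = body.get("next_cursor")
        if cursor is None:                   # no more pages
            return
```

Every 429 response and backoff sleep adds time on top of the 5-minute theoretical floor, which is where the gap between theory and practice comes from.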
The Multi-Tenant Challenge:
Fivetran and Airbyte serve thousands of customers, each with their own API tokens and rate limits. A naive approach where each customer gets dedicated workers fails at scale. Instead, they use a shared scheduler that tracks per-connector rate-limit state globally. When customer A is nearing their Shopify limit, the scheduler throttles their syncs and allocates capacity to customer B.
This centralized coordination is why scaling to 10x more customers does not mean 10x more infrastructure. It means smarter scheduling and backoff algorithms.
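The sketch below shows the shape of that scheduling idea; it is not Fivetran's or Airbyte's actual implementation, and the tenant names, per-minute budgets, and 10 percent reserve are made-up values.

```python
import time


class ConnectorBudget:
    """Tracks one tenant's per-connector request budget over a rolling one-minute window."""

    def __init__(self, tenant, limit_per_minute):
        self.tenant = tenant
        self.limit = limit_per_minute
        self.used = 0
        self.window_start = time.monotonic()

    def headroom(self):
        # Reset the window once a minute has elapsed, then report unused fraction.
        if time.monotonic() - self.window_start >= 60:
            self.used, self.window_start = 0, time.monotonic()
        return 1.0 - self.used / self.limit


class SharedScheduler:
    """Central scheduler: hands the next sync slot to the connector with the most headroom."""

    def __init__(self, budgets, reserve=0.1):
        self.budgets = budgets
        self.reserve = reserve  # keep 10% in reserve so we never slam the hard limit

    def next_connector(self):
        eligible = [b for b in self.budgets if b.headroom() > self.reserve]
        if not eligible:
            return None  # everyone is throttled; back off and try again later
        best = max(eligible, key=lambda b: b.headroom())
        best.used += 1
        return best


# Example: as customer A nears its Shopify limit, capacity flows to customer B.
scheduler = SharedScheduler([
    ConnectorBudget("customer_a_shopify", limit_per_minute=40),
    ConnectorBudget("customer_b_shopify", limit_per_minute=40),
])
```

Because the budget state is shared, adding customers mostly adds bookkeeping entries, not worker fleets.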
Two-Phase Ingestion Pattern:
Consider a real flow from a commerce platform. A product catalog lives in a headless CMS with an API similar to Bloomreach. The data team needs this in both a search index (for customer-facing search) and a warehouse (for analytics).
Phase one: An Airflow job submits product updates through the ingestion API, which returns a job identifier. Airflow polls the status endpoint every 10 seconds; jobs complete in 30 to 300 seconds for tens of thousands of products.
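A sketch of phase one, assuming a hypothetical submit endpoint that returns a job_id and a status endpoint that reports succeeded or failed; in Airflow this would typically live inside a task or a deferrable sensor.

```python
import time
import requests

INGEST_URL = "https://cms.example.com/api/ingest"     # hypothetical ingestion API
STATUS_URL = "https://cms.example.com/api/jobs/{id}"  # hypothetical status endpoint


def submit_and_wait(products, token, poll_seconds=10, timeout_seconds=600):
    """Phase one: submit a batch, then poll the job status every 10 seconds."""
    headers = {"Authorization": f"Bearer {token}"}
    resp = requests.post(INGEST_URL, json={"products": products}, headers=headers, timeout=30)
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = requests.get(STATUS_URL.format(id=job_id), headers=headers, timeout=30).json()
        if status["state"] == "succeeded":
            return status
        if status["state"] == "failed":
            raise RuntimeError(f"ingestion job {job_id} failed: {status.get('error')}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"ingestion job {job_id} did not finish within {timeout_seconds}s")
```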
Phase two: Once ingestion succeeds, a separate API call triggers index rebuilding. But indexing can only run once per hour to protect search cluster SLAs and avoid overwhelming background workers. This decoupling means multiple teams can ingest concurrently without impacting customer-facing search performance.
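Phase two might look something like the sketch below, with a client-side once-per-hour guard. The reindex endpoint and the file-based state are placeholders; a real deployment would more likely rely on the platform's own throttling or a shared metadata store.

```python
import json
import time
from pathlib import Path

import requests

REINDEX_URL = "https://cms.example.com/api/reindex"  # hypothetical phase-two endpoint
STATE_FILE = Path("/tmp/last_reindex.json")          # placeholder state; use a DB in production
MIN_INTERVAL = 3600                                  # index rebuild at most once per hour


def trigger_reindex(token):
    """Phase two: request an index rebuild, but never more than once per hour."""
    last = 0.0
    if STATE_FILE.exists():
        last = json.loads(STATE_FILE.read_text()).get("last_trigger", 0.0)

    if time.time() - last < MIN_INTERVAL:
        return False  # within the hourly window; skip and let the next run pick it up

    resp = requests.post(REINDEX_URL, headers={"Authorization": f"Bearer {token}"}, timeout=30)
    resp.raise_for_status()
    STATE_FILE.write_text(json.dumps({"last_trigger": time.time()}))
    return True
```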
[Stat callout: Rate Limit Impact on Backfill — API limit: 200 requests/min; records to backfill: 1M; realistic time with retry overhead and backoff: ~3.5 hours]
⚠️ Common Pitfall: Companies often underestimate API latency variance. Your p50 might be 2 minutes but your p95 could be 15 minutes due to rate-limit backoff. Always design for p95 or p99, not the median.
Observability Essentials:
Track the lag between source updates and destination visibility; for near-real-time pipelines, a common target is 95 percent of updates applied within 5 minutes. Log HTTP status code distributions per endpoint. Monitor payload size distributions to catch schema inflation. Structured logging with request identifiers and job identifiers is critical for debugging partial failures in production.
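A minimal sketch of what that structured logging could look like; the field names and the 5-minute freshness check are illustrative, mirroring the targets above rather than any specific tool.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("ingestion")
logging.basicConfig(level=logging.INFO, format="%(message)s")


def log_event(**fields):
    """Emit one JSON log line so request_id / job_id can be joined across systems."""
    logger.info(json.dumps({"ts": time.time(), **fields}))


def record_sync(job_id, source_updated_at, status_code, payload_bytes):
    request_id = str(uuid.uuid4())
    lag_seconds = time.time() - source_updated_at  # source update -> destination visibility
    log_event(
        request_id=request_id,
        job_id=job_id,
        http_status=status_code,       # feeds per-endpoint status code distributions
        payload_bytes=payload_bytes,   # watch for schema inflation over time
        lag_seconds=round(lag_seconds, 1),
        sla_breach=lag_seconds > 300,  # flag updates beyond the 5-minute freshness target
    )
```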
💡 Key Takeaways
✓ Rate limits are hard constraints: at 200 requests per minute with 1000 records per page, backfilling 1 million records takes about 5 minutes in theory, and closer to 3.5 hours once retry overhead and backoff are included
✓ Multi-tenant systems use centralized schedulers that track per-connector rate-limit state globally, allowing 10x customer growth without 10x infrastructure
✓ Two-phase patterns decouple ingestion from downstream processing: product catalog ingestion completes in 30 to 300 seconds, but index updates are limited to once per hour
✓ Design for p95 or p99 latency, not the median: p50 might be 2 minutes but p95 can be 15 minutes due to rate-limit backoff and retries
📌 Examples
1. Fivetran maintains per-connector rate-limit backoff state and uses a shared scheduler to stay within global QPS budgets across thousands of customers
2. Bloomreach-style catalog ingestion: submit a batch, get a job ID, poll every 10 seconds; jobs complete in 30 to 300 seconds for tens of thousands of products
3. Production observability: track whether 95 percent of updates are applied within 5 minutes, log HTTP status distributions, monitor payload size growth