Data Integration Patterns • API-based Data Ingestion Patterns
Medium · ⏱️ ~3 min
Four Core API Ingestion Patterns
API ingestion is not one-size-fits-all. The pattern you choose depends on who initiates the data transfer, how much control you have over timing, and what latency you need. Four patterns dominate production systems.
Pull Based (Polling)
Your system polls API on schedule
↕
Push Based (Webhooks)
Source calls your endpoint
↕
Async Job Based
Submit batch, poll for status
↕
Streaming Event
Each event sent immediately

Pull Based Polling:
Your pipeline wakes up every 15 minutes or hourly and fetches data. You use updated_at timestamps or cursors to get only changed records since the last sync. Fivetran uses this pattern for most SaaS connectors. Typical incremental syncs achieve p95 latency under 5 to 10 minutes, while full backfills might take hours due to pagination and rate limits.
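A minimal sketch of the cursor logic: an in-memory list stands in for the SaaS API (in production, `fetch_since` would make paginated HTTP calls to the real endpoint, and the cursor would be persisted between runs). All names here are illustrative, not any vendor's actual API.

```python
# In-memory stand-in for a SaaS API: records with updated_at timestamps.
RECORDS = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00Z"},
    {"id": 3, "updated_at": "2024-01-03T00:00:00Z"},
]

def fetch_since(cursor: str, page_size: int = 2):
    """Simulate the API's incremental endpoint: pages of records changed after cursor."""
    changed = sorted((r for r in RECORDS if r["updated_at"] > cursor),
                     key=lambda r: r["updated_at"])
    for i in range(0, len(changed), page_size):  # mimic API pagination
        yield changed[i:i + page_size]

def incremental_sync(cursor: str):
    """Pull all pages since cursor; return synced records and the advanced cursor."""
    synced = []
    for page in fetch_since(cursor):
        for record in page:
            synced.append(record)                       # load into staging
            cursor = max(cursor, record["updated_at"])  # advance high-water mark
    return synced, cursor  # persist the cursor for the next scheduled run
```

Because ISO 8601 timestamps sort lexicographically, `max` on the strings is enough to track the high-water mark; a second run with the saved cursor fetches nothing new.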
Push Based Webhooks:
The source system calls your HTTP endpoint when data changes. You validate the signature, enqueue the payload into a message queue or log, and return 200 OK immediately. This can deliver sub-second freshness, but it requires you to maintain a highly available endpoint with proper authentication. You still need occasional full syncs to catch missed events.
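The validate-enqueue-ack flow can be sketched with a plain handler function; the HMAC-SHA256 signature scheme and the in-memory queue are illustrative assumptions (real sources each define their own signing scheme, and the queue would be Kafka, SQS, or similar).

```python
import hmac
import hashlib
import json
from collections import deque

SHARED_SECRET = b"webhook-signing-secret"  # agreed with the source system
EVENT_QUEUE = deque()  # stand-in for a durable queue; processing happens elsewhere

def handle_webhook(body: bytes, signature_header: str) -> int:
    """Validate the HMAC signature, enqueue the payload, return an HTTP status fast."""
    expected = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the signature through timing differences
    if not hmac.compare_digest(expected, signature_header):
        return 401  # reject forged or corrupted payloads
    EVENT_QUEUE.append(json.loads(body))  # enqueue only; no heavy work inline
    return 200  # respond immediately so the source does not retry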
Async Job Based:
You submit a batch of updates through one API endpoint and receive a job identifier. You then poll a separate status endpoint every 10 seconds until the job completes. Bloomreach uses this pattern for product catalog ingestion, with typical job latencies of 30 to 300 seconds for tens of thousands of products. This decouples submission from processing, allowing the backend to throttle work.
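The submit-then-poll loop looks like this in outline; the backend here is an in-memory stub that "completes" after a few status checks, and none of the endpoint or function names come from Bloomreach's actual API.

```python
import itertools

# In-memory stand-in for the ingestion backend: job id -> remaining work ticks.
_job_ids = itertools.count(1)
_jobs = {}

def submit_batch(records: list) -> int:
    """POST /ingest equivalent: accept the batch, return a job id immediately."""
    job_id = next(_job_ids)
    _jobs[job_id] = 3  # pretend processing finishes after three status checks
    return job_id

def get_status(job_id: int) -> str:
    """GET /jobs/{id} equivalent: report progress until the job completes."""
    if _jobs[job_id] > 0:
        _jobs[job_id] -= 1
        return "processing"
    return "done"

def ingest(records: list) -> str:
    """Client side: submit once, then poll the status endpoint until done."""
    job_id = submit_batch(records)
    while (status := get_status(job_id)) != "done":
        pass  # in production: time.sleep(10) between polls
    return status
```

The decoupling is visible in the two endpoints: submission returns instantly with an identifier, and the backend is free to schedule or throttle the actual processing behind the status endpoint.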
Streaming Event Collection:
Each user action or system event is sent individually through an API to a collector service. Segment uses this for behavioral tracking, accepting tens of thousands of events per second. The collector writes to a durable log immediately, then fans out to warehouses and destinations asynchronously.
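The collector's write-to-log-first discipline can be sketched as two functions, with a plain list standing in for the durable log (Kafka, in Segment's case) and dictionaries for downstream destinations; all names are illustrative.

```python
DURABLE_LOG = []  # stand-in for Kafka: append-only, acked before any fan-out
DESTINATIONS = {"warehouse": [], "analytics": []}

def collect(event: dict) -> bool:
    """Collector hot path: append to the durable log and ack; nothing else inline."""
    DURABLE_LOG.append(event)
    return True  # client gets an ack as soon as the event is durable

def fan_out(offset: int) -> int:
    """Async consumer: replay the log from a checkpoint into each destination."""
    for event in DURABLE_LOG[offset:]:
        for sink in DESTINATIONS.values():
            sink.append(event)
    return len(DURABLE_LOG)  # new offset to checkpoint
```

Separating the hot path from fan-out is what lets the collector absorb tens of thousands of events per second: a slow warehouse load never blocks an incoming event, it just grows the consumer's lag.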
✓ In Practice: Most companies use multiple patterns. Operational data from internal services might use streaming events. SaaS integrations use pull based polling. Critical low latency updates use webhooks.
💡 Key Takeaways
✓ Pull based polling is simplest and works when you control the schedule, achieving p95 latency of 5 to 10 minutes for incremental syncs
✓ Push based webhooks deliver sub-second freshness but require maintaining a highly available endpoint and still need periodic reconciliation syncs
✓ Async job based ingestion decouples submission from processing, with job latencies of 30 to 300 seconds, useful when the backend needs to throttle heavy work
✓ Streaming event collection sends individual events immediately, accepting tens of thousands per second by writing to durable logs before downstream processing
📌 Examples
1. Fivetran polls Salesforce API every 15 minutes using <code>updated_at</code> cursors, respecting 200 requests per minute rate limits
2. Bloomreach product catalog ingestion: submit batch via API, receive job ID, poll status every 10 seconds, typical completion in 30 to 300 seconds
3. Segment HTTP tracking API accepts behavioral events at tens of thousands per second, immediately writing to Kafka before fanning out to destinations