Chunk and Part Sizing Trade-offs
Part Size Trade-offs
Smaller parts waste less work on a retry but incur more HTTP overhead. Each part is a separate request with its own headers, authentication, and connection setup. With 1MB parts, a 10GB file needs 10,000 requests. At 10-50ms of overhead per request, that adds up to 100-500 seconds if the requests run one after another. With 100MB parts, the same file needs only 100 requests and 1-5 seconds of overhead.
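The request-count arithmetic above can be sketched as follows. The function name and the 30ms default (a midpoint of the 10-50ms range) are illustrative assumptions. Sizes use decimal units (1MB = 10^6 bytes), and the overhead figure assumes requests are issued sequentially.

```python
MB = 10**6
GB = 10**9

def request_overhead(file_size: int, part_size: int,
                     per_request_overhead_s: float = 0.030):
    """Return (number of parts, total per-request overhead in seconds).

    per_request_overhead_s is an assumed average; measure it for a real client.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = -(-file_size // part_size)  # integer ceiling division
    return parts, parts * per_request_overhead_s

for size in (1 * MB, 100 * MB):
    parts, overhead = request_overhead(10 * GB, size)
    print(f"{size // MB} MB parts: {parts} requests, ~{overhead:.0f} s overhead")
```

Parallel uploads hide part of this overhead, so treat the total as an upper bound rather than a prediction.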
Larger parts risk more wasted work on failure. A 500MB part at 100 Mbps takes 40 seconds to upload. A network failure at 39 seconds throws away all 39 seconds of transfer, because the whole part must be resent. The optimal part size depends on network reliability: stable datacenter links tolerate larger parts, while flaky mobile connections need smaller ones.
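The 40-second figure follows from converting bytes to bits and dividing by the link rate. A minimal sketch, assuming decimal units and a link that sustains its nominal rate (the function name is hypothetical):

```python
MB = 10**6

def part_transfer_seconds(part_size_bytes: int, link_mbps: float) -> float:
    """Time to send one part, which is also the worst-case loss if it fails late."""
    bits = part_size_bytes * 8
    return bits / (link_mbps * 1_000_000)

print(part_transfer_seconds(500 * MB, 100))  # 500 MB over a 100 Mbps link
print(part_transfer_seconds(25 * MB, 100))   # a smaller part bounds the loss
```

The return value doubles as the maximum time a single failure can cost, which is why it is a useful input when choosing a part size for an unreliable link.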
Platform Limits Shape Design
Cloud storage services impose limits. Common constraints: minimum part size 5MB (except last part), maximum part size 5GB, maximum 10,000 parts per upload. These limits constrain file sizes: 5GB * 10,000 = 50TB maximum with largest parts, 5MB * 10,000 = 50GB maximum with smallest parts.
For files approaching limits, compute part size dynamically. A 100GB file needs at least 100GB / 10,000 = 10MB parts. A 1TB file needs at least 100MB parts. Clients should calculate minimum part size from file size and part limit.
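One way to compute the minimum part size from the file size and the part limit is sketched below. The constants mirror the limits quoted above in decimal units. Real services may define them in binary units or use different values, so treat the constants and the function name as assumptions to check against the target platform.

```python
MB = 10**6
GB = 10**9
TB = 10**12

MIN_PART = 5 * MB      # minimum part size (last part may be smaller)
MAX_PART = 5 * GB      # maximum part size
MAX_PARTS = 10_000     # maximum parts per upload

def min_part_size(file_size: int) -> int:
    """Smallest part size that fits file_size within MAX_PARTS parts."""
    needed = -(-file_size // MAX_PARTS)  # ceiling of file_size / MAX_PARTS
    if needed > MAX_PART:
        raise ValueError("file is too large for a single multipart upload")
    return max(MIN_PART, needed)

print(min_part_size(100 * GB) // MB)  # 100 GB file
print(min_part_size(1 * TB) // MB)    # 1 TB file
```

A client would typically take the larger of this minimum and its preferred part size, so small files keep the preferred size and only very large files are pushed upward.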
Adaptive Part Sizing
Smart clients adjust part size based on observed conditions. Start with moderate parts (50-100MB). If failures are frequent, shrink part size to reduce retry cost. If uploads succeed reliably, grow part size to reduce overhead. This adapts to network conditions without manual configuration.
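A minimal sketch of such an adaptive policy follows. The class name, the window length, the thresholds, and the halve/double steps are all illustrative assumptions, not values prescribed by any particular client or service.

```python
from collections import deque

MB = 10**6

class AdaptivePartSizer:
    """Halve the part size when failures are frequent, double it when rare."""

    def __init__(self, initial=64 * MB, minimum=5 * MB, maximum=512 * MB,
                 window=10, shrink_above=0.10, grow_below=0.02):
        self.size = initial
        self.minimum = minimum
        self.maximum = maximum
        self.shrink_above = shrink_above   # failure rate that triggers shrinking
        self.grow_below = grow_below       # failure rate that allows growing
        self.outcomes = deque(maxlen=window)

    def record(self, success: bool) -> None:
        """Record one part upload outcome and adjust the size if warranted."""
        self.outcomes.append(success)
        if len(self.outcomes) < self.outcomes.maxlen:
            return  # not enough observations at the current size yet
        failure_rate = self.outcomes.count(False) / len(self.outcomes)
        if failure_rate > self.shrink_above:
            self.size = max(self.minimum, self.size // 2)
            self.outcomes.clear()  # start a fresh window for the new size
        elif failure_rate < self.grow_below:
            self.size = min(self.maximum, self.size * 2)
            self.outcomes.clear()

sizer = AdaptivePartSizer(window=4)
for ok in (True, True, True, True):
    sizer.record(ok)
print(sizer.size // MB)  # size after a clean window
```

Clearing the window after each change keeps observations from the old size out of the decision about the new one. Requiring a full window before adjusting avoids reacting to a single unlucky failure.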
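Track the success rate per part size. If 100MB parts fail 20% of the time but 25MB parts fail only 2%, the smaller parts waste less total bandwidth despite their higher overhead. The math, assuming a failed attempt loses the whole part: 20% * 100MB = 20MB wasted per attempt versus 2% * 25MB = 0.5MB. Even normalized to the same 100MB of data (four 25MB attempts), the smaller parts waste about 2MB against 20MB.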
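The comparison can be extended to include retries. The sketch below assumes each attempt fails independently with the observed rate, that a failed attempt loses the whole part, and that a part is retried until it succeeds, so the expected number of failed attempts per part is p / (1 - p). Under that model the expected waste depends only on the payload size and the failure rate observed at the chosen part size. The function name is hypothetical.

```python
MB = 10**6

def expected_waste(payload_bytes: int, failure_rate: float) -> float:
    """Expected bytes resent while delivering payload_bytes, retries included.

    Assumes independent failures and that a failure loses the entire part
    (an upper bound, since failures can also happen early in a part).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return payload_bytes * failure_rate / (1 - failure_rate)

# Failure rates per part size, using the figures from the text.
observed = {100 * MB: 0.20, 25 * MB: 0.02}
for part_size, rate in observed.items():
    wasted = expected_waste(100 * MB, rate)
    print(f"{part_size // MB} MB parts: ~{wasted / MB:.1f} MB resent per 100 MB")
```

With retries counted, the 100MB parts resend roughly 25MB per 100MB delivered and the 25MB parts roughly 2MB, so the ranking from the single-attempt estimate is unchanged.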
Memory and Buffering Constraints
The client must buffer at least one part in memory (or on disk) before uploading. A 500MB part buffered in memory on a device with 256MB available will fail. Mobile apps typically use 5-20MB parts. Server-side batch processing can use parts of 100MB or more. Match part size to client capabilities, not just network conditions.
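To keep memory use bounded by the part size, a client can read and hand off one part at a time instead of loading the whole file. A minimal sketch, with a hypothetical helper name, assuming a buffered binary stream such as one returned by open(path, "rb"):

```python
import io
from typing import BinaryIO, Iterator, Tuple

def iter_parts(stream: BinaryIO, part_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, data) pairs, holding at most one part in memory.

    Part numbers start at 1. Assumes a buffered stream, where read(n)
    returns n bytes unless the end of the stream is reached.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_number = 1
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break
        yield part_number, chunk
        part_number += 1

# Usage with an in-memory stream standing in for a file.
for number, data in iter_parts(io.BytesIO(b"x" * 25), part_size=10):
    print(number, len(data))
```

Uploading several parts concurrently multiplies the buffer requirement, so the memory budget should cover part size times the number of in-flight parts.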