Chunked File Upload with Resume: Browser to Server
When you need to upload a 50 GB VM disk image through a web browser, a single HTTP POST is not going to work. Network interruptions, browser tab crashes, and corporate proxy timeouts all conspire to make large uploads fail. HyperSDK solves this with a chunked upload protocol that splits files into 10 MB pieces and supports resume from the last successful chunk.
The Problem with Large Uploads
Standard HTML file uploads use a single multipart/form-data request. This works for megabyte-sized files but breaks down at scale. Most reverse proxies have request body limits (often 10-100 MB). Browser memory usage spikes when loading a multi-gigabyte file into an ArrayBuffer. Network drops at 80% completion mean starting over from zero. And there is no way to show granular progress -- the browser either knows the request is in-flight or it does not.
We needed an approach that works reliably for files up to 50 GB, provides real-time progress feedback, and recovers gracefully from network failures.
Protocol Design
Our chunked upload protocol uses four API endpoints:
- POST /upload/init -- Initialize a session. The client sends the filename, total size, and preferred chunk size. The server allocates an upload ID and returns the total chunk count.
- POST /upload/{id}/chunk/{n} -- Upload a single chunk. The body is raw bytes (application/octet-stream). The server writes the chunk to a temporary file and records it as received.
- GET /upload/{id}/status -- Query progress. Returns the count of received chunks, bytes received, and overall percentage. This is the key endpoint for resume -- the client checks which chunks are missing and picks up where it left off.
- POST /upload/{id}/complete -- Finalize the upload. The server reassembles all chunks into the final file, computes a SHA-256 checksum, and returns the file path.
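The init step reduces to a little arithmetic: given the total size and chunk size, the server derives the chunk count with ceiling division. A minimal Go sketch (the uploadSession type and newSession function are illustrative, not HyperSDK's actual API):

```go
package main

import "fmt"

// uploadSession is an illustrative stand-in for the server's session metadata.
type uploadSession struct {
	ID          string
	Filename    string
	TotalSize   int64
	ChunkSize   int64
	TotalChunks int64
}

// newSession computes the chunk count that the init endpoint returns.
func newSession(id, filename string, totalSize, chunkSize int64) uploadSession {
	// Ceiling division: the last chunk may be smaller than chunkSize.
	totalChunks := (totalSize + chunkSize - 1) / chunkSize
	return uploadSession{
		ID: id, Filename: filename,
		TotalSize: totalSize, ChunkSize: chunkSize,
		TotalChunks: totalChunks,
	}
}

func main() {
	// 25 MB + 1 byte with 10 MB chunks: two full chunks plus a remainder.
	s := newSession("u1", "disk.img", 25*1024*1024+1, 10*1024*1024)
	fmt.Println(s.TotalChunks) // 3
}
```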
Each chunk upload is idempotent. If the same chunk is uploaded twice (e.g., after a retry where the server received it but the client did not get the acknowledgment), the server simply overwrites the existing chunk data. This eliminates an entire class of duplication bugs.
Client-Side Implementation
On the browser side, we use the File API's Blob.slice() method to split the file into chunks without loading the entire file into memory. For a 50 GB file with 10 MB chunks, we create roughly 5,000 slice references without allocating buffers for the data itself -- the browser reads each slice from disk on demand.
const CHUNK_SIZE = 10 * 1024 * 1024; // 10 MB
const totalChunks = Math.ceil(file.size / CHUNK_SIZE);

for (let i = 0; i < totalChunks; i++) {
  const start = i * CHUNK_SIZE;
  const end = Math.min(start + CHUNK_SIZE, file.size);
  const chunk = file.slice(start, end); // a Blob reference; no bytes read yet
  await uploadChunk(uploadId, i, chunk);
  onProgress(((i + 1) / totalChunks) * 100);
}
Each chunk is uploaded using XMLHttpRequest rather than fetch(). The reason is XHR's upload.onprogress event, which fires during the request body transmission and provides bytes-sent granularity. With fetch(), you only know when the response arrives -- for a 10 MB chunk on a slow connection, that could be 30 seconds of silence. XHR gives us a smooth progress bar even within a single chunk.
Resume on Network Failure
When the network drops mid-upload, the React component catches the XHR error and enters a retry state. Before retrying the failed chunk, it queries the status endpoint to confirm the server's view of progress. This handles the case where the chunk was actually received but the response was lost.
The status response includes the count of received chunks. The client computes the set of missing chunks (which may not be contiguous if earlier retries partially succeeded) and uploads only those. In practice, since we upload sequentially, the missing chunks are always a contiguous range from the last received chunk to the end.
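The set difference itself is trivial in any language; here is the same logic sketched in Go for concreteness (the function name is ours, and in the sequential case the result is always a contiguous suffix):

```go
package main

import "fmt"

// missingChunks returns the chunk indices the server has not yet received,
// in ascending order. received maps chunk index -> true for every chunk the
// status endpoint reports as present.
func missingChunks(totalChunks int, received map[int]bool) []int {
	var missing []int
	for i := 0; i < totalChunks; i++ {
		if !received[i] {
			missing = append(missing, i)
		}
	}
	return missing
}

func main() {
	// Server reports chunks 0-2 received out of 5; resume uploads 3 and 4.
	got := map[int]bool{0: true, 1: true, 2: true}
	fmt.Println(missingChunks(5, got)) // [3 4]
}
```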
The dashboard UI shows the retry state clearly: a yellow progress bar with a "Resuming..." label and the count of remaining chunks. The user can also manually trigger a resume if they closed and reopened the tab -- the upload ID is persisted in localStorage, and the component checks for incomplete uploads on mount.
Server-Side Handling
On the server side, each upload session creates a temporary directory with one file per chunk, named by index (e.g., chunk_0000, chunk_0001). The session metadata (filename, total size, expected chunk count, received chunks) is stored in memory and periodically flushed to a JSON file in the temporary directory.
When the client calls the complete endpoint, the server opens the final output file and copies each chunk file in order using io.Copy. After reassembly, it computes the SHA-256 checksum of the final file and verifies that the reassembled size matches the expected total. If everything checks out, the temporary directory is removed and the upload is marked as ready.
We chose sequential chunk files over a single sparse file because it is simpler to reason about, works on any filesystem, and makes the status endpoint trivial to implement -- just count the files in the directory.
Progress Tracking with CallbackProgressReader
HyperSDK's pkg/ioutil package provides a CallbackProgressReader that wraps an io.Reader and invokes a callback function on every read. We use this throughout the codebase for tracking progress on both uploads and exports. On the upload path, it feeds the dashboard's progress bar. On the export path, it drives the job progress percentage that appears in the Jobs Table view.
The callback receives the number of bytes read so far and the total expected bytes. The dashboard component uses this to compute transfer rate (bytes per second), estimated time remaining, and a percentage for the progress bar. All of this updates in real time as chunks flow through the reader.
What We Would Do Differently
If we were starting over, we would add parallel chunk uploads. Currently, chunks are uploaded sequentially because it simplifies the server-side reassembly and avoids complications with upload ordering. But for high-bandwidth connections, uploading 3-4 chunks in parallel would significantly reduce total upload time for large files. The protocol already supports it -- chunks can arrive in any order since they are written to separate files -- but the client currently does not take advantage of this.
We would also add server-side resumability across daemon restarts. Currently, the in-memory session state is lost when the daemon restarts. The chunk files survive on disk, but the metadata needs to be reconstructed. Adding a simple JSON metadata file (which we partially do for crash recovery) and a scan-on-startup routine would make the upload truly durable.
Even without these improvements, the current implementation handles the common case well: upload large files through the browser with progress feedback and automatic recovery from transient network failures.