56/60 Days System Design Questions

DEV Community

Top comments (4)

thejoud1997 profile image

Principal Solution Architect and Backend Engineer with 11+ years of experience designing, building, and scaling distributed systems serving millions of daily users.

Work

Principal Solution Architecture
Joined

Dec 21, 2023

• Jul 1

A) Polling — Correct for most cases
Client-controlled, stateless, scales independently. The server doesn't care if the client disconnects, retries, or crashes: the job runs and the status endpoint just answers queries. Polling every 5s for a job that takes 2–8 minutes is trivially cheap. The key rules: keep job IDs stable and make job submission idempotent, set a TTL on job records so you don't accumulate state forever, and use exponential backoff rather than fixed intervals. LinkedIn, YouTube, and S3 multipart uploads all use polling for async job status.
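A minimal sketch of the client side, assuming a hypothetical `get_status` callable that wraps whatever status endpoint you expose and returns a status string. The status names, the 5s starting delay, the 60s cap, and the jitter policy are all illustrative assumptions, not anything prescribed above.

```python
import random
import time
from typing import Callable

# Assumed status names; substitute whatever your status endpoint returns.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_job(
    get_status: Callable[[str], str],
    job_id: str,
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    factor: float = 2.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float], float] = lambda d: random.uniform(0.5 * d, d),
) -> str:
    """Poll until the job reaches a terminal state or the time budget runs out."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        # Randomise each pause so many clients do not poll in lockstep.
        pause = jitter(delay)
        sleep(pause)
        waited += pause
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(delay * factor, max_delay)
```

`sleep` and `jitter` are injectable only so the loop can be exercised without real waiting; in normal use you would call `poll_job(get_status, job_id)` and leave the defaults alone.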

Joud Awad

• Jul 1

B) Webhook — Right idea, wrong default
Webhooks are great for server-to-server flows where the receiver has a stable HTTPS endpoint. They fall apart for mobile clients (no public URL), for receivers behind NAT or a firewall, and when the receiver is down at delivery time. You'd need a retry queue, delivery guarantees, and signature verification just to make them reliable. Webhooks work well as a supplementary delivery mechanism for platform integrations, not as the primary client notification path for end users.
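For the signature verification piece, one common approach is an HMAC over the raw request body with a secret shared between sender and receiver. The sketch below assumes HMAC-SHA256 with a hex-encoded signature; the function names are hypothetical, and how the signature travels (header name, encoding) is up to your own contract.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Sender side: hex HMAC-SHA256 of the exact bytes being delivered."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The receiver should verify against the raw bytes it received, before any JSON parsing or re-serialisation, since re-encoding can change the byte sequence and break the comparison.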

Joud Awad

• Jul 1

C) SSE / WebSocket — Expensive for this use case
Real-time push is fantastic for chat, live dashboards, and collaborative editing. For a job that takes 2–8 minutes and is triggered once? You're holding an open connection, burning a file descriptor, and adding connection-management complexity for maybe 3 meaningful state changes. SSE makes sense when you genuinely need sub-second updates or continuous streaming. For async job progress, polling is cheaper and simpler.
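To put rough numbers on the polling side of that comparison, here is a back-of-the-envelope sketch. It assumes a simplified model where the client waits, then polls, and stops at the first poll made at or after the job's finish time; `poll_count` and its parameters are illustrative, not measurements.

```python
def poll_count(
    job_seconds: float,
    initial_delay: float = 5.0,
    factor: float = 1.0,
    max_delay: float = 60.0,
) -> int:
    """Count the status requests a client makes before it sees the job finish."""
    elapsed = 0.0
    delay = initial_delay
    polls = 0
    while elapsed < job_seconds:
        elapsed += delay          # wait, then issue one status request
        polls += 1
        delay = min(delay * factor, max_delay)
    return polls


# Fixed 5s interval: 24 requests for a 2-minute job, 96 for an 8-minute job.
print(poll_count(120), poll_count(480))
# Doubling from 5s with a 60s cap: 5 and 11 requests for the same jobs.
print(poll_count(120, factor=2.0), poll_count(480, factor=2.0))
```

Under these assumptions a job costs somewhere between a handful and about a hundred short, stateless requests, with no connection held open between them. The trade-off with backoff is that the client may learn of completion up to one delay interval late.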

Joud Awad

• Jul 1

D) Synchronous wait — Production antipattern
Keeping the HTTP connection open for 2–8 minutes runs past your load balancer's timeout (most time out at 30–60s), ties up a server thread or process, and gives the client no recovery path if the connection drops mid-job. This is how systems get "stuck": jobs complete but the client never hears about it because the connection dropped at minute 3. Never block on long-running work. Always return immediately with a job ID.
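A framework-free sketch of the "return immediately with a job ID" shape, assuming an in-memory store and a client-supplied idempotency key. The `JobStore` class, the `/jobs/{id}` URL, the status names, and the 24-hour TTL are all hypothetical; a real service would back this with a database or cache and run the work in a separate worker.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class JobStore:
    ttl_seconds: float = 24 * 3600            # assumed retention window
    clock: Callable[[], float] = time.time    # injectable so expiry can be tested
    _jobs: Dict[str, dict] = field(default_factory=dict)
    _by_key: Dict[str, str] = field(default_factory=dict)

    def submit(self, idempotency_key: str) -> Tuple[int, dict]:
        """Register the job and return at once; the work itself runs elsewhere."""
        self._expire()
        job_id = self._by_key.get(idempotency_key)
        if job_id is None:
            # First time this key is seen: create exactly one job for it.
            job_id = str(uuid.uuid4())
            self._by_key[idempotency_key] = job_id
            self._jobs[job_id] = {
                "status": "queued",
                "key": idempotency_key,
                "created": self.clock(),
            }
        # 202 Accepted: the request was taken on, the result is not ready yet.
        return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}

    def mark(self, job_id: str, status: str) -> None:
        """Called by the worker as the job progresses."""
        self._jobs[job_id]["status"] = status

    def status(self, job_id: str) -> Tuple[int, dict]:
        self._expire()
        job = self._jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown or expired job"}
        return 200, {"job_id": job_id, "status": job["status"]}

    def _expire(self) -> None:
        """Drop records older than the TTL so state does not grow forever."""
        cutoff = self.clock() - self.ttl_seconds
        for job_id in [j for j, rec in self._jobs.items() if rec["created"] < cutoff]:
            key = self._jobs.pop(job_id)["key"]
            self._by_key.pop(key, None)
```

Because a retried submit with the same key returns the same job ID, a client whose connection dropped can simply resubmit and resume polling. This sketch measures the TTL from creation for brevity; measuring it from completion may suit long jobs better.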