15 June 2026

Silence Is a Bug: Why the Backend Owes Its Users a Story

How narrating pod lifecycle events turned a silent Kubernetes log stream into an honest, trustworthy deployment experience.

A sixty-second blank screen taught us that the most important feature of a live system isn’t speed — it’s narration. The products that win the streaming era will be the ones that learn to talk during the quiet moments.


The bug, as the user saw it

You hit Deploy. The page swaps to a log view. Lines start appearing. You scroll, you feel good.

Then you push a fix and hit Deploy again.

For the next sixty seconds, the log view shows nothing.

Did the deploy work? Did the new version even start? Is the page broken? Should you refresh? You refresh. Now you’ve created two sessions and the second one also shows nothing for sixty seconds. You file a bug.

That sixty-second silence is the entire post. Everything else — the retries, the lifecycle events, the discovery loops — exists to make sure that silence is never possible.


What’s actually happening underneath

A deployment in our system is backed by a pod (think: a running container) inside a Kubernetes cluster. When you redeploy, the old pod is shut down and a new one is created in its place. The new pod has to:

  1. Be scheduled onto a node.
  2. Pull the container image — often 30 to 60 seconds.
  3. Start the process.
  4. Begin producing output.

Only after step 4 does the log API have anything to stream. Steps 1–3 are silent by default. The pod isn’t broken; it’s just not ready to talk yet.

Naïve log streamers — including ours, originally — handle this badly:

  • The old pod terminates. The current log stream returns EOF. The streamer’s goroutine exits cleanly, satisfied.
  • The streamer’s discovery loop polls every 10 seconds, finds a new pod, and tries to attach.
  • The new pod is still pulling the image. The log API returns an error: container creating, not ready.
  • The streamer retries three times at one-second intervals. All three fail. It logs an error and gives up until the next discovery tick.
  • This repeats for the better part of a minute. Nobody sees anything.

The system is working perfectly. The user thinks it’s dead.


The reframe: silence is a state, and states deserve events

It’s tempting to attack this in the frontend. “Show a spinner when the stream goes quiet.” “Add a ‘still working…’ message after 5 seconds of silence.” Those are fine band-aids, but they don’t fix the root cause: the backend has rich information about what’s happening, and it’s throwing all of that information away before it gets to the client.

So we changed the contract. Instead of treating the log stream as a pipe that carries only the container’s stdout, we treated it as a narrative — and the narrative includes the meta-events that make the stdout make sense.

Concretely, the stream now emits synthetic entries for things it would previously have hidden:

  • “Pod foo-7d9c is ContainerCreating — pulling image.”
  • “Streaming logs from pod foo-7d9c.”
  • (container output begins.)
  • “Log stream ended for pod foo-7d9c.”
  • “Pod foo-8a2b is ContainerCreating — pulling image.”
  • “Streaming logs from pod foo-8a2b.”

The user now sees an unbroken story: old pod ended, new pod is coming up, here it is. The sixty seconds of “is this broken?” became sixty seconds of “this is what’s happening.” Same wall clock, completely different experience.
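
One way to picture the new contract is an entry type that records where each line came from. What follows is a minimal sketch, not our production code: the Entry type, the Source values, and the narrate helper are hypothetical names chosen for illustration, and the pod name comes from the example above.

```go
package main

import "fmt"

// Source says who produced an entry. Both values are hypothetical names
// invented for this sketch.
type Source string

const (
	SourceContainer Source = "container" // a line from the container's stdout
	SourceLifecycle Source = "lifecycle" // a synthetic entry from the streamer
)

// Entry is one line in the stream the client renders.
type Entry struct {
	Source  Source
	Pod     string
	Message string
}

// narrate wraps a pod's container output in the lifecycle entries that
// explain it: one before the first line and one after the last.
func narrate(pod string, output []string) []Entry {
	entries := []Entry{{SourceLifecycle, pod, fmt.Sprintf("Streaming logs from pod %s.", pod)}}
	for _, line := range output {
		entries = append(entries, Entry{SourceContainer, pod, line})
	}
	return append(entries, Entry{SourceLifecycle, pod, fmt.Sprintf("Log stream ended for pod %s.", pod)})
}

func main() {
	for _, e := range narrate("foo-7d9c", []string{"server listening on :8080"}) {
		fmt.Printf("[%s] %s\n", e.Source, e.Message)
	}
}
```

Tagging each entry with its source lets the client style synthetic lines differently from container output without having to parse message text.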

This is the single most valuable lesson from the work, and it generalizes to almost every system that streams to humans:

Silence between events is ambiguous. Disambiguate it at the source, not in the UI.

The frontend cannot tell the difference between “the backend hasn’t sent anything in a while because the work is in flight” and “the backend hasn’t sent anything in a while because it crashed.” Only the backend knows. So the backend should say.


Don’t retry against a brick wall

Once we started emitting lifecycle events, a second class of bug became visible: we were doing a lot of work for nothing.

The pod-attach retry loop would hammer the log API every second for three seconds, get the same “not ready” answer each time, and then back off. Across a 60-second image pull, that meant six discovery cycles, eighteen failed attach attempts, eighteen log entries, and a slightly elevated load on the cluster’s control plane — all to produce zero output.

The fix was a single check: before retrying, look at the pod’s status. If it’s Pending or ContainerCreating, the log API will refuse you no matter how many times you ask. Don’t ask. Wait for the pod to leave that state, then attach once.

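// Skip the attach entirely while the pod cannot serve logs yet.
if reason, waiting := podWaitingReason(pod); waiting {
    // Tell the user why the stream is quiet.
    emit("info", fmt.Sprintf("Pod %s: %s", pod.Name, reason))
    return // no retries; rediscovery will pick the pod up when it's ready
}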
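
For readers who want something to run, here is a minimal sketch of what a helper like podWaitingReason could look like. It is not our production code. The Pod and ContainerStatus types are simplified stand-ins invented for this sketch so that it compiles without any Kubernetes client libraries, and the two conditions it checks are the ones named above: a container waiting in ContainerCreating, or a pod still in the Pending phase.

```go
package main

import "fmt"

// ContainerStatus and Pod are simplified stand-ins for the Kubernetes pod
// types. They are assumptions made for this sketch, not the real API types.
type ContainerStatus struct {
	WaitingReason string // empty when the container is not waiting
}

type Pod struct {
	Name       string
	Phase      string // e.g. "Pending", "Running"
	Containers []ContainerStatus
}

// podWaitingReason reports whether the pod cannot serve logs yet and, if
// so, a short reason that can be shown to the user.
func podWaitingReason(pod *Pod) (string, bool) {
	for _, c := range pod.Containers {
		if c.WaitingReason == "ContainerCreating" {
			return c.WaitingReason, true
		}
	}
	if pod.Phase == "Pending" {
		return "Pending", true
	}
	return "", false
}

func main() {
	pod := &Pod{
		Name:       "foo-8a2b",
		Phase:      "Pending",
		Containers: []ContainerStatus{{WaitingReason: "ContainerCreating"}},
	}
	if reason, waiting := podWaitingReason(pod); waiting {
		fmt.Printf("Pod %s: %s\n", pod.Name, reason)
	}
}
```

The more specific container reason is checked first, so the user sees "ContainerCreating" rather than the vaguer "Pending" whenever both apply.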

We also cut retries from 3 to 2 and dropped the interval from 1s to 500ms. The reasoning was straightforward: the system already has a discovery loop that runs every 10 seconds. If an attach attempt fails for a transient reason, one quick retry is enough; if that fails too, the next discovery tick will catch it. Long retry bursts are usually a sign that your component doesn’t trust the rest of the system to do its job.

The result: fewer API calls, fewer bogus error log lines, and — counterintuitively — faster user-visible recovery, because we stopped wasting time on doomed attempts.


What “good” looks like as a transport contract

After we were done, the log streamer had three jobs, in priority order:

  1. Be honest. Every transition the backend knows about gets surfaced as a log entry. The user never has to guess what state the system is in.
  2. Be quiet about its own machinery. Failed internal retries, polling intervals, goroutine bookkeeping — none of that goes into the stream. The narrative is about the user’s deployment, not our scaffolding.
  3. Be cheap. Don’t pound external APIs that have already told you to wait. Trust the higher-level loops to drive forward progress.

That’s the entire design. It fits on a sticky note. The implementation is more elaborate, but every line of it serves one of those three goals.


Patterns this generalizes to

You don’t need Kubernetes for any of this to be useful. If you’ve ever built a UI that streams progress from a backend, you’ve had a version of this problem. Some places it shows up:

  • CI build streams. Output from npm install, docker build, and pytest interleaves with long silent stretches while a tool decides what to do. Surfacing “step 4/12: pulling base image” turns silence into progress.
  • Long-running AI completions. A model that pauses mid-response (tool call, retrieval, planning) creates the same ambiguity. Streaming a typed event — even one that says “thinking” — beats streaming nothing.
  • Deploy pipelines, data jobs, video transcodes. Anything where the backend traverses states the user doesn’t see. Every state transition is an event worth emitting; the cost is one log line and the payoff is the user trusting your product.
  • Resumable downloads, chunked uploads, replication progress. Same shape. Same fix.

A useful heuristic when designing any of these:

For every second of silence longer than your user’s patience, you owe them one synthetic event.

In our case the user’s patience was about three seconds and the silence was a minute. The math made itself.


The lesson under the lesson

The instinct, when a user reports “the page goes blank,” is to fix the page. We almost did that. We almost added a frontend spinner, a “no data yet, retrying…” banner, a refresh button. Each of those would have shipped, looked fine in screenshots, and quietly lied to users about a system that was actually working.

What we ended up with instead was a backend that explained itself. The frontend got simpler — it just renders whatever comes down the pipe — because the pipe got more honest.


A stance on what comes next

For most of the web’s history, the unit of UX was the page: a complete, finished artifact handed to the user in one motion. Loading was a transitional state to be hidden behind a spinner and apologized for. Silence wasn’t a design problem because there was nothing to design during silence — the user was either looking at the old page or the new one.

That world is ending. The interfaces taking its place — AI assistants thinking aloud, deployments going live, agents working on your behalf, data pipelines updating in real time — live in the transition. The interesting work happens in the seconds and minutes between the user’s input and the final result. Those seconds used to be a gap in the experience. They are now the experience.

The teams that win this shift will not be the ones with the fastest backends. Speed has diminishing returns the moment users understand that real work takes real time. The winners will be the ones whose systems are honest in motion — products that, at any instant, can tell the user exactly what they are doing and why, without anyone having to ask. Honest products feel fast even when they’re slow, because users trust that the time being spent is time being spent on their behalf. Opaque products feel broken even when they’re working, because every second of silence is a small abandonment.

This is a deeper shift than it looks. It changes what a backend is for. The old job of a backend was to compute an answer and return it. The new job is to compute an answer and narrate the computation as it happens — to surface enough internal state, in human-meaningful units, that the frontend never has to guess and the user never has to wonder. Anything less is a product that lies about its own behavior.

We didn’t learn this from a book. We learned it from a sixty-second blank screen and the user who decided, reasonably, that our system was broken. He was right. The fix wasn’t faster code or a better spinner. The fix was treating silence as a thing we owed him words about, and building a system whose default behavior was to keep talking.

If your product streams anything to humans, the question isn’t whether you can afford to do this. It’s whether you can afford not to. The competitors who figure it out first will set the expectation; everyone else will spend the next product cycle explaining why their thing “is actually working, you just have to wait.” That’s not a winning pitch.

Push the truth as close to the source as you can. Make your systems narrate themselves. The era of silent backends is over.
