Systems Notes #1 — Code That Works vs Code That Survives
Most code works. Very little survives.
Working code passes tests, handles expected inputs, and ships on time. It assumes stable networks, predictable traffic, and cooperative dependencies.
Surviving code assumes the opposite.
It's idempotent. Retries are treated as load multipliers: three attempts at each of three layers can turn one request into 27. Partial failures are expected, not exceptional. State boundaries are explicit. Concurrency is designed for, not discovered in production.
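To make "idempotent" concrete, here is a minimal sketch of one common approach: deduplicating on a caller-supplied idempotency key. The class name, the `apply_credit` method, and the in-memory dictionary are all hypothetical and chosen for illustration. A real system would likely keep the keys in a durable store with an expiry.

```python
import threading


class IdempotentProcessor:
    """Hypothetical example: applies each keyed operation at most once."""

    def __init__(self):
        self._results = {}              # idempotency key -> result of the first run
        self._lock = threading.Lock()   # makes check-and-apply atomic across threads
        self.balance = 0

    def apply_credit(self, idempotency_key: str, amount: int) -> int:
        with self._lock:
            # A retry with a key we have already seen returns the stored
            # result instead of applying the side effect a second time.
            if idempotency_key in self._results:
                return self._results[idempotency_key]
            self.balance += amount
            self._results[idempotency_key] = self.balance
            return self.balance


processor = IdempotentProcessor()
first = processor.apply_credit("evt-001", 50)
retry = processor.apply_credit("evt-001", 50)   # same key: no second credit
other = processor.apply_credit("evt-002", 25)   # new key: applied normally
```

The design choice is that the caller, not the handler, decides what counts as "the same operation". The handler only has to remember which keys it has already seen.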
The shift isn't about writing more code. It's about designing for entropy.
When systems process millions of events or serve high-throughput workflows, "works on my machine" becomes irrelevant.
The real questions change:
- What happens if this runs twice?
- What breaks under 10,000 concurrent requests?
- How does it degrade?
- Where's the blast radius?
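One way to answer the concurrency and degradation questions above is to cap in-flight work and shed the excess instead of queueing it without bound. The sketch below is only an illustration under that assumption. `LoadShedder`, `handle`, and the fallback callables are hypothetical names, and the right limit depends on the system.

```python
import threading


class LoadShedder:
    """Hypothetical example: caps concurrent work and degrades when full."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work, fallback):
        # Non-blocking acquire: if every slot is taken, return the degraded
        # response immediately rather than piling up waiting callers.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return work()
        finally:
            self._slots.release()


shedder = LoadShedder(max_in_flight=1)


def busy_request():
    # While this request holds the only slot, a second request arrives.
    return shedder.handle(lambda: "full", lambda: "degraded")


during_overload = shedder.handle(busy_request, lambda: "degraded")
after_overload = shedder.handle(lambda: "full", lambda: "degraded")
```

Here the second request gets the fallback while the slot is held, and full service resumes once it is released. Capping in-flight work also bounds the blast radius: an overloaded handler rejects cheaply instead of exhausting threads or memory for everything around it.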
Reliability isn't patched later. It's chosen at design time.
Engineering maturity isn't measured by what your code can do, but by what it can withstand.