Observe

Stage one of The Method. Gather what is, without claim.

There is a move every engineer has made and every engineer regrets. You see a failure. Within seconds, a cause suggests itself. You change a line, you re-run, and you wait to see whether the red turns green. Sometimes it does, and you have learned nothing. Sometimes it does not, and you have already begun the slow accretion of changes that fix nothing and break something. Either way, you skipped the stage most likely to save time: you didn’t look first.

Observe is the refusal to claim before you have seen. It is the least glamorous stage of The Method and the one that pays the largest dividend, because the dominant cost in software is not the cost of typing. It is the cost of misunderstanding — building the wrong thing, or the right thing against a model of the system that was quietly false. Slow, intentional coding begins here, with the deliberate, almost meditative act of gathering what is, before you permit yourself an opinion about what ought to be.

Reading is the job

Start with an uncomfortable fact about how programmers actually spend their hours. We imagine the work is writing. It isn’t. In Clean Code, Robert C. Martin puts the ratio bluntly: “the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code.”¹ Writing is the visible tip of an activity that is overwhelmingly about comprehension.

The empirical studies are more modest than Martin’s aphorism, but they point in the same direction. Minelli, Mocci, and Lanza instrumented developers’ IDE sessions and concluded that “on average, developers spend 70% of their time performing program comprehension.” In fairness, their sample was small (eighteen developers, with more than half the sessions from a single person), and their model treats comprehension partly as the residual of everything that isn’t editing or navigating.² At the other end of the scale sits a genuinely large study: Xia and colleagues collected 3,148 working hours from 78 professional developers and found comprehension consuming roughly 58% of their time.³ Those numbers should not be laundered into a universal constant; they are field evidence for a plainer claim. In maintenance and extension work, programming is often less the production of new text than the reconstruction, in your own head, of what is already there.

Which means the question is never whether you will spend your time observing. You will. The only question is whether you do it on purpose, with discipline, or whether you do it badly and pretend you skipped it.

The program is a theory, and you have to rebuild it

Peter Naur saw the deepest version of this in 1985, and four decades have not improved on it. In “Programming as Theory Building,” Naur argued that a program is not, fundamentally, the text on disk. The program is a theory held in the minds of the people who built it — a working understanding of how the problem in the world maps onto the structures in the code, what may be changed safely and what may not, why things are the way they are.⁴ The source is a lossy projection of that theory. The documentation is lossier still.

The consequence is stark: when the team that holds the theory disperses, the program is, in Naur’s sense, dead — not because the code stops running, but because no one can extend it without reconstructing the theory from scratch, and the artifacts left behind are not sufficient to do that fully. Every engineer who has inherited a “working” system they were afraid to touch has felt the truth of this. The code ran. The theory was gone.

This is why the artifacts you trust matter, and in what order. Comments drift out of date and quietly begin to lie. Names made sense to someone who has since left. The documentation describes a system from two refactors ago. Running behavior, passing tests, commit history, and the few humans who still carry fragments of the theory are not infallible either: a test encodes an old assumption, a production trace samples one slice of reality, and a commit message may record what changed without explaining why. But these artifacts fail differently. Observation is the work of triangulating their disagreements until a faithful enough model reassembles in your own mind. It is slow precisely because it cannot be rushed: a theory rebuilt from the comments alone is a theory rebuilt from a rumor, and you will act on it with a confidence it has not earned.

Observe, then, is theory recovery. Before you change a system you must rebuild enough of its theory to know what your change will disturb. This is why “just read the diff” is not reading and why a green test suite is not understanding. You are not trying to confirm that the code does what it does; you are trying to reconstruct why, so that your eventual claim about how to change it rests on something real.

Measure, because your intuition is wrong

The same discipline governs performance, and here we have the most quoted, and most mutilated, sentence in our field. Donald Knuth, 1974: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”⁵ The fragment that survives in conference talks ends at “evil,” and it has been pressed into service as a blanket excuse not to think about performance at all. Read the whole passage and Knuth is saying nearly the opposite. The very next sentence: “Yet we should not pass up our opportunities in that critical 3%.” A few sentences earlier, in the same passage, he insists that in established engineering disciplines a 12% improvement, easily obtained, is never considered marginal.

Knuth’s actual argument is an Observe argument. The reason not to optimize prematurely is that you do not yet know where the 3% is — and a guess about the hot path is not evidence. Measurement often surprises the person holding the hunch. So the rule is not “don’t optimize.” The rule is: do not optimize before you have observed. Profile the running system. Find the real hot path. Then spend your cleverness where the measurement, not the hunch, tells you it will matter.

The surprises are routine once you start measuring. The function everyone “knew” was the bottleneck accounts for two percent of wall-clock time; the real cost is a serializer called inside a loop nobody thought about, or a chatty ORM issuing a thousand small queries behind one innocent-looking attribute access, or lock contention that is invisible on a developer laptop and ruinous under production concurrency. None of these announce themselves in the source. They appear only in the profile, under load, in the place the work actually happens — which is the whole reason the measurement has to come before the cleverness, not after it. Optimizing from a read-through of the code is just conjecture with extra confidence.
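A minimal sketch of what measuring first can look like, using Python's standard cProfile and pstats modules. Everything about the workload is an assumption made for illustration: suspected_hot_path, serialize, and handle_batch are hypothetical stand-ins, not code from any real system. The point is only that the profile, not the read-through, shows how often each function actually runs.

```python
import cProfile
import json
import pstats


def suspected_hot_path(n):
    # Hypothetical: the function everyone "knows" is the bottleneck.
    return sum(i * i for i in range(n))


def serialize(record):
    # Hypothetical: an innocent-looking serializer.
    return json.dumps(record, sort_keys=True)


def handle_batch(records):
    total = suspected_hot_path(100)
    # The serializer runs once per record, inside the loop.
    payloads = [serialize(r) for r in records]
    return total, payloads


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, pstats.Stats)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    return result, pstats.Stats(profiler)


def call_counts(stats):
    """Map function name -> total calls seen by the profiler."""
    # stats.stats maps (file, line, name) to a tuple of
    # (primitive calls, total calls, own time, cumulative time, callers).
    counts = {}
    for (_file, _line, name), (_cc, nc, _tt, _ct, _callers) in stats.stats.items():
        counts[name] = counts.get(name, 0) + nc
    return counts


records = [{"id": i, "tags": ["a", "b"]} for i in range(50)]
(total, payloads), stats = profile_call(handle_batch, records)
counts = call_counts(stats)
```

Here counts shows the serializer running once per record while the suspected function runs once. To see where the time goes, stats.sort_stats("cumulative").print_stats(5) prints the five most expensive entries by cumulative time. On a real system the same habit applies under realistic load, not on a toy batch.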

This generalizes past performance into a stance. Intuition about a complex system is a hypothesis, not a finding. Observation is how you tell the two apart.

Debugging is applied science

Nowhere is the temptation to skip observation stronger than in debugging, and nowhere is the cost higher. Andreas Zeller’s Why Programs Fail reframes debugging as exactly what it should be: an application of the scientific method.⁶ You observe the failure. You make it reproducible — not because reproduction magically fixes the bug, but because Zeller’s debugging process starts by making the problem explicit enough to test.⁷ A defect you cannot reproduce is a defect you do not yet understand. You form a hypothesis about the cause. You devise an experiment that would distinguish your hypothesis from its rivals. You run it. You refine. You converge on the defect by evidence rather than by guesswork and prayer.

The practitioner’s version is shorter: reproduce it before you touch it. The bug that you “fixed” without a reliable reproduction is a bug you have merely hidden, and it will return at a worse time. The reproduction is the observation. Build it first, and the fix often becomes obvious; skip it, and you are editing in the dark.

Watch what actually happens when this discipline is skipped. A failure appears intermittently in production. An engineer reads the stack trace, recognizes a familiar-looking null, adds a guard, and ships — no reproduction, just a plausible story. The crashes stop, briefly, because the guard suppresses the symptom at the surface while the real cause, an unsynchronized write three layers down, keeps corrupting state in ways that now surface somewhere else entirely. The bug didn’t get fixed; it got relocated and disguised, and the next engineer inherits a harder problem with a misleading clue stapled to it. The reproduction would have prevented all of it, because to reproduce a fault reliably you must understand the conditions that trigger it — and once you understand those, you are no longer guessing. Observation was never the slow path here. The guess was the slow path; it merely hid its cost in someone else’s quarter.
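One way to turn an intermittent failure into a reliable reproduction is to pin down the source of nondeterminism and search it. The sketch below is an illustration under stated assumptions, not a general recipe: a simpler, ordering-dependent fault stands in for the concurrency bug described above, and apply_events, expected_state, and find_failing_seed are hypothetical names. The random seed plays the role of "the conditions that trigger it": once a failing seed is found, the failure replays on demand.

```python
import random


def apply_events(events):
    """Hypothetical buggy code: silently assumes events arrive in version order."""
    state = {}
    for key, _version, value in events:
        state[key] = value  # bug: a late-arriving old version overwrites a newer one
    return state


def expected_state(events):
    """Oracle for the invariant: the highest version of each key wins."""
    best = {}
    for key, version, value in events:
        if key not in best or version > best[key][0]:
            best[key] = (version, value)
    return {key: value for key, (_version, value) in best.items()}


def find_failing_seed(events, max_seeds=1000):
    """Search arrival orders deterministically; return (seed, order) of the first failure."""
    for seed in range(max_seeds):
        order = list(events)
        random.Random(seed).shuffle(order)  # seeded, so the order can be replayed
        if apply_events(order) != expected_state(order):
            return seed, order
    return None


events = [("a", 1, "old"), ("a", 2, "new"), ("b", 1, "x")]
found = find_failing_seed(events)
```

The returned seed and event order are the observation: they can be frozen into a regression test before any fix is attempted, and the fix is then judged against that test instead of against a quiet afternoon in production.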

Observing systems that are already running

For the systems we actually operate — distributed, concurrent, alive in production — observation becomes an engineering property with a name. Charity Majors, Liz Fong-Jones, and George Miranda define observability, borrowing from control theory, as how well you can understand a system’s internal states from its external outputs: concretely, the power “to ask new questions of your system, without having to ship new code or gather new data in order to ask those new questions.”⁸

That last clause is the whole discipline compressed. A system you can only understand by deploying a new log line is a system you cannot observe; you are guessing, then shipping the guess, then guessing again. Real observability means the question you didn’t anticipate at deploy time — why is latency spiking only for users in this region on this build? — is answerable from data you already emit, sliced by dimensions you already capture. Building that capacity in advance is what lets you, later, gather what is. It is Observe, designed into the system rather than bolted on during the outage.
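A small sketch of the idea, assuming one wide, structured event per request written as a JSON line. The field names (region, build, user_id, duration_ms) and both helper functions are hypothetical; real tooling would do this at far larger scale. What the sketch shows is the shape of the capability: the region-and-build question is answered by slicing data that was already emitted, with no new log line shipped.

```python
import json
from collections import defaultdict


def make_event(**fields):
    """Serialize one wide event (one per request) as a JSON line."""
    return json.dumps(fields, sort_keys=True)


def mean_latency_by(lines, *dimensions):
    """Group already-emitted events by any captured dimensions; return mean latency per group."""
    groups = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        key = tuple(event.get(d) for d in dimensions)
        groups[key].append(event["duration_ms"])
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Events emitted at request time, before anyone knew which question would matter.
lines = [
    make_event(region="eu-west", build="1.4.2", user_id="u1", duration_ms=900),
    make_event(region="eu-west", build="1.4.2", user_id="u2", duration_ms=1100),
    make_event(region="eu-west", build="1.4.1", user_id="u3", duration_ms=100),
    make_event(region="us-east", build="1.4.2", user_id="u4", duration_ms=120),
]

# The unanticipated question, asked later, of data that already exists.
by_region_and_build = mean_latency_by(lines, "region", "build")
```

The same events can be sliced by user_id, or by any other field captured at emit time, without a deploy. That is why the width of the event matters more than the cleverness of any single log message.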

Go and see

The oldest version of this discipline doesn’t come from software at all. At the heart of the Toyota Production System is genchi genbutsu — “go and see for yourself.” Don’t manage the factory from the conference room and the summary report; go to the floor, to the actual place and the actual thing, and observe the work where it happens.⁹ Jeffrey Liker codifies it as Principle 12 of the Toyota Way: “Go and see for yourself to thoroughly understand the situation.”¹⁰

For us the shop floor is the running system, the actual user session, the real query plan, the production log — not the architecture diagram, not the ticket’s description of the bug, not the mental model you formed six months ago and never updated. The diagram is someone’s old conjecture. The summary is lossy. Go to where the work actually happens and look at the thing itself. (The famous story of Taiichi Ohno chalking a circle on the floor and making an engineer stand in it until he truly saw the problem is probably apocryphal — but it endures as a teaching legend precisely because it names something true.)

The honest limit: observation is not free, and it is not neutral

Three caveats, because a method that can’t state its own failure modes isn’t a method.

First, observation has a stopping problem. Taken to excess it becomes analysis paralysis — the endless read-through, the refusal to commit, research as a sophisticated form of procrastination. The discipline is bounded: you observe to the point of a falsifiable hypothesis, not forever. The purpose of gathering what is, is to earn the right to propose what might be true. Observation without a question forming on the other side of it is not rigor; it is stalling. When you notice a conjecture taking shape, you have observed enough for now. Move.

Second, observation has an opportunity cost. There are incidents where the responsible act is not a beautiful reconstruction of theory but a small reversible mitigation while the evidence is still incomplete. Slow coding fails when it treats delay as virtue in itself. The defensible rule is narrower: spend the extra observation time when the change is hard to reverse, the blast radius is high, or the system is poorly understood; prefer the quick reversible probe when it buys information safely.

Third, the stage’s own motto — gather what is, without claim — is an aspiration you cannot fully reach, and it is more useful once you admit that. There is no theory-free observation.¹¹ What you notice is shaped by what you expect; the bug you’re hunting determines which logs you read. Naur would say you are already building theory the moment you look. The discipline is not to achieve some impossible blank-slate neutrality. It is to notice your claims as claims — to hold the hunch loosely, to write down “I believe the slowdown is in the database” as a hypothesis to be tested rather than a fact to be acted on, and to stay genuinely willing for the system to tell you that you were wrong.

What Observe looks like in practice

The stage resists checklists, because its essence is attention rather than procedure. But a few habits reliably separate engineers who observe from engineers who only think they do:

Reproduce before you repair. No fix without a reliable reproduction. The reproduction is the observation, and building it usually teaches you the cause.
Read the surrounding code and its history before you edit it. git log and git blame are observation instruments. The line you’re about to “simplify” is often load-bearing for a reason recorded in a commit message three years ago.
Profile before you optimize. Always. Your instinct about the hot path is a hypothesis, not a finding, and Knuth’s point is that cleverness belongs after the critical code has been identified.
Write down what you observed as claims, not conclusions. “Latency correlates with cache misses on this endpoint” is a finding you can test. “The cache is broken” is a leap you haven’t earned.
Instrument for questions you can’t yet predict. Observability is Observe built into the system in advance — wide, high-cardinality events that let you ask tomorrow’s question without shipping tomorrow’s code.
Go to the actual thing. The production trace over the diagram. The real user’s session over the summarized report. The territory, not the map.
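The habit of recording findings as testable claims can be made concrete. The sketch below takes the example claim above, “Latency correlates with cache misses on this endpoint,” and checks it against paired measurements with a hand-written Pearson correlation. The sample numbers are invented for illustration, and the variable names are hypothetical. A high coefficient supports the finding as stated; it does not by itself show that the cache is the cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Invented sample: per-minute cache misses and mean latency on one endpoint.
cache_misses = [2, 5, 9, 14, 20]
latency_ms = [110, 135, 190, 240, 310]

# The claim, held as a hypothesis and checked against what was observed.
r = pearson(cache_misses, latency_ms)
```

Writing the claim this way keeps it falsifiable: a coefficient near zero on fresh data would retire the hunch cheaply, before anyone rewrites the cache.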

The handoff

Observe ends not with an answer but with a question sharp enough to be worth asking. You have rebuilt enough of the system’s theory to know what you’re touching. You have measured rather than guessed. You have made the failure reproducible and named your hunches as hunches. What you hold now is not yet a plan and certainly not a change — it is the raw material from which a good conjecture can be made.

That restraint is the point. In a culture that rewards the fast fix and the confident assertion, the engineer who says “I don’t know yet, I’m still looking” is doing the most valuable work in the room. Software’s expensive mistakes are made early, in the gap between what we assumed and what was true, and they are paid for late. Observe is how you close that gap before it costs you. Gather what is. Withhold the claim. The claim comes next — and because you looked first, it will be worth proposing.

¹ Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship (Prentice Hall, 2008), ch. 1. The “well over 10 to 1” reading-to-writing ratio is Martin’s; the shorter aphorism “code is read more than it is written” is a common paraphrase of it.
² Roberto Minelli, Andrea Mocci, and Michele Lanza, “I Know What You Did Last Summer - An Investigation of How Developers Spend Their Time,” Proc. IEEE 23rd International Conference on Program Comprehension (ICPC 2015), 25–35. DOI: 10.1109/ICPC.2015.12. The ~70% figure is from a small, somewhat skewed sample and should be cited as their measured result rather than a settled universal.
³ Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, Ahmed E. Hassan, and Shanping Li, “Measuring Program Comprehension: A Large-Scale Field Study with Professionals,” IEEE Transactions on Software Engineering 44, no. 10 (2018): 951–976. DOI: 10.1109/TSE.2017.2734091. 78 developers, 3,148 working hours, ~58% of time on comprehension.
⁴ Peter Naur, “Programming as Theory Building,” Microprocessing and Microprogramming 15, no. 5 (1985): 253–261. DOI: 10.1016/0165-6074(85)90032-8. Reprinted in Naur, Computing: A Human Activity (1992).
⁵ Donald E. Knuth, “Structured Programming with go to Statements,” ACM Computing Surveys 6, no. 4 (December 1974): 261–301, at p. 268. DOI: 10.1145/356635.356640. The full sentence and its surrounding context invert the meaning of the truncated “premature optimization” fragment.
⁶ Andreas Zeller, Why Programs Fail: A Guide to Systematic Debugging (Morgan Kaufmann, 2005; 2nd ed. 2009). The book develops debugging as a hypothesis-driven application of the scientific method.
⁷ Andreas Zeller, “Scientific Debugging,” teaching material for Why Programs Fail. URL: whyprogramsfail.com/pdf/ScientificMethod.pdf. The material explicitly frames debugging as hypothesis, prediction, experiment, observation, and conclusion.
⁸ Charity Majors, Liz Fong-Jones, and George Miranda, Observability Engineering: Achieving Production Excellence (O’Reilly, 2022). The quoted definition draws on Majors’s “Observability: A Manifesto,” Honeycomb, 2018. URL: honeycomb.io/blog/observability-a-manifesto.
⁹ Taiichi Ohno, Toyota Production System: Beyond Large-Scale Production (Productivity Press, English translation 1988; original Japanese 1978). The foundational text for genchi genbutsu / “go and see.”
¹⁰ Jeffrey K. Liker, The Toyota Way: 14 Management Principles from the World’s Greatest Manufacturer (McGraw-Hill, 2004), Principle 12: “Go and see for yourself to thoroughly understand the situation (genchi genbutsu).”
¹¹ Norwood Russell Hanson, Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science (Cambridge University Press, 1958). Hanson is the standard source for the theory-ladenness of observation. URL: gwern.net/doc/philosophy/epistemology/1958-hanson-patternsofdiscovery.pdf.