The Deep Structure of Sophisticated Software
There is a simple test for whether a software system is doing something hard. Look at the questions it has to answer. If most of them have the shape "what is the value of X?", the system is doing lookups. If most of them have the shape "what would happen if X were different?" or "what made Y happen?", the system is doing reasoning.
Lookups can be served by tables, indices, embeddings, hash maps, sorted files. The data structure barely matters.
Reasoning is different. Reasoning requires that you carry around a model of the relations between things—and that you can traverse those relations in the direction of influence.
This piece is about why sophisticated software systems—across compilers, observability, biology, search, drug discovery, and the new generation of agentic platforms—keep converging on graph representations; why those graphs increasingly need to carry typed semantics (knowledge graphs); and why the most demanding systems eventually need their graphs to encode something stronger: causality.
Graphs
A graph is the data structure where the relation between two things is itself a thing.
In a relational database, the relation between two rows is implicit. It exists as a foreign key, a join condition, a piece of SQL someone has to write. The relation is real, but it is not addressable. You cannot point at it. You cannot annotate it. You cannot ask, "show me all relations of this kind, regardless of what they connect."
In a key-value store, relations don't exist at all except as conventions inside values.
In a vector embedding, relations are collapsed into geometric similarity. You can ask "what is near X?", but not "how is X related to Y?"—because the answer to the second question has been compressed away.
In a graph, by contrast, the edge is a first-class object. It has a type. It has properties. It has a direction. It can itself be the subject of a query. This single change, promoting the relation to first-class status, changes what kinds of questions you can answer in tractable time.
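To make "the edge is a first-class object" concrete, here is a minimal sketch in Python. All the names (Edge, employed_by, acme, and so on) are illustrative, not taken from any particular graph database:

```python
from dataclasses import dataclass

# A sketch of an edge as a first-class object: it has a type, a
# direction (src -> dst), and its own properties, and it can be queried
# on its own, independently of what it connects.

@dataclass(frozen=True)
class Edge:
    src: str
    etype: str
    dst: str
    props: tuple = ()  # e.g. (("since", 2019),)

edges = [
    Edge("alice", "employed_by", "acme", (("since", 2019),)),
    Edge("bob", "employed_by", "initech"),
    Edge("acme", "acquired_by", "megacorp"),
]

# "Show me all relations of this kind, regardless of what they connect."
employment = [e for e in edges if e.etype == "employed_by"]
```

The point is the query at the end: it selects by the relation's own type, something a foreign key or a join condition cannot be asked to do directly.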
Consider three queries:
- Find all customers in California.
- Find all customers who placed an order in the last 30 days.
- Find all chains of length ≤ 5 from a customer in California to a supplier in Vietnam, where each step is either a placed or shipped_via edge.
Query 1 is a filter. Query 2 is a join. Query 3 is a graph traversal, and every other representation pays a combinatorial price (an explosion of self-joins, one per step in the chain) to answer it.
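Query 3 can be sketched as a bounded breadth-first traversal over typed edges. The graph below is a made-up adjacency map, node to a list of (edge_type, neighbor) pairs; the node names are hypothetical:

```python
from collections import deque

graph = {
    "cust:carla": [("placed", "order:1")],
    "order:1":    [("shipped_via", "hub:sf")],
    "hub:sf":     [("shipped_via", "hub:hanoi")],
    "hub:hanoi":  [("shipped_via", "supp:vn-textiles")],
}
allowed = {"placed", "shipped_via"}

def chains(start, is_target, max_len=5):
    """All edge-paths of length <= max_len from start to a target node."""
    found, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if is_target(node):
            found.append(path)
        if len(path) < max_len:
            for etype, nxt in graph.get(node, []):
                # Follow only the allowed edge types; skip revisits.
                if etype in allowed and nxt not in {n for _, n in path}:
                    queue.append((nxt, path + [(etype, nxt)]))
    return found

paths = chains("cust:carla", lambda n: n.startswith("supp:"))
```

The traversal touches only the nodes reachable through allowed edges, which is why its cost scales with the neighborhood rather than with the full dataset.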
The deep reason graphs keep showing up in hard systems is that hard problems tend to be about long chains of dependency. Compilation chains. Causal chains. Blame chains. Information flow chains. When the question is given this final state, what sequence of relations led here?—you are asking a graph question, even if you don't know it yet.
Knowledge Graphs
A bare graph is just topology. It tells you that something connects A to B; it doesn't tell you what that something is or what it means. For most non-trivial systems, that is not enough.
A knowledge graph adds two things: a schema for the nodes (this thing is a Person, that one is a Company) and a schema for the edges (this connection is employed_by, that one is acquired_by). The relation is no longer just a pointer; it carries semantics.
This sounds like accounting overhead. In practice, it is what makes a graph composable across heterogeneous sources of information.
The reason an industrial-scale knowledge graph can integrate Wikipedia, Wikidata, structured markup from millions of sites, and proprietary data is that all of these sources can be coerced into the same typed vocabulary. actor.starred_in.movie means the same thing whether it came from a database with strict schemas or a parsed infobox in free text. Without that schema, you have a tangle of strings; with it, you have a substrate that can be queried, reconciled, and reasoned over.
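The coercion step can be sketched in a few lines: two heterogeneous sources, one typed vocabulary. The relation label actor.starred_in.movie comes from the text above; the source field names and entity ids are invented for illustration:

```python
def from_database(row):
    # A strict-schema source: columns map cleanly onto the vocabulary.
    return (row["actor_id"], "actor.starred_in.movie", row["film_id"])

def from_infobox(parsed):
    # A parsed free-text infobox: messier keys, same target vocabulary.
    return (parsed["name"], "actor.starred_in.movie", parsed["Starring in"])

triples = {
    from_database({"actor_id": "q:keanu", "film_id": "q:matrix"}),
    from_infobox({"name": "q:keanu", "Starring in": "q:john_wick"}),
}

# Both sources now answer the same typed query.
films = {o for s, p, o in triples
         if s == "q:keanu" and p == "actor.starred_in.movie"}
```

Once both records land in the same (subject, predicate, object) shape, the query no longer cares where a fact came from; that is the composability the schema buys.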
But a typed graph, by itself, is still descriptive. It tells you what is. This drug was prescribed for this condition. This person reports to that person. This service calls that service. These are observations. They support questions of the form what is connected to what?
That is the first rung of the ladder. Most knowledge graphs in production never climb higher.
From knowledge graphs to causal graphs
Judea Pearl's framing here is the cleanest one I know. There are three rungs of inference:
- Association. What is the probability of Y given that I observe X? This is what correlations, joint distributions, embeddings, and most knowledge graphs do.
- Intervention. What is the probability of Y if I make X happen? This is fundamentally different. Observing rain and forecasting wet ground is association. Spraying a hose and predicting wet ground is intervention.
- Counterfactual. Given that Y did happen and X was the case, what would Y have been if X had been different? This is the strongest form: reasoning about a world that did not occur.
The data structures and queries that suffice for rung 1 are insufficient for rungs 2 and 3. To climb the ladder, edges must encode something stronger than co-occurrence. They must encode mechanism: the directional, conditional rules by which one variable produces another. That is what makes a graph causal.
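The gap between rungs 1 and 2 can be made concrete with the rain/hose example, written as a tiny structural model. This is a sketch, not a general do-calculus engine; the probabilities are arbitrary:

```python
import random

# Mechanism: clouds -> rain -> wet ground; a hose also wets the ground.
def sample(do_hose=None):
    clouds = random.random() < 0.5
    rain = clouds and random.random() < 0.8
    # do(hose=x) cuts the edge into `hose` and forces its value.
    hose = (random.random() < 0.1) if do_hose is None else do_hose
    wet = rain or hose
    return {"clouds": clouds, "rain": rain, "hose": hose, "wet": wet}

random.seed(0)
obs = [sample() for _ in range(10_000)]

# Rung 1, association: observing wet ground raises belief in clouds.
p_assoc = sum(s["clouds"] for s in obs if s["wet"]) / sum(s["wet"] for s in obs)

# Rung 2, intervention: forcing the hose on wets the ground but says
# nothing about clouds, because the intervention severed the edge.
do_samples = [sample(do_hose=True) for _ in range(10_000)]
p_do = sum(s["clouds"] for s in do_samples) / len(do_samples)
```

Here p_assoc comes out well above the base rate of clouds while p_do stays near it: conditioning and intervening give different answers precisely because the graph's structure, not just its statistics, determines what do() changes.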
A correlation graph will tell you that incidents in service A are followed by incidents in service B 80% of the time. A causal graph will tell you whether shutting down service B would prevent the next A-incident—or whether you'd be intervening on a symptom while the real driver sits upstream.
The first kind of graph helps you predict. The second kind helps you act.
Causality
Once a graph carries causal semantics, four classes of computation become possible that are simply unavailable otherwise.
Counterfactual queries. If we had not shipped that change, would the latency spike still have occurred? You cannot answer this from an observational record alone. You answer it by traversing the causal graph backward from the effect, identifying which upstream variable was perturbed, and recomputing the downstream consequences holding the perturbation fixed at its baseline value. This is, in essence, what a debugger is doing when you pin a variable and re-run a function.
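The "pin a variable and re-run" move can be shown with a toy deterministic pipeline. Everything here is hypothetical: the mechanism, the numbers, and the SLO threshold are invented to illustrate the shape of the computation:

```python
# A toy mechanism: shipping the change degrades the cache hit rate,
# which drives latency. Deterministic, so re-running with one input
# pinned is exactly the counterfactual.
def run(change_shipped, base_load):
    cache_hit_rate = 0.9 - (0.4 if change_shipped else 0.0)
    latency_ms = base_load / cache_hit_rate
    return latency_ms

observed = run(change_shipped=True, base_load=90)        # the world that happened
counterfactual = run(change_shipped=False, base_load=90) # pin the change off, re-run

# "Would the latency spike still have occurred?" against a made-up SLO.
spike_would_still_occur = counterfactual > 150
```

Holding base_load fixed at its observed value while flipping only change_shipped is the essential step: the counterfactual recomputes downstream consequences with everything else as it actually was.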
Intervention prediction. If we add a circuit breaker between these two services, what is the new failure profile? Intervention is not the same as conditioning. Conditioning asks, "given these histories, what's typical?" Intervention asks, "given this graph and this surgical change to a node, what is the new equilibrium?" You need the graph's structure, not just its statistics. A correlation table tells you who tends to fail together; a causal graph tells you whether breaking that correlation will help.
Root cause analysis. Why did this happen? Root cause is, in essence, a traversal of the causal graph backward from the observed failure to the earliest node that, when pinned to its baseline, makes the failure go away. Without the graph, you can list correlated symptoms; you cannot terminate the search at the cause. You can produce ranked candidates; you cannot produce explanations.
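That definition translates directly into code. The sketch below uses a toy incident chain with invented names; evaluate recomputes the world with some nodes pinned to their baselines, and root_cause walks upstream keeping the most-upstream node whose pin removes the failure:

```python
parents = {
    "outage": ["overload"],
    "overload": ["retry_storm"],
    "retry_storm": ["bad_deploy"],
    "bad_deploy": [],
}

def evaluate(pins):
    """Recompute the world, holding pinned nodes at their baselines."""
    v = {}
    v["bad_deploy"] = pins.get("bad_deploy", True)  # observed: deploy was bad
    v["retry_storm"] = pins.get("retry_storm", v["bad_deploy"])
    v["overload"] = pins.get("overload", v["retry_storm"])
    v["outage"] = pins.get("outage", v["overload"])
    return v

def root_cause(failure):
    frontier, cause = [failure], None
    while frontier:
        node = frontier.pop(0)
        for p in parents[node]:
            # Pin this ancestor to its baseline; does the failure vanish?
            if not evaluate({p: False})["outage"]:
                cause = p               # more upstream than any found so far
                frontier.append(p)
    return cause

found = root_cause("outage")
```

The search terminates at bad_deploy because pinning it clears the outage and it has no parents left to test, which is exactly what distinguishes a cause from the chain of symptoms between it and the failure.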
Blame and credit assignment. Which input contributed most to this output? Whether you are debugging a model, attributing revenue to channels, or apportioning responsibility for an outage, the question only makes sense relative to a causal graph.
These four computations are the daily work of compilers, debuggers, observability platforms, drug discovery pipelines, recommendation systems, and, increasingly, AI agents. The systems that do them well always have an explicit causal graph at their core.
The current frontier—retrieval-augmented generation, agentic systems, neural search—is arriving at the same place. The systems that work are not the ones with the largest vector indices. They are the ones whose retrieval substrate has structure: typed relations, citations with semantic roles, dependency between facts. Vectors find similar things. Graphs find related things. Causal graphs find consequential things.
Every sufficiently advanced engineering intelligence platform—every system that aspires to understand code rather than search it—ends up building a causal graph. Static analysis produces points-to and call-graph edges. Behavioral analysis produces edges from inputs to observed outputs. Diff analysis produces edges from changes to consequences. Together, these become the substrate on which questions like if I deprecate this function, what tests will fail? or which downstream service is affected by this database schema change? become tractable.
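The deprecation question is, at bottom, a reachability query over dependency edges. The sketch below inverts the usual call graph so edges point from a definition to the things that depend on it; the symbol names are made up:

```python
# Edges: definition -> everything that depends on it (reverse call graph).
depends_on_me = {
    "parse_config": ["load_app", "test_parse_config"],
    "load_app": ["main", "test_load_app"],
    "main": [],
    "test_parse_config": [],
    "test_load_app": [],
}

def affected(symbol):
    """Everything transitively reachable from `symbol` along dependency edges."""
    seen, stack = set(), [symbol]
    while stack:
        for dep in depends_on_me[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# "If I deprecate parse_config, what tests will fail?"
failing_tests = {s for s in affected("parse_config") if s.startswith("test_")}
```

The hard part in practice is not this traversal but building edges that deserve to be traversed: static, behavioral, and diff-derived edges merged into one graph, as described above.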
The same is true of agent systems. An LLM agent that can plan multi-step tasks must, somewhere, be reasoning over a graph: tool dependencies, intermediate state, expected effects. The agents that fail in subtle ways are usually the ones whose internal causal model is implicit and inconsistent. The ones that succeed have made the model explicit.
Where most knowledge graphs stop short
A great deal of what is called a knowledge graph in industry is not a knowledge graph in any deep sense. It is a typed property graph used as a search index. Its edges encode is_a, has_a, part_of, mentioned_in. These are useful, but they are descriptive—they tell you what is, not what produces what.
Three things tend to be missing.
Conditionality. Real causal relationships hold under conditions. Drug X reduces mortality, in patients with markers Y, when administered before stage Z. Strip the conditions and you have a sentence that is sometimes true and frequently dangerous. Most production knowledge graphs have nowhere to put the conditions; the edge is unconditional or carries at best a confidence score, which is not the same thing. A confidence score tells you how often the edge holds; a condition tells you when it holds. Causal reasoning needs the second.
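The confidence-versus-condition distinction has a direct representation. In the sketch below, an edge carries both a score and a tuple of predicates; the drug, marker, and stage fields mirror the example in the text but are otherwise hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CausalEdge:
    src: str
    dst: str
    confidence: float   # how often the relation held in the data
    conditions: tuple = ()  # predicates that must hold for it to apply

edge = CausalEdge(
    "drug_x", "reduced_mortality", confidence=0.7,
    conditions=(
        lambda p: "marker_y" in p["markers"],  # has markers Y
        lambda p: p["stage"] < 3,              # before stage Z
    ),
)

def applies(e, patient):
    """The edge makes a claim about this patient only if all conditions hold."""
    return all(cond(patient) for cond in e.conditions)

early = {"markers": {"marker_y"}, "stage": 1}
late = {"markers": set(), "stage": 4}
```

An edge whose conditions fail is silent, not false: the 0.7 confidence says nothing about the late-stage patient, which is exactly the behavior an unconditional scored edge cannot express.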
Directionality with mechanism. A directed edge labeled causes is just a string until something downstream knows how to use it as a causal claim. The infrastructure to back that label—mechanism descriptions, intervention semantics, consistency with the rest of the graph—is rarely built. Without it, the directionality is decorative. You can render arrows in a UI; you cannot do counterfactual computation over them.
Negative facts. Causal reasoning depends critically on knowing what does not cause what, what blocks what, what is invariant under which interventions. Most knowledge graphs only assert positives. The closed-world assumption is then either too strong (everything not asserted is false) or too weak (nothing can be ruled out). Causal reasoning needs explicit non-edges, blockers, and invariance claims, treated as carefully as the positive edges. The absence of an edge has to mean something specific.
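One way to give absence a specific meaning is to store negative claims as first-class entries, so an unrecorded triple means "unknown" while a recorded non-edge means "ruled out". A minimal sketch, with invented example triples:

```python
# True  = asserted positive edge.
# False = explicit non-edge: this relation has been ruled out.
claims = {
    ("smoking", "causes", "cancer"): True,
    ("coffee", "causes", "cancer"): False,
}

def status(s, p, o):
    if (s, p, o) not in claims:
        return "unknown"  # open world: never asserted either way
    return "asserted" if claims[(s, p, o)] else "ruled_out"
```

This splits the closed-world dilemma the text describes: the reasoner can now distinguish "nobody has looked" from "we looked and it is not so", and only the second licenses pruning a causal search.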
A knowledge graph without these three is a useful index. It is not a reasoning substrate. The distinction matters because the systems built on top will inherit the limitations: they will be able to retrieve, summarize, and search, but they will not be able to predict the consequences of an intervention or assign blame for an outcome. They will be at rung 1 of Pearl's ladder, and no amount of additional data at that rung will lift them to rung 2.
Conclusion
The sophistication of a software system is bounded above by the richness of the causal model it operates over.
A CRUD application has no causal model; it is a set of lookups and writes, and that is the ceiling on what it can do. A compiler has a deep causal model of programs, and it can therefore perform transformations that preserve meaning while changing form. An observability platform has a causal model of distributed execution, and it can therefore answer why did this happen? rather than just what happened? A drug discovery pipeline has a causal model of biological mechanism, and it can therefore propose interventions rather than only describe correlations.
Across every domain where the questions are hard, the same shape recurs. The system has, somewhere inside it, a graph. The graph's nodes are the entities the system reasons about. The graph's edges are the relations between them. As the system becomes more sophisticated, the edges become more typed, more directional, more conditional, more mechanism-laden—until eventually the graph is not a knowledge graph in the descriptive sense but a causal graph in the interventional sense.
The right way to build for this is not to bolt a graph onto an existing system as a search optimization. It is to recognize that the causal model is the system; that the algorithms, the queries, the user interfaces are downstream of it; that getting the graph right is most of the work. When the graph is right, the rest follows. When the graph is wrong, no amount of algorithmic cleverness on top will recover what was lost in the representation.
This is why graphs—and especially causal graphs—keep being at the heart of every hard problem. They are not a fashion in data infrastructure. They are the place where structure, semantics, and mechanism meet, and that intersection is where reasoning lives.