Lehman's Laws Don't Care Who Writes the Code
Why AI-Driven Development Doesn't Solve the Maintenance Problem — It Transforms It
Software maintenance has always been the dominant cost. Estimates have consistently placed maintenance at 60–80% of total lifecycle cost for decades. The industry's response has been a long arc of attempts to reduce the cost of producing code: higher-level languages, frameworks, code generation, and now, large language models that can produce syntactically correct, locally plausible code at machine speed.
Every one of these interventions has attacked the wrong variable.
The cost of maintenance was never the cost of typing. It was the cost of understanding. And that distinction is about to matter more than it ever has, because we are approaching a world where AI doesn't just assist developers — it writes, maintains, and evolves systems autonomously.
To reason clearly about what happens next, we need an operating principle. That's where Lehman's Laws come in.
Lehman's Laws
Meir "Manny" Lehman spent decades studying how real software systems evolve. The result was a set of empirical laws. Three of them are directly relevant here.
The Law of Continuing Change. A system that is used in a real-world environment must be continually adapted, or it becomes progressively less satisfactory. This law is about the environment moving while the system stays still. Regulations change. User expectations shift. Dependencies release breaking versions. Threat surfaces expand. A system that isn't actively maintained doesn't stay the same — it decays relative to its environment.
The Law of Increasing Complexity. As a system evolves, its complexity increases unless work is done to maintain or reduce it. Every adaptation — every feature, patch, and integration — adds structural complexity. The only counterforce is deliberate, costly effort to simplify, refactor, and re-cohere the design. Without that effort, the system's architecture drifts from any coherent intent.
The Law of Declining Quality. Unless rigorously maintained and adapted to operational environment changes, the quality of a system will appear to decline. The system's absolute quality may not change, but the environment's expectations do. Quality is relative to context, and context never holds still.
These laws are not about humans. They are about systems interacting with environments. They apply regardless of who or what is doing the writing. This is the point most commentary on AI-driven development misses entirely.
Before we can reason about AI-driven maintenance, we need to be precise about what makes maintenance expensive.
The dominant cost is building and maintaining an accurate mental model of the system: what it actually does across all code paths, failure modes, implicit contracts, and emergent interactions between components.
This comprehension cost scales super-linearly with system complexity. A system twice as complex is not twice as hard to understand — it's four or eight times as hard, because the interactions between components grow combinatorially.
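A toy calculation makes the super-linear claim concrete. If we take the number of distinct component pairs as a crude proxy for potential interactions (an assumption for illustration — real coupling is sparser, but the growth trend is the point):

```python
# Toy illustration of super-linear comprehension cost: if every pair
# of components can potentially interact, the interactions to
# understand grow quadratically with component count. This is a
# proxy, not a measurement.

def pairwise_interactions(n: int) -> int:
    """Number of distinct component pairs: n choose 2."""
    return n * (n - 1) // 2

for n in (10, 20, 40):
    print(n, pairwise_interactions(n))
# Doubling components from 10 to 20 roughly quadruples the pairs
# to reason about: 45 -> 190 -> 780.
```

Doubling the component count roughly quadruples the pairwise surface area; account for three-way interactions and the growth is steeper still.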
In human-maintained systems, this comprehension bottleneck manifests as:
- Onboarding cost. New engineers spend weeks or months before they can make safe changes.
- Blast radius uncertainty. Engineers avoid changes because they can't confidently bound the impact.
- Intent erosion. The original architectural decisions — the why behind the structure — get diluted across teams and time. Every patch that "works" but violates an unwritten invariant accelerates this erosion.
- Knowledge concentration. Critical understanding clusters in a few heads. When those people leave, the system becomes partially opaque to the entire organization.
Now here is the critical observation: current AI-driven development makes every one of these worse.
When a developer writes code slowly, they are forced to build a model of the system as they go. When an LLM generates two hundred lines in seconds, the code exists but the model in the developer's head often does not. The rate of code production goes up. The rate of deep understanding stays flat or declines. The delta between the two is pure future maintenance cost — Lehman's Second Law operating at accelerated speed.
But let's go further. Let's take the thought experiment seriously: AI writes, maintains, and evolves the system entirely. No human developers in the loop. What happens to Lehman's Laws?
The First Law still holds. The environment doesn't stop changing because the maintainer is an AI. Regulations still shift. User behavior still evolves. Dependencies still break. The system must still be continually adapted, and the adaptation must still be correct — meaning it must preserve the system's essential properties while modifying its behavior to match new environmental requirements.
The Second Law still holds — and may accelerate. An AI producing adaptations at machine speed is also producing complexity at machine speed. Every feature addition, every patch, every integration adds structural weight. Unless the AI is simultaneously performing the deliberate, costly work of simplification and re-coherence, entropy wins faster than it ever could with human-speed development.
The Third Law still holds. Environmental expectations continue to shift. A system that was compliant yesterday may not be compliant today. A system that was performant under yesterday's traffic patterns may not be under tomorrow's. Quality is still relative, and the target is still moving.
So the maintenance problem doesn't disappear. But it transforms. It shifts from a comprehension problem (humans struggling to understand systems) to a coherence problem (AI struggling to maintain architectural intent over time).
The Three Layers of System Knowledge
To see why this coherence problem is hard, consider three layers of knowledge required to maintain any non-trivial system:
Layer 1: The Artifact. The code, configuration, infrastructure definitions, and data schemas. This is what exists. Current AI is remarkably capable here — it can read, generate, and transform code with high syntactic accuracy.
Layer 2: The Model. The actual behavioral semantics of the system. Call graphs that account for dependency injection, reflection, and dynamic dispatch. Data flows that track information across service boundaries. Dependency maps that capture transitive risk. Behavioral contracts — not what the code says, but what the system does under specific conditions. This layer is where comprehension lives.
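A tiny, hypothetical example of the gap between this layer and the artifact: when calls are resolved at runtime through a registry, no line of source text names the real callee, so a token-level reading of the code cannot recover the call graph.

```python
# Minimal, hypothetical illustration of Layer 1 vs Layer 2: the
# handler actually invoked is resolved at runtime from a registry,
# so the true call graph is a property of behavior, not of any
# single line of source text.

HANDLERS = {}

def register(name):
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap

@register("refund")
def handle_refund(order):
    return f"refunded {order}"

@register("cancel")
def handle_cancel(order):
    return f"cancelled {order}"

def dispatch(event_name, order):
    # A token-level reading sees only this indirect call; which
    # function runs depends on data flowing through the system.
    return HANDLERS[event_name](order)

print(dispatch("refund", "A-17"))  # handle_refund, chosen at runtime
```

Dependency injection, reflection, and configuration-driven wiring all produce the same effect at larger scale: the behavioral model must be built by analysis or observation, not by reading.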
Layer 3: The Intent. What properties must hold. What tradeoffs are acceptable. What the system is for in a world that changes around it. This is the layer where "correct adaptation" gets defined. It requires grounding in reality outside the codebase — business context, user needs, regulatory environments, organizational strategy.
Current LLMs operate almost exclusively at Layer 1. They reason about tokens, not about system behavior. Ask an LLM to add a feature and it produces code that is locally plausible. It does not ask whether the change introduces a circular dependency in the module graph. It does not ask whether it violates a latency budget on a critical path. It does not ask whether it creates a second source of truth for a domain concept. Those are system-level reasoning tasks that require holding a global model and evaluating a local change against it.
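One of those checks — does a locally plausible change introduce a cycle in the module graph? — can be sketched as standard depth-first cycle detection. The module names and edges below are hypothetical, chosen only to show a local edit violating a global property.

```python
# Sketch of one system-level check: does a proposed dependency
# introduce a cycle in the module graph? Module names are
# hypothetical; the point is that the check needs the global
# graph, not just the edited file.

def has_cycle(graph):
    """Detect a cycle via depth-first search with a recursion stack."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GRAY:
                return True          # back edge: cycle found
            if color.get(dep, WHITE) == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

deps = {"billing": ["orders"], "orders": ["catalog"], "catalog": []}
assert not has_cycle(deps)

# A locally plausible one-line change: catalog imports billing.
deps["catalog"].append("billing")
assert has_cycle(deps)   # the global invariant is now violated
```

The edit is one line and passes any file-local review; only evaluation against the whole graph reveals the violation. Latency budgets and single-source-of-truth invariants have the same shape: local change, global check.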
Without Layer 2, an AI maintainer is doing what the least effective human developers do: making locally reasonable changes without understanding global consequences. The difference is that it does this at machine speed.
And without Layer 3, even a perfect Layer 2 is insufficient. The AI can know exactly what the system does but cannot determine whether what it does is still right — because "right" is defined by the world outside the system.
In human-maintained systems, intent erosion is slow. It happens over months and years as team composition changes, as institutional memory fades, as quick fixes accumulate. The slowness is actually a safety mechanism — it gives organizations time to notice the drift and intervene.
An AI maintaining a system at machine speed has no such buffer. If its model of architectural intent is incomplete or subtly wrong, it will propagate that error across hundreds of changes before anyone notices. Each change is locally plausible. The test suite passes. The metrics look fine. But the system's internal coherence is degrading, and the degradation is invisible until it crosses a threshold where changes start producing unexpected interactions.
This is Lehman's Second Law in its most dangerous form: complexity increasing at machine speed, with the corrective force (deliberate simplification) absent because the AI doesn't recognize the need for it.
There is a deeper issue. An AI system maintaining another system is, at bottom, a closed loop. It observes signals from the system (error rates, latency, test results, log patterns), decides on adaptations, implements them, and observes new signals. This is a feedback loop, and feedback loops have well-understood failure modes.
The most dangerous is drift without grounding. The AI observes that a metric is degrading, makes a change, observes improvement, and reinforces that strategy. But the metric may be a proxy for something it cannot observe. The real-world effect of the system — on users, on business outcomes, on regulatory compliance — is not fully captured by any set of metrics. It lives in the world outside the loop.
Human maintainers bring something essential to this equation: independent grounding. A human can say "this metric is improving but the user experience is getting worse because the metric doesn't capture what actually matters." A human can say "we're optimizing for the wrong thing." A human can receive a phone call from a customer, read a regulatory update, or notice a cultural shift that changes what "correct behavior" means.
An AI in a closed loop cannot do this. It can only optimize for what it can measure. And Goodhart's Law — when a measure becomes a target, it ceases to be a good measure — applies to AI maintainers just as ruthlessly as it applies to human organizations.
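The drift-without-grounding failure mode can be shown with a deliberately simple simulation. Everything here is illustrative: the optimizer sees only a proxy metric, the true outcome lives outside the loop, and the two diverge once the proxy is pushed past the point where it stops tracking reality.

```python
# Toy closed loop: the maintainer can only observe `proxy`; the real
# outcome is outside the loop. The proxy tracks the outcome at first,
# then the correlation breaks as the proxy is pushed (Goodhart's Law).
# All dynamics here are illustrative, not empirical.

def observe_proxy(knob: float) -> float:
    return knob                           # the metric always improves

def true_outcome(knob: float) -> float:
    # Tracks the proxy up to knob = 5, then degrades: the proxy
    # stops capturing what actually matters.
    return knob - max(0.0, knob - 5.0) * 2.0

knob = 0.0
for step in range(10):
    candidate = knob + 1.0
    if observe_proxy(candidate) > observe_proxy(knob):
        knob = candidate                  # reinforce: metric improved

print(observe_proxy(knob), true_outcome(knob))
# The proxy ends at 10.0 while the unmeasured outcome has fallen
# back to 0.0 -- every step was locally justified by the metric.
```

Every step in the loop is individually defensible; the divergence is visible only to an observer who can measure the outcome the loop cannot. That observer is the grounding function the next section describes.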
The fully autonomous AI maintenance dream — no humans anywhere in the loop — fails on Layer 3. Intent cannot be fully formalized for systems that exist at the boundary between software and human meaning. Products, platforms, services — anything where "correct" depends on context that lives outside the codebase — require a grounding function that connects system behavior to real-world consequence.
That grounding function is the human's actual role in the endgame. Not writing code. Not reviewing pull requests. Not even designing systems in the traditional sense. The human becomes the intent authority — the one who defines what the system is for, what properties must hold, and what tradeoffs are acceptable as the environment changes.
Notice what this implies about skill requirements. The valuable human capability in this world is not syntax mastery or framework knowledge. It is the ability to reason about what a system should do and why — systems thinking, domain understanding, judgment about tradeoffs, and the capacity to ground technical decisions in real-world consequences.
The overwhelming investment in AI-driven development today is in Layer 1 — generating more code, faster. Copilots, chat-driven development, autonomous coding agents. This investment, evaluated against Lehman's Laws, is accelerating the maintenance crisis:
- More code produced per unit time (Layer 1) without proportional investment in understanding what that code does in aggregate (Layer 2) guarantees increasing complexity (Lehman's Second Law).
- Faster adaptation without verified coherence guarantees declining quality relative to intent (Lehman's Third Law).
- Higher velocity without grounding in environmental change guarantees systems that are optimized for yesterday's context (Lehman's First Law).