Refactoring is About Causality, Not Just Behavior Preservation
Martin Fowler's influential definition of refactoring has guided software engineers for decades: "Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure."
It's a clean, safe definition. It's also incomplete.
I want to propose a more fundamental framing: refactoring is the activity of making causality (cause and effect) obvious in code. It's a shift that better captures what we're actually trying to achieve when we refactor, and it leads to different, often better, engineering decisions.
Let's discuss why.
The Limits of Behavior Preservation
Fowler's definition rests on a conservative principle: preserve all external behavior while improving internal structure. This works well for mechanical, low-risk refactorings. But it has serious limitations in real-world engineering:
Tests aren't truth, they're approximations
Fowler's emphasis on "external behavior" treats tests and APIs as the ultimate source of truth. But tests can be incomplete. They can encode accidental behaviors. They can preserve bugs.
I've seen countless codebases where "behavior preservation" meant perpetuating a silent failure mode because some test somewhere depended on it. The refactoring that would have made the system more understandable, by making the failure explicit and fail-fast, was ruled out because it "changed behavior."
Structure for its own sake misses the point
"Improve internal structure" sounds like a purely aesthetic exercise: renaming variables, extracting methods, shuffling code between files. These are fine mechanical improvements, but they miss refactoring's deeper purpose.
Code is prose. It's communication. The question isn't just "is this organized nicely?" but "does this show me why something happens?" Two codebases can have identical internal structure but wildly different clarity about cause and effect.
It discourages beneficial changes
Because "no external behavior change" is the primary constraint, Fowler's framing, when taken literally by engineering teams, often discourages valuable changes that would alter some observable effects. Introducing proper invariants, reordering initialization to be explicit, consolidating error handling: these might alter observable timing or error messages, but they vastly improve predictability and reasoning.
When behavior-preservation is the north star, these improvements get labeled as "not refactoring," even when they're pure wins for code clarity and safety.
Why Causality is the Better North Star
When we say "make causality obvious," we're centering refactoring around what actually matters: can a developer understand why something happens?
Causality equals mental models
If cause leads to effect explicitly in the code, developers can form accurate mental models quickly. They can reason about changes correctly. They encounter fewer surprises during maintenance.
This is the whole point. We don't refactor to make code pretty. We refactor to make it understandable.
Intent over shape
A causality-first approach values names that reveal purpose, sequencing that shows order of operations, explicit state transitions, and clear dataflow. These reveal intent, which is far more valuable than minimizing syntactic changes.
Better fault isolation
Within a bounded context or module, when you can read a function and say "this alone causes X given Y," debugging becomes local and cheap. You don't need to trace through six layers of indirection. The mechanical benefit of causal clarity is faster problem-solving within the scope where you can reason locally about cause and effect.
It scales beyond micro-refactorings
Causal clarity works at every level: methods, modules, components, systems. Fowler's definition is often interpreted narrowly, as applying only to small, safe edits. But architectural clarity follows the same principle: make the causal relationships in your system obvious.
Where the Definitions Diverge in Practice
Let me give you concrete scenarios where these two approaches lead to different decisions:
Scenario 1: Implicit initialization order
Fowler's approach: Don't change external behavior. Reordering initialization is risky because timing might change.
Causality-first approach: Make initialization explicit through builder patterns or dedicated init methods. This might change startup timing, but it removes entire classes of bugs by making the causal chain unambiguous.
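As a rough illustration of the builder idea, here is a minimal sketch. The names StatementPrinter, currency, and header are hypothetical and not taken from any codebase discussed here; the only point is that a skipped initialization step fails at build() rather than somewhere downstream.

```java
// Hypothetical sketch: StatementPrinter and its fields are invented for illustration.
final class ExplicitInitSketch {

    static final class StatementPrinter {
        private final String currency;
        private final String header;

        // Private: the only way to get an instance is through the builder,
        // so every instance has passed the checks in build().
        private StatementPrinter(String currency, String header) {
            this.currency = currency;
            this.header = header;
        }

        String render(double amount) {
            return header + ": " + amount + " " + currency;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private String currency;
            private String header;

            Builder currency(String currency) {
                this.currency = currency;
                return this;
            }

            Builder header(String header) {
                this.header = header;
                return this;
            }

            StatementPrinter build() {
                // A skipped step fails here, at construction time,
                // instead of surfacing later as a null inside render().
                if (currency == null) {
                    throw new IllegalStateException("currency must be set before build()");
                }
                if (header == null) {
                    throw new IllegalStateException("header must be set before build()");
                }
                return new StatementPrinter(currency, header);
            }
        }
    }

    public static void main(String[] args) {
        StatementPrinter printer = StatementPrinter.builder()
            .currency("USD")
            .header("Amount owed")
            .build();
        System.out.println(printer.render(3.5)); // prints "Amount owed: 3.5 USD"
    }
}
```

The trade-off is visible in the sketch itself: callers now see an exception at startup that they never saw before, which is exactly the kind of observable change a strict behavior-preservation reading would rule out.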
Scenario 2: Silent failure vs fail-fast
Fowler's approach: Changing how errors surface alters behavior, so it's not a refactor.
Causality-first approach: Surface failures where the causal chain is clearer. If making an error explicit increases understandability, do it, even if it's "externally observable."
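To make the contrast concrete, here is a small sketch of the same lookup written both ways. The category names and rates are made up for illustration; nothing here comes from a real system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the category names and rates are invented for illustration.
final class FailFastSketch {

    private static final Map<String, Double> DAILY_RATES = new HashMap<>();
    static {
        DAILY_RATES.put("REGULAR", 2.0);
        DAILY_RATES.put("NEW_RELEASE", 3.0);
    }

    // Silent version: an unknown category quietly becomes a rate of 0,
    // and the caller cannot tell "free" from "misconfigured".
    static double dailyRateSilent(String category) {
        Double rate = DAILY_RATES.get(category);
        return rate == null ? 0.0 : rate;
    }

    // Fail-fast version: the broken assumption surfaces at the lookup.
    static double dailyRateFailFast(String category) {
        Double rate = DAILY_RATES.get(category);
        if (rate == null) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        return rate;
    }

    public static void main(String[] args) {
        System.out.println(dailyRateSilent("CHILDRENZ")); // prints 0.0
        try {
            dailyRateFailFast("CHILDRENZ");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Unknown category: CHILDRENZ"
        }
    }
}
```

A test that asserts the silent 0.0 for a misspelled category would pass against the first method and fail against the second. Under the causality-first view, that failing test is the one that needs rewriting.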
Scenario 3: Consolidating duplicated logic
Both approaches agree this is valuable. But causality-first pushes further: pick the name that reveals the causal relationship. Not computeA(), but calculateTaxForEventTrigger().
Where the Approaches Conflict: Fowler's Own Examples
Let me use Fowler's actual refactoring examples from his original 1999 book to show where behavior-preservation conflicts with causality-clarity. All code is from Chapter 1's video store example in Java.
Example 1: Replace Temp with Query (the performance concern)
This is the most revealing example from Fowler's book, where he explicitly acknowledges the causality cost.
Fowler's refactoring:
    // Before - calculate once
    public String statement() {
        double totalAmount = 0;
        Enumeration rentals = _rentals.elements();
        String result = "Rental Record for " + name() + "\n";
        while (rentals.hasMoreElements()) {
            Rental each = (Rental) rentals.nextElement();
            totalAmount += each.charge();
            result += "\t" + each.tape().movie().name() + "\t" +
                String.valueOf(each.charge()) + "\n";
        }
        result += "Amount owed is " + String.valueOf(totalAmount) + "\n";
        return result;
    }

    // After - loop three times (Fowler: "don't worry about performance")
    public String statement() {
        Enumeration rentals = _rentals.elements();
        String result = "Rental Record for " + name() + "\n";
        while (rentals.hasMoreElements()) {
            Rental each = (Rental) rentals.nextElement();
            result += "\t" + each.tape().movie().name() + "\t" +
                String.valueOf(each.charge()) + "\n";
        }
        result += "Amount owed is " + String.valueOf(charge()) + "\n";
        result += "You earned " + String.valueOf(frequentRenterPoints()) +
            " frequent renter points";
        return result;
    }

    private double charge() {
        double result = 0;
        Enumeration rentals = _rentals.elements();
        while (rentals.hasMoreElements()) {
            Rental each = (Rental) rentals.nextElement();
            result += each.charge(); // Loop again!
        }
        return result;
    }

    private int frequentRenterPoints() {
        int result = 0;
        Enumeration rentals = _rentals.elements();
        while (rentals.hasMoreElements()) {
            Rental each = (Rental) rentals.nextElement();
            result += each.frequentRenterPoints(); // And again!
        }
        return result;
    }
Fowler's reasoning: The old code executed the while loop once, the new code executes it three times. If the while loop takes time, this might significantly impair performance. Many programmers would not do this refactoring simply for this reason. While some loops do cause performance issues, most do not. So while refactoring don't worry about this.
Note the words if and might - Fowler is arguing against premature optimization.
The causality problem: The code now obscures that charge() is computed once per rental for display, then computed again for each rental in the total. The causal relationship ("we already calculated these values") is hidden. Fowler acknowledges: "That's because Java requires a lot of statements to set up a summing loop... it is noise that hides what the intent of the loop is."
Causality-first alternative:
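    public String statement() {
        // Calculate rental data once - explicit causality
        Vector rentalData = new Vector();
        Enumeration rentals = _rentals.elements();
        while (rentals.hasMoreElements()) {
            Rental each = (Rental) rentals.nextElement();
            RentalData data = new RentalData(
                each.tape().movie().name(),
                each.charge(),
                each.frequentRenterPoints()
            );
            rentalData.addElement(data);
        }
        // Display and totals only read the values computed above
        String result = "Rental Record for " + name() + "\n";
        Enumeration data = rentalData.elements();
        while (data.hasMoreElements()) {
            RentalData each = (RentalData) data.nextElement();
            result += "\t" + each.movieName + "\t" +
                String.valueOf(each.charge) + "\n";
        }
        result += "Amount owed is " + String.valueOf(totalCharge(rentalData)) + "\n";
        result += "You earned " + String.valueOf(totalPoints(rentalData)) +
            " frequent renter points";
        return result;
    }

    private double totalCharge(Vector rentalData) {
        double result = 0;
        Enumeration data = rentalData.elements();
        while (data.hasMoreElements()) {
            RentalData each = (RentalData) data.nextElement();
            result += each.charge;
        }
        return result;
    }

    // Mirrors totalCharge(), summing the precomputed points instead
    private int totalPoints(Vector rentalData) {
        int result = 0;
        Enumeration data = rentalData.elements();
        while (data.hasMoreElements()) {
            RentalData each = (RentalData) data.nextElement();
            result += each.frequentRenterPoints;
        }
        return result;
    }

    // Plain value holder for one rental's computed results.
    // Assumed shape: one minimal form this class could take.
    class RentalData {
        final String movieName;
        final double charge;
        final int frequentRenterPoints;

        RentalData(String movieName, double charge, int frequentRenterPoints) {
            this.movieName = movieName;
            this.charge = charge;
            this.frequentRenterPoints = frequentRenterPoints;
        }
    }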
Why causality wins: Now it's explicit. Each rental undergoes one evaluation that produces its data, then those computed values flow into display and totals. The causal relationship (one rental -> one calculation -> multiple uses) is visible.
Yes, this introduces temporal coupling: data must be computed before it's used. But that's explicit temporal coupling, which is precisely the point. The code structure now matches the actual causal dependency. Because the totals take the computed data as a parameter, they cannot run before that data exists.
Example 2: The Switch Statement on Price Code
Fowler's canonical example:
    double charge(int daysRented) {
        double result = 0;
        switch (priceCode()) {
            case Movie.REGULAR:
                result += 2;
                if (daysRented > 2)
                    result += (daysRented - 2) * 1.5;
                break;
            case Movie.NEW_RELEASE:
                result += daysRented * 3;
                break;
            case Movie.CHILDRENS:
                result += 1.5;
                if (daysRented > 3)
                    result += (daysRented - 3) * 1.5;
                break;
        }
        return result;
    }
Fowler's approach: Preserve this behavior exactly during refactoring. The switch statement must handle all cases that come in. Don't change what happens with unknown price codes.
The causality problem: What happens when priceCode() returns a value that's not in the switch? The function returns 0 silently. The causal relationship unknown movie type -> zero charge is hidden. A bug could go unnoticed for years.
Causality-first approach:
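    double charge(int daysRented) {
        switch (priceCode()) {
            case Movie.REGULAR:
                return chargeRegular(daysRented);
            case Movie.NEW_RELEASE:
                return chargeNewRelease(daysRented);
            case Movie.CHILDRENS:
                return chargeChildrens(daysRented);
            default:
                // Unknown codes fail loudly instead of silently charging 0
                throw new IllegalStateException(
                    "Unknown price code: " + priceCode()
                );
        }
    }

    private double chargeRegular(int daysRented) {
        double result = 2;
        if (daysRented > 2)
            result += (daysRented - 2) * 1.5;
        return result;
    }

    // Same arithmetic as the NEW_RELEASE case in the original switch
    private double chargeNewRelease(int daysRented) {
        return daysRented * 3;
    }

    // Same arithmetic as the CHILDRENS case in the original switch
    private double chargeChildrens(int daysRented) {
        double result = 1.5;
        if (daysRented > 3)
            result += (daysRented - 3) * 1.5;
        return result;
    }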
Why this matters: Fowler's version hides the causal truth that it can't handle all price codes. Adding the default case changes external behavior (now throws instead of returning 0), violating his definition. But it reveals the true causality: unknown price code -> error.
Example 3: Extract Method - The Temp Variable
Fowler's starting point:
    public String statement() {
        double totalAmount = 0;
        int frequentRenterPoints = 0;
        Enumeration rentals = _rentals.elements();
        String result = "Rental Record for " + name() + "\n";
        while (rentals.hasMoreElements()) {
            double thisAmount = 0;
            Rental each = (Rental) rentals.nextElement();
            // determine amounts for each line
            switch (each.tape().movie().priceCode()) {
                case Movie.REGULAR:
                    thisAmount += 2;
                    if (each.daysRented() > 2)
                        thisAmount += (each.daysRented() - 2) * 1.5;
                    break;
                // ... other cases
            }
            totalAmount += thisAmount;
            result += "\t" + each.tape().movie().name() + "\t" +
                String.valueOf(thisAmount) + "\n";
        }
        result += "Amount owed is " + String.valueOf(totalAmount) + "\n";
        return result;
    }
Fowler's refactoring:
    public String statement() {
        // ... setup
        while (rentals.hasMoreElements()) {
            double thisAmount = 0;
            Rental each = (Rental) rentals.nextElement();
            thisAmount = amountOf(each);
            totalAmount += thisAmount;
            result += "\t" + each.tape().movie().name() + "\t" +
                String.valueOf(thisAmount) + "\n";
        }
        // ...
    }

    private double amountOf(Rental aRental) {
        double result = 0;
        switch (aRental.tape().movie().priceCode()) {
            // ... switch statement
        }
        return result;
    }
Then Fowler removes the temp:
    while (rentals.hasMoreElements()) {
        Rental each = (Rental) rentals.nextElement();
        totalAmount += each.charge();
        result += "\t" + each.tape().movie().name() + "\t" +
            String.valueOf(each.charge()) + "\n"; // Calls charge() again
    }
Fowler says: I like to get rid of temporary variables... Of course there is a small performance price to pay, here the charge is now calculated twice.
The causality problem: The intermediate variable thisAmount actually captured important causality - "this value is used in two places." Removing it makes the code call charge() twice, obscuring that these are the same calculation.
Causality-first approach:
    while (rentals.hasMoreElements()) {
        Rental each = (Rental) rentals.nextElement();
        double charge = each.charge(); // Calculate once, use twice
        totalAmount += charge;
        result += "\t" + each.tape().movie().name() + "\t" +
            String.valueOf(charge) + "\n";
    }
Why this matters: The temp variable makes the causal relationship explicit: "one calculation feeds two uses." Fowler's approach of removing it makes behavior-preservation harder to verify, because you have to check that charge() is idempotent and returns the same value on every call.
Fowler would counter: "Idempotent query methods make this safe - charge() will always return the same value." That's true mechanically. But the causality-first view values semantic clarity over trust in idempotence. The temp makes the single-calculation-multiple-uses pattern visible in the code structure itself, not just guaranteed by method contracts.
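Here is a small sketch of what "trusting the method contract" costs the reader. CountingRental is a hypothetical stand-in, invented only to make repeated calls observable; it is not part of the video store example. Both helper methods produce the same text, but only one of them lets you see from the call site alone that the charge is evaluated once.

```java
// Hypothetical sketch: CountingRental is invented to make repeated calls observable.
final class TempVariableSketch {

    static final class CountingRental {
        private int chargeCalls = 0;

        // Stands in for any query whose purity the reader would have to verify.
        double charge() {
            chargeCalls++;
            return 3.5;
        }

        int chargeCalls() {
            return chargeCalls;
        }
    }

    // Query called at each use site: two evaluations per rental.
    static String lineCallingTwice(CountingRental rental) {
        double total = 0;
        total += rental.charge();
        return "\t" + rental.charge() + "\ttotal=" + total;
    }

    // Temp variable: one evaluation feeds both uses.
    static String lineWithTemp(CountingRental rental) {
        double charge = rental.charge();
        double total = 0;
        total += charge;
        return "\t" + charge + "\ttotal=" + total;
    }

    public static void main(String[] args) {
        CountingRental a = new CountingRental();
        CountingRental b = new CountingRental();
        lineCallingTwice(a);
        lineWithTemp(b);
        System.out.println(a.chargeCalls() + " vs " + b.chargeCalls()); // prints "2 vs 1"
    }
}
```

With a harmless counter the two versions still print the same line. If charge() ever came to depend on something that changes between calls, only the temp version would keep the displayed amount and the total in agreement by construction.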
Example 4: The Frequent Renter Points Extraction
Fowler's intermediate step:
    public String statement() {
        int frequentRenterPoints = 0;
        // ... setup
        while (rentals.hasMoreElements()) {
            Rental each = (Rental) rentals.nextElement();
            frequentRenterPoints += frequentRenterPointsFor(each);
            // ...
        }
        // ...
    }

    int frequentRenterPointsFor(Rental each) {
        if ((each.tape().movie().priceCode() == Movie.NEW_RELEASE)
                && each.daysRented() > 1)
            return 2;
        else
            return 1;
    }
But Fowler's book shows an earlier version during extraction:
    int frequentRenterPointsFor(Rental each) {
        // add frequent renter points
        frequentRenterPoints++;
        // add bonus for a two day new release rental
        if ((each.tape().movie().priceCode() == Movie.NEW_RELEASE)
                && each.daysRented() > 1)
            frequentRenterPoints++;
    }
Fowler says: Again we look at the use of locally scoped variables... frequentRenterPoints does have a value beforehand. The body of the extracted method doesn't read the value, however, so we don't need to pass it in as a parameter as long as we use an appending assignment.
The causality problem: This intermediate step has a method called frequentRenterPointsFor() that sounds like a query but actually has side effects - it mutates the parent scope. The name suggests "calculate points for this rental" but the causal effect is "increment the counter."
Causality-first approach: Skip the side-effect step entirely:
    while (rentals.hasMoreElements()) {
        Rental each = (Rental) rentals.nextElement();
        frequentRenterPoints += calculateFrequentRenterPoints(each);
    }

    private int calculateFrequentRenterPoints(Rental rental) {
        if ((rental.tape().movie().priceCode() == Movie.NEW_RELEASE)
                && rental.daysRented() > 1)
            return 2;
        else
            return 1;
    }
Why this matters: The causal relationship is immediately clear - each rental produces a point value, those values are summed. Fowler's intermediate step temporarily makes causality worse (function name suggests query, body does mutation) in service of behavior preservation.
Guardrails for Causality-First Refactoring
A purely causality-focused approach needs practical constraints. "Make causality obvious" is a north star, but it shouldn't become an excuse for breaking things carelessly.
This approach sometimes means changing external behavior (making errors explicit, reordering operations, surfacing preconditions), which expands beyond Fowler's strict definition. That's intentional. But with that expanded scope comes responsibility:
Distinguish local clarity from global invariants. A change that makes local causality obvious but violates system-wide invariants is harmful. Always verify you're not breaking essential contracts.
Treat tests as specs of intended causality, not fossils. When a refactor reveals that a test encodes accidental behavior, that's a signal to update the spec, but do it consciously and deliberately.
Quantify what regressions actually matter. If you change how something manifests externally, ask: does this violate contracts relied upon by external systems? If yes, coordinate the change. If not, and clarity improves significantly, proceed.
A Practical Checklist
Here's how to apply causality-driven refactoring in your daily work:
- Can you point to a single line and explain why it causes the observed effect? If not, refactor.
- Prefer names that encode causal roles. Verbs for actions that produce effects, nouns for state holders.
- Reduce implicitness. Replace global state and hidden side effects with explicit parameters or clear boundaries.
- Collapse unnecessary indirection. If a layer masks cause-effect without adding value, remove it.
- Make temporal order explicit. Use clear state transitions or lifecycle methods rather than relying on constructor side effects.
- Enforce invariants near the mutation site. Prefer local checks over distant assumptions.
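A couple of these checklist items (explicit temporal order, invariants checked next to the mutation) can be sketched together. The RentalLifecycle class and its states below are hypothetical, chosen only to echo the video store domain; treat it as one possible shape rather than a recommended design.

```java
// Hypothetical sketch: RentalLifecycle and its states are invented for illustration.
final class LifecycleSketch {

    enum State { CREATED, RENTED, RETURNED }

    static final class RentalLifecycle {
        private State state = State.CREATED;
        private int daysRented = 0;

        void rentOut() {
            requireState(State.CREATED, "rentOut");
            state = State.RENTED;
        }

        void returnAfter(int days) {
            // Invariants are checked right before the fields they protect are mutated.
            requireState(State.RENTED, "returnAfter");
            if (days < 1) {
                throw new IllegalArgumentException("days must be >= 1, got " + days);
            }
            daysRented = days;
            state = State.RETURNED;
        }

        int daysRented() {
            requireState(State.RETURNED, "daysRented");
            return daysRented;
        }

        private void requireState(State expected, String operation) {
            if (state != expected) {
                throw new IllegalStateException(
                    operation + " requires " + expected + " but was " + state);
            }
        }
    }

    public static void main(String[] args) {
        RentalLifecycle rental = new RentalLifecycle();
        rental.rentOut();
        rental.returnAfter(3);
        System.out.println(rental.daysRented()); // prints 3
    }
}
```

Each method states which state it needs and which state it produces, so the order of operations can be read from the class itself instead of being inferred from call sites.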
The Trade-offs
I'm not claiming this approach is cost-free:
Increased coupling risk. Making cause-effect explicit might expose internals. Guard against this with clear API boundaries.
Compatibility challenges. Changing behavior requires coordination. But being explicit about changes is better than silently preserving accidental complexity.
Subjectivity. What's obvious depends on reader experience and domain context. Mitigate this with shared conventions and documentation.
Conclusion: Purpose Over Process
Fowler's definition is valuable as a safe, conservative guideline for mechanical refactorings. It's test-friendly and low-risk. But it mistakes the means for the end.
The objective of refactoring isn't prettier internals or preserved behavior. It's understandability, and the most critical kind of understanding is causal understanding.
When developers can see why something happens, they can:
- Reason about changes correctly
- Debug problems efficiently
- Evolve the system confidently
- Onboard faster
- Make fewer mistakes
That's what matters. That's what we should optimize for.
So here's the reframe: use refactoring to make cause and effect visible. Make the causal chains in your system obvious. Do that, and the code becomes not just structurally sound, but genuinely understandable.
Everything else follows from that.
The Two Perspectives Compared
Here's a direct comparison of how these two philosophies approach key aspects of refactoring:
| Aspect | Fowler's Perspective (Behavior-Preservation) | Causality-First Response (Understanding-Oriented) |
|---|---|---|
| Purpose of Definition | Refactoring is about improving internal structure without changing what the system does. This provides a safety boundary so developers can confidently clean code without fear of regressions. | Refactoring is about making cause and effect explicit. Preserving correctness matters, but understanding why behavior exists is what makes systems maintainable. Behavior-preservation is a subset of causal clarity. |
| Role of Tests | Tests define external behavior; if they still pass, your refactor is safe. | Tests are approximations of intent, not absolute truth. A causality-first refactor might break a test that encodes accidental behavior; that's a feature, not a bug. |
| Risk Management | The "no behavior change" rule separates refactoring (safe) from feature work (risky). This clear line prevents chaos. | The separation is artificial. Real refactors often surface hidden behavior or implicit contracts. Safety doesn't come from freezing behavior—it comes from making cause-effect relationships explicit and testable. |
| Complexity Focus | Internal structure (naming, duplication, class organization) is what causes long-term complexity. Refactoring improves these structures. | Structural clarity without causal clarity is cosmetic. A perfectly structured system can still hide why it behaves as it does. Complexity is reduced when causal flow is obvious. |
| Performance Trade-offs | Behavior preservation sometimes means sacrificing small optimizations (e.g., extra loops). The gain is simpler, more uniform structure. | Causality-first isn't about performance either—it's about fidelity of meaning. If recomputation hides an existing causal chain ("we already calculated this"), the simplicity is misleading, not helpful. |
| Error Handling | Preserve behavior even for "undefined" cases—don't change what happens unless explicitly required. | That's how silent failures persist. Making errors explicit (fail-fast) changes behavior but reveals truth. Causality demands we surface broken assumptions rather than preserve them. |
| Philosophy of Refactoring | Refactoring is about what the program does not change. It's a controlled internal improvement. | Refactoring is about what the developer can now see. It's an epistemic improvement—code that exposes its own logic. |
| End Goal | Stable correctness through safe structure. | Sustainable understanding through visible causality. Once understanding is preserved, correctness follows naturally. |