TOON and the Tooling Tax: Why Software Never Learns
Every few months, someone invents a new way to talk to AI that promises to save you money.
The latest is something called TOON, which stands for Token-Oriented Object Notation. If you work with large language models like ChatGPT or Claude, you might have seen it mentioned. It's a new data format, kind of like JSON (the standard way computers exchange structured data), but more compact. The pitch is simple: use TOON instead of JSON and you'll use 30-60% fewer tokens when sending data to an AI.
If that sentence made no sense to you, don't worry. That's actually the point I'm making.
TOON is clever. It works. And the fact that it needs to exist at all tells us something important about how badly we're building AI systems and how this same mistake has been made over and over again in software for the past 40 years.
To understand why TOON exists, you need to understand how AI companies charge you for their services.
When you use ChatGPT, Claude, or any similar AI service, you're not paying for queries or conversations in the way you might expect. Instead, you're paying for something called tokens.
What are tokens?
Think of tokens as the way AI systems chop up text into digestible pieces. When you send a message to an AI, it doesn't read your words the way you wrote them. Instead, it breaks everything down into fragments, usually chunks of about 3-4 characters each. The word "hello" might become one token. The word "understanding" might become two or three tokens: "under", "stand", "ing".
This chopping-up process is called tokenization, and it's actually necessary for how these AI systems work. The AI model was trained by learning patterns across billions of these token fragments, and that's how it learned to generate human-like text.
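You can watch this happen with OpenAI's open-source tiktoken library, which exposes the same tokenizers several GPT-style models use (other providers have their own tokenizers, but the behaviour is similar). A minimal sketch, assuming tiktoken is installed:

```python
# pip install tiktoken  (OpenAI's open-source tokenizer library)
import tiktoken

# cl100k_base is one of the encodings used by recent GPT-style models;
# other models use different tokenizers, but the idea is the same.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["hello", "understanding", "tokenization"]:
    token_ids = enc.encode(word)                    # text -> list of integer token ids
    pieces = [enc.decode([t]) for t in token_ids]   # each id back to its text fragment
    print(f"{word!r} -> {len(token_ids)} token(s): {pieces}")

# Illustrative output (the exact splits depend on the tokenizer):
#   'hello' -> 1 token(s): ['hello']
#   'understanding' -> 2 token(s): ['under', 'standing']
#   'tokenization' -> 2 token(s): ['token', 'ization']
```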
So far, so good. That's just an internal technical detail, right?
Here's the sad part: AI companies decided to charge you based on these tokens.
Every message you send gets tokenized, and you pay per token. Every response the AI generates is also tokenized, and you pay for those tokens too. If you have a conversation that uses 10,000 tokens, you might pay a few cents. A complex analysis that uses 200,000 tokens might cost a few dollars.
The billing formula is simple: more tokens = more money.
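A quick worked example, with illustrative prices rather than any provider's actual rate card: at $3 per million input tokens, a 10,000-token request costs 10,000 × 3 / 1,000,000 = $0.03, about three cents. At $15 per million (a plausible rate for a larger model or for output tokens), a 200,000-token job is 200,000 × 15 / 1,000,000 = $3.00, a few dollars.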
And that's where TOON comes in.
Because if you're paying per token, suddenly you care a lot about how many tokens your data uses. If you're sending structured information to an AI like a spreadsheet of sales data, or a list of customer records, the format you choose matters.
Regular JSON (the standard format) might use 10,000 tokens to represent your data. TOON can represent the same information in 5,000 tokens. Same data, half the cost.
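To see why, here's a rough sketch of the same three records encoded two ways, with token counts measured using the tiktoken library. The compact version follows TOON's general header-plus-rows idea, but treat its exact syntax as approximate rather than as the official spec:

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

records = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Carol", "role": "user"},
]

# Standard JSON: every record repeats every field name and its punctuation.
as_json = json.dumps(records, indent=2)

# A TOON-style tabular encoding: field names appear once in a header,
# then each record becomes a single comma-separated row.
# (Syntax shown is approximate and for illustration only.)
as_compact = (
    "users[3]{id,name,role}:\n"
    "  1,Alice,admin\n"
    "  2,Bob,user\n"
    "  3,Carol,user"
)

print("JSON tokens:   ", len(enc.encode(as_json)))
print("Compact tokens:", len(enc.encode(as_compact)))
# The gap grows with every additional row, because JSON repeats the keys
# for every single record while the tabular form states them once.
```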
So developers started inventing new formats like TOON to game the system. To pack data more efficiently. To reduce token counts.
And my argument is that this is completely backwards.
The Repeating Pattern
Let's talk about the internet to start with.
When computers talk to each other over the internet, they break data into small chunks called packets. This is necessary because you can't send a whole file at once. It needs to be broken up, sent piece by piece, and reassembled at the other end.
The number of packets matters a lot for network performance. More packets mean more overhead, more chances for things to go wrong, more processing needed.
But you never think about packets when you use the internet.
When you load a webpage, you don't wonder how many packets it will take. You don't optimize your browsing to reduce packet counts. You don't pay your internet provider based on packets sent.
That's because the internet was designed with something called protocol layers. At the bottom layer, data gets broken into packets, routed through networks, and reassembled. But at the top layer, the one you interact with, you just see webpages, emails, and videos.
The technical complexity is hidden from you.
The same thing happens with video streaming. When you watch Netflix, the video is broken into thousands of individual frames, delivered dozens per second. The bitrate adjusts based on your connection speed. Frames are compressed, decoded, buffered.
But you never think about frames. You just press play and watch.
Or consider databases. When you store data in a database, it gets organized into storage blocks on a hard drive. The database constantly manages these blocks, moves data around, optimizes access patterns.
But when you use a database, you write SQL queries. You think in terms of tables and rows, not storage blocks.
This is how mature technology works: the messy internal details are hidden behind clean interfaces.
LLMs Broke This Pattern
With large language models, something went wrong.
Tokens aren't just an internal detail. They're the main thing you have to think about when using these systems.
Want to use an AI API? You need to:
- Understand what tokens are
- Count your tokens before sending requests
- Optimize your prompts to use fewer tokens
- Monitor your token usage
- Choose between models based on token limits
- Pay attention to "context windows" measured in tokens
There are entire tools and libraries just for counting tokens. People write blog posts about token-efficient prompting techniques. Companies hire engineers who specialize in token optimization.
Imagine if Netflix made you think about video frame counts. Or if Google charged you per indexed search term. Or if your phone bill itemized every cellular tower handoff.
The internal technical detail became the external user interface.
This Is A Design Failure
Let me be clear about what I'm saying.
I'm not arguing that tokenization is wrong. It's a necessary part of how these AI systems work. Models need to break text into pieces to process it.
What's wrong is exposing this internal mechanism to users and tying billing directly to it.
Here's an analogy: imagine you're building a house, and the architect shows you the blueprint. There are studs in the walls, joists in the floor, and rafters in the roof. These are necessary structural elements. The house would collapse without them.
But now imagine the architect says: "We're going to leave all the studs exposed. No drywall. No paint. And by the way, we're charging you per stud. If you want to reduce costs, you should design your floor plan to minimize stud count."
You'd think that was insane. The studs are necessary, but they should be hidden behind finished walls. The house should be designed for how you want to live in it, not for minimizing structural elements.
That's what AI companies did with tokens.
They took an internal implementation detail, exposed it directly to users, and made it the basis for billing. And now everyone is optimizing around it.
The Tooling Tax Begins
Once this mistake was made, it couldn't be easily undone.
Millions of developers started building applications on these token-based APIs. The billing model became standard across OpenAI, Anthropic, Google, and others. The entire ecosystem locked in around tokens.
And when a fundamental architectural choice is wrong but locked in, you don't get a fix. You get tools to work around it.
This is where TOON comes in, and it's just the beginning.
Right now, we're seeing:
- New data formats to reduce token counts (TOON, JSONC, compact encodings)
- Prompt compression libraries that shrink your text before sending it
- Context window optimizers that pack more information into fewer tokens
- Token counting tools integrated into every AI development framework
- Best practices guides on writing token-efficient prompts
Five years from now, we'll likely see:
- Entire companies built around token optimization platforms
- Enterprise solutions for token budget management
- Consultants who specialize in token efficiency
- Conference talks on advanced token reduction techniques
- All of this becoming standard knowledge that everyone just has to know
This is the tooling tax.
When the architecture is wrong, you don't fix the architecture. You build layer after layer of tools to live with the wrongness. And eventually, the tools become the solution. The complexity becomes normal. The workarounds become best practices.
We've Seen This Before
This exact pattern has played out multiple times in software history. Let me give you a few examples.
Email and Spam
When email was invented in the 1970s, it was designed for a trusted network of researchers. Anyone could send email claiming to be anyone else. There was no authentication, no verification.
This worked fine until email became popular. Then spammers exploited the lack of authentication, and suddenly everyone's inbox was flooded with junk.
The right fix would have been to redesign email with authentication built in from the start. But by then, millions of email servers were running the old protocol. You couldn't just change it.
So instead, we got:
- Spam filters (that constantly play cat-and-mouse with spammers)
- SPF, DKIM, and DMARC (complex authentication systems bolted on later)
- Reputation systems and blocklists
- Machine learning spam detection
- An entire email security industry
It's been 40 years. We still have spam. The architectural flaw was never fixed; we just built an enormous tooling ecosystem to live with it.
JavaScript and Type Safety
When JavaScript was created in 1995, it was thrown together in 10 days for simple web page interactions. The language had some fundamental flaws: weird type coercion rules, confusing scoping, global namespace pollution.
But JavaScript became the only language that runs in web browsers. Billions of lines of JavaScript code exist. You can't just replace it.
So instead, we got:
- Linters (tools to catch common mistakes)
- Transpilers (tools to convert newer syntax to older syntax)
- TypeScript (essentially a whole new language that compiles to JavaScript)
- Entire books about the "good parts" and "bad parts" of JavaScript
- Complex build pipelines to work around language limitations
Modern web development is 50% actually building features and 50% managing JavaScript tooling. All because the initial design was rushed.
IPv4 and Address Exhaustion
When the Internet Protocol (IPv4) was designed in 1981, its designers allocated 32 bits for addresses. That's about 4.3 billion possible addresses. At the time, that seemed like more than enough.
But by the late 1990s, it became clear we'd run out. The right fix was to switch to IPv6, which uses 128-bit addresses and has room for trillions of trillions of devices.
IPv6 was designed in 1995. It's now 2025, 30 years later, and we still haven't fully switched over.
Instead, we got:
- NAT (Network Address Translation) - a complicated hack to share one address among multiple devices
- Complex dual-stack systems running IPv4 and IPv6 simultaneously
- Carrier-grade NAT and other increasingly desperate patches
- An entire industry around IPv4 address allocation and management
The core problem was never fixed. We just built workarounds on top of workarounds.
The Pattern
In every case:
- An initial design decision was wrong or shortsighted
- The technology was adopted widely before the flaw became obvious
- Fixing it would require breaking changes that were too costly
- Instead, an ecosystem of tools emerged to work around the problem
- The workarounds became permanent
- New developers learn that "this is just how it is"
And now we're doing it again with AI and tokens.
But AI Is New - Give It Time
This is where I get frustrated.
The "AI is a new industry" excuse doesn't hold, because the lessons we needed were already learned decades ago.
When LLM companies started building APIs in 2020, they had access to:
- 50 years of networking design principles (TCP/IP from 1974 showed us how to do protocol layering)
- 30 years of web architecture (HTTP from 1991 showed us abstraction and content negotiation)
- 20 years of REST API design (REST, from 2000, showed us resource-oriented interfaces)
The entire software industry knew how to hide implementation details behind clean abstractions.
Yet AI companies chose to expose tokenization directly to users and tie billing to it. They created the exact problems that networking solved in the 1970s.
This wasn't an innocent mistake. It was a choice, likely driven by:
- Speed to market (easier to bill on tokens than build proper abstractions)
- First-mover advantage (get users locked in before competitors)
- Simple metering (tokens are easy to count and charge for)
But fast and simple isn't the same as correct.
And now we're paying the price. Just like we paid the price for rushing email, rushing JavaScript, rushing IPv4.
What The Right Design Would Look Like
Let me paint you a picture of how this could have been done properly.
Imagine if AI companies had launched with APIs that looked like this:
Send Request:
- Your data (in whatever format is natural: text, JSON, CSV, whatever)
- What you want done with it
- Quality level you need (fast vs. thorough)
Receive Response:
- The result
- Time it took
- Cost based on compute used
In this model:
- You send data in formats that make sense for your use case
- The system automatically figures out the most efficient way to process it
- Compression and optimization happen transparently
- You pay for the actual computational work done, not for internal implementation details
- Token limits and context windows are abstracted behind higher-level limits on task complexity or scope
You'd never think about tokens. Just like you never think about packets when browsing the web.
The system would handle all of that internally. It would negotiate the best encoding, apply compression where useful, manage context efficiently. All the things that users are now manually doing with tools like TOON would happen automatically.
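To make that concrete, here's a hypothetical sketch of what such a client could look like. None of these names correspond to a real SDK; the class, methods, and fields are invented purely to show the shape of the abstraction:

```python
from dataclasses import dataclass

@dataclass
class Result:
    output: str              # the answer itself
    elapsed_seconds: float   # how long the work took
    cost_usd: float          # billed on computational work done, not tokens

class AIClient:
    """Hypothetical client for an AI service that hides tokenization entirely."""

    def run(self, task: str, data: object, quality: str = "standard") -> Result:
        # Encoding, compression, chunking, and context management would all
        # happen inside the service; the caller never sees a token count.
        raise NotImplementedError("design sketch only, not a real API")

# Usage would read like this -- no token arithmetic anywhere in sight:
#   client = AIClient()
#   result = client.run(
#       task="Summarize Q3 sales trends by region",
#       data=open("sales.csv").read(),   # CSV, JSON, plain text: caller's choice
#       quality="thorough",
#   )
#   print(result.output, result.cost_usd)
```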
This isn't science fiction. This is how mature systems work. It's how:
- HTTP negotiates content types and compression (your browser and web servers figure out the best format automatically; there's a small sketch of this after the list)
- Databases optimize query execution (you write SQL, the database figures out how to execute it efficiently)
- Video streaming adapts quality (Netflix adjusts bitrate based on your connection, you just see smooth playback)
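Here's what that transparency looks like from the caller's side, sketched with the Python requests library and a placeholder URL: the client states what it can accept, and any compression is negotiated and undone without the application ever touching it.

```python
import requests

# requests advertises gzip/deflate support by default and transparently
# decompresses the response body; the application only sees plain text.
response = requests.get(
    "https://example.com/api/report",          # placeholder URL
    headers={"Accept": "application/json"},    # state what you can accept
)

print(response.headers.get("Content-Encoding", "none"))   # e.g. "gzip", negotiated for you
print(len(response.text), "characters, already decoded")  # no manual decompression step
```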
The technology to do this properly existed before LLMs were built.
The fact that it wasn't done this way is a choice, not a necessity.
The Cost Of Bad Architecture
This isn't just an inconvenience. The consequences are serious and long-lasting.
Developer Time
Right now, developers building AI applications spend significant time on token optimization. Time that could be spent building features. Solving real problems. Creating value for users.
Instead, they're:
- Manually counting tokens
- Rewriting prompts to be shorter
- Chunking documents to fit context windows (the kind of boilerplate sketched after this list)
- Choosing between models based on token limits
- Debugging token-related errors
- Learning arcane token-counting rules
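As a taste of that boilerplate, here's a minimal chunking sketch using the tiktoken library; the token budget is arbitrary and the send_to_model call is hypothetical:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_tokens(text: str, max_tokens: int = 4000) -> list[str]:
    """Split a document into pieces that each fit a token budget --
    busywork that only exists because limits are expressed in tokens."""
    token_ids = enc.encode(text)
    return [
        enc.decode(token_ids[i:i + max_tokens])
        for i in range(0, len(token_ids), max_tokens)
    ]

# chunks = chunk_by_tokens(open("report.txt").read())
# for chunk in chunks:
#     send_to_model(chunk)   # hypothetical call, one request per chunk
```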
This is waste. Pure waste. It's as if Netflix engineers spent half their time teaching users about frame rates and codecs.
Innovation At The Wrong Layer
When the architecture is wrong, innovation happens at the wrong level.
We're seeing clever new data formats (TOON), compression techniques, context management strategies. These are genuinely innovative, but they're innovations to work around a problem that shouldn't exist.
Imagine if all that creativity went into building better AI applications instead of optimizing token usage.
Accumulated Complexity
Every workaround adds complexity. Every tool adds dependencies. Every best practice adds tribal knowledge that new developers have to learn.
Five years from now, onboarding an AI developer will include:
- Here's how tokens work
- Here's how to count them
- Here are the five most popular token optimization techniques
- Here are the three competing token management libraries
- Here's when to use TOON vs JSONC vs other formats
All of this complexity stems from one initial design decision.
Lock-In Effects
The longer the current model persists, the harder it becomes to change.
Right now, thousands of companies are building applications on token-based APIs. Training materials teach token optimization. Tools are built around token management. Billing systems are designed for token pricing.
In five years, there will be even more investment in this ecosystem. Even more reason not to change.
Just like how email spam filters are now a multi-billion-dollar industry, creating an economic incentive to keep the broken email architecture. Just like how TypeScript is now deeply embedded in JavaScript ecosystems, making it nearly impossible to fix JavaScript itself.
Bad architecture becomes permanent because changing it becomes more expensive than living with it.
The Core Lesson
Here's what I want you to take away from this:
When you need tools to optimize around your system's billing model, your architecture has failed.
TOON is brilliant within the constraints of a broken system. But the fact that we need it at all is damning.
Mature systems don't force users to invent workarounds at the protocol layer. They provide proper abstractions. They hide complexity. They let users focus on their actual goals, not on implementation details.
The internet doesn't need packet-efficient HTTP. Databases don't need block-optimized SQL. Video streaming doesn't need frame-aware viewing.
AI systems shouldn't need token-efficient data formats.
When primitives leak through abstractions, tooling industries emerge. When abstractions are done right, complexity disappears.
Why Software Keeps Making This Mistake
At this point, you might be wondering: if this pattern is so obvious, why does it keep happening?
I think there are several reasons:
1. Speed Beats Thoughtfulness
In the tech industry, there's enormous pressure to ship fast. "Move fast and break things" became a mantra. First-mover advantage is real - the first company to market often wins, even with inferior technology.
So companies cut corners. They expose implementation details because it's faster than building proper abstractions. They tie billing to easy-to-meter units because it's simpler than building sophisticated pricing models.
The technical debt compounds, but by then you have users and revenue. And fixing it would mean breaking changes that could lose customers.
2. Tooling Is Profitable
Here's an uncomfortable truth: broken architecture creates business opportunities.
When email was broken, companies like MessageLabs and Postini built profitable businesses selling spam filtering. When JavaScript was messy, companies built businesses around TypeScript and developer tooling.
There's economic incentive to keep the architecture broken, because the tooling ecosystem is worth billions of dollars.
Nobody wants to fix the fundamental problem if they're making money selling workarounds.
3. Knowledge Doesn't Transfer
Each new domain in tech acts like it's the first to encounter these problems.
Web developers learned lessons about abstraction. Database designers learned lessons about query optimization. Network engineers learned lessons about protocol design.
But when AI came along, many of those lessons were ignored. Different people, different companies, different expertise. The knowledge didn't transfer.
It's like every generation of software engineers has to relearn that touching a hot stove burns.
4. Backwards Compatibility Is A Prison
Once you ship an API, you're stuck with it. Breaking changes mean angry customers, migration costs, lost business.
So even when you realize the design is wrong, you can't easily fix it. You can add new APIs alongside the old ones (like IPv6 alongside IPv4), but you can't force everyone to switch.
The wrong design becomes permanent because the switching costs are too high.
5. We Optimize Locally, Not Globally
When faced with an architectural problem, the natural response is to build a tool to work around it. That solves the immediate pain.
But it doesn't fix the root cause. And over time, all these local optimizations create a complex ecosystem that's even harder to change than the original problem.
We optimize ourselves into a corner.
What Happens Next
One of two things will happen:
Option A: The Rewrite
Someone (maybe OpenAI, maybe a competitor) launches a fundamentally different API that abstracts tokens away. It's cleaner, simpler, better. But migration is painful and slow. Some companies never migrate, running legacy token-based systems indefinitely. The industry splits between legacy AI and modern AI.
Option B: Permanent Scar Tissue
The token-based model becomes so entrenched that we never switch away. Like QWERTY keyboards or month/day/year date formats - we know it's suboptimal, but the switching costs are too high. Token optimization becomes "just how AI works," and new developers learn it without questioning why.
Based on history, Option B is more likely. We still fight email spam. We still use JavaScript. We still haven't fully switched to IPv6.
Broken architecture tends to become permanent architecture.
Conclusion
This essay isn't really about TOON, or even about AI.
It's about a pattern I've watched repeat throughout my career in software. A pattern where:
- Initial design decisions are made quickly, without full consideration
- Those decisions get locked in through adoption
- The flaws become apparent
- Instead of fixing the architecture, we build tools to live with it
- The tools become an industry
- The complexity becomes normal
- New developers learn that "this is just how it is"
- The cycle continues
Software, as an industry, has a learning disability.
We keep making the same architectural mistakes. We keep building tooling ecosystems around broken fundamentals. We keep choosing short-term convenience over long-term correctness.
And every time, we tell ourselves "this time is different," or "we'll fix it later," or "it's good enough for now."
But "later" never comes. "Good enough" becomes permanent. "Different" looks remarkably the same.
TOON is clever. It solves a real problem. The developers who created it should be proud of their work.
But the fact that TOON needs to exist is an indictment of the current state of AI system design.
We had decades of lessons about protocol abstraction, about hiding complexity, about separating interface from implementation. We chose to ignore those lessons in favour of quick-to-market, easy-to-meter APIs.
And now we're building a tooling ecosystem to live with that choice.
Just like we've done before. Just like we'll probably do again.
The question is: when will we learn?
When will we stop accepting broken architecture as inevitable? When will we demand better design upfront, rather than clever workarounds after the fact? When will we treat architectural thinking as seriously as we treat coding, testing, and deployment?
I don't have good answers. But I know that recognising the pattern is the first step.
TOON is just the latest chapter in a very old story. A story about how hard it is to get architecture right, and how much we pay when we get it wrong.
The story isn't over. We're still writing it.
The question is whether we'll write a different ending this time.
Maneesh Chaturvedi — 25+ years building software systems and watching the industry repeat the same mistakes. Founder of Stackshala Technologies, teaching engineers to think in first principles rather than memorizing patterns. Because maybe if enough of us understand these cycles, we can finally break them.