Why generation speed is the wrong optimization

After my recent chalk talk on AI-native development, a tech lead approached me with a confession that’s become increasingly common: “Claude Code is really cool. But we have thousands of lines of AI-generated specs and code that nobody fully understands. We’re afraid to delete any of it because the AI generated it—it might be important. But we can’t manage it all either.”
This isn’t an isolated problem. I’m seeing it across enterprise teams adopting AI-assisted development. The promise was simple: AI generates better quality code than the average engineer, so let’s generate as much as possible. The reality? Individual code quality has improved, but overall application quality is degrading. Systems are spinning out of control, buried under mountains of AI-generated artefacts that teams lack the confidence to validate or the courage to delete.
We’ve created a new asymmetry in software development. Generation has become trivially easy. Verification has become the critical bottleneck.
When Creation Becomes Too Easy
For decades, the primary constraint in software development was creation. Writing code, designing systems, documenting specifications—these activities consumed most of our time and required specialised expertise. We optimised everything around making creation faster: better IDEs, code generators, frameworks, libraries.
AI has obliterated this constraint. An LLM can generate thousands of lines of code in minutes. It can produce comprehensive design documents, detailed specifications, and extensive test suites faster than a human can read them. The bottleneck has shifted entirely.
But human review capacity hasn’t scaled proportionally. We still read at the same speed. We still need time to understand context, evaluate trade-offs, and build mental models of systems. The asymmetry creates a dangerous gap: we can generate far more than we can meaningfully verify.
This gap manifests in two critical ways. First, there’s a confidence gap: we don’t trust what we didn’t create. When you write code yourself, you understand its assumptions, limitations, and edge cases. When AI generates it, you’re reviewing someone else’s work without the context of the creation process. Second, there’s a connection gap: we lack ownership of AI-generated artefacts. The code exists, it might even work, but nobody on the team feels responsible for it or deeply understands it.
Teams respond by rushing to generate more, assuming that velocity in creation equals velocity in delivery. This is the same trap that made AI-Managed approaches fail for complex systems — high autonomy without adequate human guidance and validation. The faster you generate without corresponding verification capacity, the slower your actual progress becomes.
From Big Ball of Mud to Big Ball of Slop
The software engineering community has long warned against the “big ball of mud” — systems that grow organically without architectural discipline, becoming tangled masses of interdependent code that nobody fully understands. AI-native development has created a new variant: the big ball of slop.
Teams accumulate massive amounts of specifications, design documents, and generated code. The information is scattered across files, inconsistent in quality, and incoherent in structure. Nobody has a complete mental model of what exists or how it fits together. The system has grown beyond what any individual—or even the team collectively—can hold in their heads.
Here’s the cruel irony: this information often doesn’t even fit into the effective context window of the LLMs that generated it. As I discussed in my post on effective context windows, generation accuracy decreases significantly beyond 128k tokens despite larger context windows. Your big ball of spec exceeds even the AI’s ability to reason about it coherently.
The psychological barrier makes this worse. Engineers are afraid to delete AI-generated content. “The AI must have had a reason to generate this. What if we need it later?” This fear of deletion leads to accumulation without curation. Every iteration adds more artefacts. Nothing gets removed. System entropy increases.
The result contradicts our expectations: better code quality at the component level, worse quality at the system level. Individual functions might be well-written, but the overall architecture becomes incomprehensible. The system works, sort of, but nobody can confidently modify it without breaking something else. This mirrors the challenges I described in why codebases aren’t ready for AI — poor structure and distributed logic create confusion for both humans and AI.
The faster you generate without verification, the faster you build your big ball of slop. And once you have it, untangling it becomes nearly impossible.
Using AI to Verify AI: The Meta-Validation Approach
If AI created the verification bottleneck, can AI help solve it? Yes, but not in the obvious way. The solution isn’t using the same AI agent to check its own work—that’s circular and ineffective. Instead, use a separate LLM instance with an intentionally different, minimally biased context to verify artefacts before moving forward.
The key is injecting organisational best practices and lifecycle-stage-specific guidelines into the verification LLM. During the design stage, load your organisation’s design principles, architectural standards, and past lessons learnt. During implementation, inject coding standards, security requirements, and integration patterns. The verification LLM evaluates generated artefacts against these organisational constraints that the generation LLM might not have considered.
This creates incremental confidence at each stage. Rather than generating an entire system and then trying to verify everything at once, you validate intermediate artefacts before they become inputs to the next stage. A design document gets verified against architectural principles before code generation begins. Generated code gets verified against security standards before integration. This prevents small mistakes or context gaps from compounding into large, inaccurate implementations that are difficult to review and expensive to correct.
The pattern works for multiple artefact types. Design document verification checks for completeness, consistency with organisational patterns, and alignment with business requirements. Code review automation validates adherence to coding standards, security best practices, and architectural constraints. Specification consistency checking ensures that different specs don’t contradict each other or create integration conflicts.
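The verification step can be sketched as a small gate function. Everything here is illustrative: the `GUIDELINES` content, the prompt shape, and `call_verifier_llm` are placeholders for your organisation’s actual rules and whichever LLM client you use, with the model call stubbed so the sketch is self-contained.

```python
# Sketch of a meta-validation gate: a separate "verifier" model instance
# receives organisational guidelines for the current lifecycle stage and
# checks an artefact produced by the generation model.

from dataclasses import dataclass, field

# Stage-specific guidelines injected into the verifier's context
# (hypothetical examples, not a real rulebook).
GUIDELINES = {
    "design": [
        "Every external dependency must have a documented failure mode.",
        "New services must align with approved architectural patterns.",
    ],
    "implementation": [
        "All user input must be validated at the service boundary.",
        "No secrets may appear in source code or configuration files.",
    ],
}

@dataclass
class VerificationResult:
    approved: bool
    findings: list = field(default_factory=list)

def build_verification_prompt(stage: str, artefact: str) -> str:
    """Compose a fresh, minimally biased prompt for the verifier model."""
    rules = "\n".join(f"- {g}" for g in GUIDELINES[stage])
    return (
        f"You are reviewing a {stage}-stage artefact against these organisational rules:\n"
        f"{rules}\n\nArtefact:\n{artefact}\n\n"
        "List each rule violation, or reply APPROVED if there are none."
    )

def call_verifier_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM client here. The key constraint is that
    # the verifier is a separate instance, not the agent that generated the work.
    return "APPROVED"

def verify_artefact(stage: str, artefact: str) -> VerificationResult:
    """Gate: artefacts only move to the next stage if the verifier approves."""
    reply = call_verifier_llm(build_verification_prompt(stage, artefact))
    if reply.strip() == "APPROVED":
        return VerificationResult(approved=True)
    return VerificationResult(approved=False, findings=reply.splitlines())
```

The same gate function runs at every stage boundary: a design document must pass the `design` guidelines before code generation starts, and generated code must pass the `implementation` guidelines before integration.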
This is fundamentally what the Plan-Execute-Validate cycle in AI-DLC methodology implements — validation gates at each step that prevent errors from propagating forward. The mob ceremony practices take this further by having the entire team participate in real-time validation during generation, but even without full mob programming, the meta-validation approach provides crucial quality gates.
The economics matter here. Yes, you’re spending tokens on verification, but you’re spending them strategically. Catching a design flaw before code generation is far cheaper than discovering it after thousands of lines of code exist. The token cost of verification is a fraction of the token cost of regenerating an entire implementation.
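A back-of-the-envelope calculation makes the point concrete. Every number below is an illustrative assumption, not a measured price or benchmark; the ratio, not the absolute cost, is what matters.

```python
# Hypothetical token economics for a design-stage validation gate.

PRICE_PER_MILLION_TOKENS = 10.0    # assumed blended USD price

design_doc_tokens = 4_000          # artefact under review
verifier_context_tokens = 6_000    # injected guidelines plus verifier output
implementation_tokens = 120_000    # code that would be generated from the design

# Cost of verifying the design before any code exists.
verify_cost = (design_doc_tokens + verifier_context_tokens) * PRICE_PER_MILLION_TOKENS / 1_000_000

# Cost of regenerating the implementation after a design flaw is found late.
regen_cost = implementation_tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

print(f"verification: ${verify_cost:.2f}, regeneration: ${regen_cost:.2f}")
# Under these assumptions one verification pass costs roughly a twelfth of a
# full regeneration, before counting the human review time it saves.
```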
The Inverted Test Pyramid: More E2E, Fewer Unit Tests
The traditional test pyramid prescribes many unit tests at the base, fewer integration tests in the middle, and minimal end-to-end tests at the top. This made sense when E2E tests were expensive to create and required high engineering expertise. AI has fundamentally changed this economic equation.
Creating end-to-end tests is now dramatically cheaper than it used to be. Give an LLM your database schemas, sample data, frontend components, and API specifications, and it can generate comprehensive E2E test scripts that validate complete user workflows. The complexity that once required senior engineers to carefully craft E2E tests can now be largely handled by AI with appropriate context.
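As a sketch of what such a generated script might look like, the journey test below validates a hypothetical signup-and-login flow. `SignupService` is an in-memory stand-in for the system under test so the example is self-contained; a real E2E test would drive HTTP endpoints or a browser instead.

```python
# Outcome-focused E2E-style test: it exercises a complete user journey and
# asserts on observable behaviour, not on how the implementation is structured.

class SignupService:
    """Minimal in-memory stand-in for the deployed system under test."""
    def __init__(self):
        self.users = {}

    def register(self, email: str, password: str) -> dict:
        if "@" not in email or len(password) < 8:
            return {"ok": False, "error": "invalid input"}
        self.users[email] = {"password": password, "active": True}
        return {"ok": True}

    def login(self, email: str, password: str) -> dict:
        user = self.users.get(email)
        return {"ok": bool(user and user["password"] == password)}

def test_signup_and_login_journey():
    # A new user can register, then log in with the same credentials,
    # and a wrong password is rejected. If this passes, the feature works
    # regardless of how AI structured the underlying code.
    svc = SignupService()
    assert svc.register("ada@example.com", "correct-horse")["ok"]
    assert svc.login("ada@example.com", "correct-horse")["ok"]
    assert not svc.login("ada@example.com", "wrong-password")["ok"]

test_signup_and_login_journey()
```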
Meanwhile, the value of unit tests has decreased in AI-native development. Code changes frequently as AI regenerates implementations based on evolving requirements. Unit tests that validate specific implementation details become obsolete quickly. You spend tokens generating unit tests, then spend more tokens regenerating them when the implementation changes, often multiple times per feature.
This suggests inverting the test pyramid: prioritize E2E tests that validate system behavior from the user’s perspective, maintain moderate integration testing for critical component interactions, and minimize unit tests to only the most stable, critical algorithms.
The inverted pyramid aligns with verification focus. E2E tests provide confidence in what actually matters—does the system deliver the intended user value? They validate the complete behavior rather than implementation details. When an E2E test passes, you have confidence that the feature works regardless of how the underlying code is structured.
This doesn’t mean abandoning unit tests entirely. Critical business logic, complex algorithms, and security-sensitive functions still benefit from focused unit testing. But the default shifts from “unit test everything” to “E2E test user journeys, unit test critical logic.”
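A minimal sketch of that split: the hypothetical `prorate` function below represents the kind of stable, critical billing logic that still earns focused unit tests, while complete user journeys are left to E2E coverage.

```python
# Under the inverted pyramid, unit tests are reserved for stable, critical
# logic whose contract rarely changes even when AI regenerates the code
# around it. This proration algorithm is a hypothetical example.

def prorate(amount_cents: int, days_used: int, days_in_period: int) -> int:
    """Charge only for the fraction of the billing period actually used."""
    if not 0 <= days_used <= days_in_period:
        raise ValueError("days_used out of range")
    return amount_cents * days_used // days_in_period

# Focused unit tests pin down the contract, not the implementation.
assert prorate(3000, 15, 30) == 1500   # half the period -> half the charge
assert prorate(3000, 0, 30) == 0       # unused period -> no charge
assert prorate(3000, 30, 30) == 3000   # full period -> full charge
```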
The inverted pyramid isn’t just about test economics — it’s about optimising for confidence. In an environment where code changes rapidly and AI generates implementations, you need tests that validate outcomes, not implementations.
Optimize for Confidence, Not Speed
The new asymmetry in AI-native development demands a fundamental shift in how we think about productivity. Generation speed is easy to measure and tempting to optimize. Lines of code per hour, specifications generated per day, features implemented per sprint—these metrics feel like progress.
But they’re measuring the wrong thing. The actual constraint isn’t how fast you can generate artefacts. It’s how confidently you can validate them. The long-term technical success of a software system is measured by its adaptability to future change, not by how quickly technical debt can be added.
Teams that optimise for verification velocity will outperform teams that optimise for generation speed. This means investing in meta-validation approaches that catch errors early. It means inverting the test pyramid to focus on outcome validation rather than implementation testing. It means building processes that prevent the big ball of spec from forming in the first place.
The strategic advantage goes to organisations that recognise this shift. While competitors drown in AI-generated artefacts they can’t manage, teams with verification-first approaches maintain clarity, confidence, and control. They generate less but validate more. They move deliberately rather than frantically.
Methodologies like AI-DLC provide frameworks for this verification-first approach through structured validation gates, incremental complexity decomposition, and composable specifications that prevent information overload. But the core principle applies regardless of methodology: optimise for confidence, not speed.
The conversation with that tech lead after my chalk talk ended with a question: “How do we get out of this mess?” The answer isn’t generating more to fix what we’ve generated. It’s stepping back, establishing verification practices, and rebuilding confidence in what we create. Sometimes that means deleting the big ball of slop and starting fresh with better practices — much like rebuilding validated prototypes with proper foundations rather than refactoring them into production.
In the AI era, velocity comes from confidence, not generation speed. The teams that internalise this will build better systems faster. The teams that don’t will keep accumulating artefacts they can’t manage, wondering why their productivity gains never materialised.