Your code base isn't ready for AI

And how to fix it

Generative AI is a class of technology that is drastically different from the innovations we have seen in software engineering over the past two decades. We have gone through the Agile movement and then the DevOps evolution, but those were incremental innovations, not tectonic shifts.

To effectively leverage GenAI in large engineering organizations, it is important to understand the prerequisite preparations for hyper-productivity. We will start with the artefact that is “closest” to GenAI - your codebase.

I have visited numerous large enterprises to help transform the way they deliver software systems. Here are the most common areas of improvement, each of which has a significant impact on every LLM invocation.

1. Poor Project Structure

Most project code repositories are not organised in a true full-stack fashion. Front-end code is often separated from the back-end, and it is very rare for Infrastructure as Code (IaC) to be included in the same application repository, if it exists in the first place. The directory structure is often inconsistent, or lacks an appropriate breadth-to-depth ratio of files to properly segment modules and services.

Impact - Agentic coding assistants struggle to determine the correct source code boundary to include in large, complex software modification tasks. The **lack of visibility into the full-stack implementation** requires software engineers to furnish the missing information.

Solution - Move towards full-stack repositories - front end, back end, IaC, documentation. Use the directory structure to draw boundaries around source code with high affinity.
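
To make this concrete, a full-stack repository for a single service might be laid out as in the sketch below; the directory and service names are illustrative placeholders, not a prescribed standard.

```
order-service/
├── frontend/        # UI code for this service
├── backend/
│   ├── src/         # APIs and business logic
│   └── tests/
├── infra/           # IaC for this service (e.g. Terraform or CDK)
├── docs/            # architecture notes and diagrams (see issue 5)
└── README.md        # entry point that maps the directories above
```

Each top-level directory draws a boundary that an agentic assistant can use to scope its context for a given task.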

2. Distributed Business Logic

Legacy code bases carry significant fog-of-war because the engineers who wrote them have since left the organization. The style and organization of business logic may follow the opinionated preferences of individual software engineers. Legacy business functions may be left untouched to reduce the risk of change. Highly related business logic may be implemented in different places.

Impact - This creates confusion for human software engineers and severely hinders the onboarding experience even before GenAI enters the picture. The same issue extends to GenAI: human practitioners may not be aware of other related source code files, so those files never make it into the task context during prompting.

Solution - Leverage the project structure established above and refactor to ensure high cohesion and loose coupling of source code artefacts.
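
As a minimal sketch of what this can look like (the module, function, and rule names below are invented for illustration), highly related pricing rules are consolidated into one cohesive module that every consumer imports:

```python
# pricing/rules.py - single home for all pricing-related business logic,
# instead of duplicating discount rules across checkout, invoicing and reporting.

from decimal import Decimal

LOYALTY_DISCOUNT = Decimal("0.05")  # 5% for loyalty members
BULK_THRESHOLD = 10                 # units required before a bulk discount applies
BULK_DISCOUNT = Decimal("0.10")     # 10% for bulk orders


def applicable_discount(quantity: int, is_loyalty_member: bool) -> Decimal:
    """Return the combined discount rate for an order line.

    Checkout, invoicing and reporting all import this one function,
    so a rule change happens in exactly one place.
    """
    discount = Decimal("0")
    if is_loyalty_member:
        discount += LOYALTY_DISCOUNT
    if quantity >= BULK_THRESHOLD:
        discount += BULK_DISCOUNT
    return discount
```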


3. Overuse of Acronyms

One of the oldest (and perhaps most trivial) problems in programming is how to name your variables. Many software teams prefer acronyms to reduce the length of variable and function names in source code. Much of this preference is an extension of the organization's culture.

Impact - While shorter names do reduce the number of input tokens per LLM invocation, they obfuscate business intent and significance. Organization- or project-specific acronyms may not be preserved in the LLM's response.

Solution - Either expand the acronyms or explain them in docstrings and in-line comments.
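
For example, a hypothetical CAC calculation can either keep the short name and spell the acronym out in a docstring, or expand the acronym into the name itself:

```python
def calc_cac(marketing_spend: float, new_customers_acquired: int) -> float:
    """Calculate CAC (Customer Acquisition Cost): total marketing spend
    divided by the number of new customers acquired in the period.

    Spelling the acronym out here keeps the business meaning attached
    to the short name the rest of the code base already uses.
    """
    return marketing_spend / new_customers_acquired


# Alternatively, expand the acronym in the name itself so no explanation is needed:
def customer_acquisition_cost(marketing_spend: float, new_customers_acquired: int) -> float:
    return marketing_spend / new_customers_acquired
```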

4. Lack of Appropriate Comments

Writing source code comments is an art - comments should explain the why, while the how should be self-explanatory from the code itself. In practice, we see the reverse in production code bases: either comments verbosely describe the mechanics of a piece of code, or they don’t exist at all.

Impact - LLMs do not need comments explaining how sequential lines of code work together. It is more important to illustrate how independent units of code relate to one another and what side effects they produce. Poorly written in-line comments also consume input tokens unnecessarily.

Solution - Clean up comments that merely describe the how; use in-line comments to explain complex, relevant business rationale and integrations.
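
A small hypothetical sketch of the difference - the rounding rule and the business rationale below are invented for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_invoice_total(amount: Decimal) -> Decimal:
    # Not useful: "quantize the amount to two decimal places" merely restates
    # what the line below already shows.
    #
    # Useful (the why): finance requires half-up rounding so that invoice totals
    # reconcile with the ledger export; Python's default bankers' rounding
    # produced cent-level mismatches downstream.
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```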

5. Degree of Separation with Documentation

Documentation, both business and technical, is typically not checked into the code base. There was once a good reason for this: documentation files often contain media assets and are saved in formats that are not IDE-friendly. But this degree of separation, with documentation kept in another repository, raises the chance of informational dissonance - outdated documentation that contradicts the implementation.

Impact - Documentation can more effectively represent parts of the software system that are relevant to the task at hand but are not in the scope of modification. Natural language descriptions use approximately half the input tokens of the equivalent source code. Image mock-ups and process flows are also helpful for sequential workflow implementations with a multi-modal LLM.

Solution - Check documentation into the same repository as the code. Leverage Markdown (.md) files and structured diagram languages (Mermaid, PlantUML).
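
As a sketch, a hypothetical docs/order-flow.md checked into the same repository could hold a Mermaid diagram that is versioned and reviewed alongside the code it describes (the workflow shown is a placeholder):

```mermaid
flowchart LR
    Customer -->|submits order| API[Order API]
    API --> Validate[Validate payment]
    Validate -->|approved| Fulfil[Fulfilment service]
    Validate -->|declined| Notify[Notify customer]
```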

These are not new issues specific to LLMs and GenAI. We have been fighting these battles for years to help software engineers build faster and better. Now every human software engineer may have a new super intern with thousands of years of experience - but only short-term memory - and these code quality problems are becoming major hindrances to how effectively we can leverage GenAI.

But these are not isolated solutions. All of these issues are the result of the way we have collaborated to create software in the past - the human structures and team processes. To be an AI-First engineering organization, you will also need to examine how to reinvent on those fronts.

About the Author - Derick Chen

I'm a Developer Specialist Solutions Architect at AWS Singapore, where I lead the AI-Driven Development Lifecycle (AI-DLC) programme across multiple key countries in ASEAN and the wider APJ region. As an early contributor to the AI-DLC methodology and its foundational white paper, I help engineering organizations build complex software faster and better, unlocking 10X delivery velocity through reimagined processes and team structures.

Previously, I worked at Meta on platform engineering solutions and at DBS Bank on full-stack development for business transformation initiatives. I graduated Magna Cum Laude from New York University with a BA in Computer Science.

Follow me on LinkedIn for more insights on AI-driven development and software engineering.

The views expressed in this article are my own and do not represent the views of my employer.