A complete workflow for AI-assisted coding (without losing control of your codebase)

26 May 2026 06:37

Learn a practical, end-to-end workflow for building features with AI coding tools. This guide covers smart context management, the “Grill Me” planning skill, PRDs, Kanban-style issue breakdowns, AFK agents, and how to keep your architecture healthy.

AI coding tools can feel magical one minute and painfully dumb the next. The difference usually isn’t the model – it’s your workflow. This guide walks through a complete, practical way to build real features with AI, while keeping control of your codebase and avoiding the “vibe coding” trap.

Understand How LLMs Actually Behave

Before you can build a reliable workflow, you need a mental model for how large language models (LLMs) behave during coding sessions.

The Smart Zone vs. Dumb Zone

LLMs work best in what you can think of as a “smart zone”: a context that is relatively small and focused. As you keep adding messages, files, and tokens to the same conversation, the number of pairwise attention relationships between tokens grows quadratically with context length, so each relevant detail competes with far more noise. Eventually you drift into the “dumb zone,” where the model starts making sloppy, inconsistent decisions.

In practice, this means:

Even with 1M-token context windows, the “sharp” range is often much smaller: around 100K tokens as a rule of thumb.
Long, never-ending chats tend to degrade in quality over time.
Big, monolithic tasks push the model straight into the dumb zone.

So your workflow should aim to keep tasks small and focused, and reset context regularly instead of trying to stuff everything into one mega-session.
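
A minimal sketch of a context-budget check, assuming a crude heuristic of about four characters per token (real tokenizers vary) and the 100K-token rule of thumb above. The names `estimate_tokens`, `should_reset`, and `SMART_ZONE_TOKENS` are illustrative and not part of any tool:

```python
# Rough context-budget check. The 4-chars-per-token ratio is an assumption;
# use your model's real tokenizer if you need accuracy.
SMART_ZONE_TOKENS = 100_000  # hypothetical budget, from the rule of thumb above
CHARS_PER_TOKEN = 4          # crude heuristic for English text and code


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def should_reset(messages: list[str], budget: int = SMART_ZONE_TOKENS) -> bool:
    """Return True once the conversation is estimated to exceed the budget."""
    total = sum(estimate_tokens(m) for m in messages)
    return total > budget


if __name__ == "__main__":
    history = ["x" * 200_000, "y" * 250_000]  # about 112,500 estimated tokens
    print(should_reset(history))  # True
```

When the check fires, the idea is to start a new session from a small prompt rather than keep extending the old one.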

LLMs Have Memento-Style Memory

LLMs don’t have persistent memory between sessions. Each conversation starts fresh from the system prompt. You can try to preserve history via summaries (“compacting”), but that introduces its own problems:

Summaries can miss important details or encode subtle misunderstandings.
Compacting repeatedly adds “sediment” – layers of lossy summaries on top of each other.

Instead of relying heavily on compacting, it’s often better to embrace statelessness: design your workflow so each session can start from a small, well-defined prompt and a few key artifacts (like a PRD or issue file).
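
One way to picture a stateless session start is a small function that assembles the prompt from a task description plus a few artifacts. This is a sketch under stated assumptions: the function names, the section headings, and the file name `prd.md` are all hypothetical.

```python
from pathlib import Path


def load_artifacts(paths: list[Path]) -> dict[str, str]:
    """Read each artifact file into memory, keyed by file name."""
    return {p.name: p.read_text(encoding="utf-8") for p in paths}


def build_session_prompt(task: str, artifacts: dict[str, str]) -> str:
    """Assemble a fresh-session prompt: the task first, then each artifact."""
    sections = [f"# Task\n{task.strip()}"]
    for name, body in artifacts.items():
        sections.append(f"# Artifact: {name}\n{body.strip()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    prompt = build_session_prompt(
        "Implement issue 1.",
        {"prd.md": "Award points for lesson completion.\n"},  # hypothetical PRD text
    )
    print(prompt)
```

Because everything the session needs is rebuilt from files, nothing depends on an earlier chat surviving.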

Start Every Feature with a “Grill Me” Session

Most AI coding failures aren’t about syntax – they’re about misalignment. The model and the human never truly agree on what’s being built. To fix that, you want to reach a shared design concept before any serious implementation starts.

Why Specs-to-Code Usually Fails

A popular idea is “specs to code”: write a specification document, feed it to an AI, and let it generate the implementation. If something’s wrong, you edit the spec and regenerate. In practice, this often turns into “vibe coding with extra steps”:

The code drifts away from reality because nobody is really reading it.
Subtle edge cases and domain details get lost between spec and implementation.
You lose the ability to shape the codebase intentionally.

Instead, you want a conversational, interrogative phase where the AI helps you clarify the problem and decisions, not just spit out a plan.

How the “Grill Me” Skill Works

The “Grill Me” skill is a tiny but powerful prompt pattern you can reuse in any tool (Claude, ChatGPT, Gemini, etc.). The idea:

You paste in your initial brief (for example, a Slack message from a PM asking for “gamification” features).
You invoke a skill that tells the AI to interview you relentlessly about every aspect of the idea.
For each question, the AI also proposes a recommended answer, which you can accept, tweak, or reject.

A typical Grill Me prompt looks like:

“Interview me relentlessly about every aspect of this plan until we reach a shared understanding.”
“Walk down each branch of the decision tree, resolving dependencies one by one.”
“For each question, provide your recommended answer.”
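
If you want to reuse the pattern across tools, the three instructions above can be kept as a small template. This is only a sketch: `grill_me_prompt` and the `Brief:` label are hypothetical names, and the instruction text is taken from the lines above.

```python
# The instruction text mirrors the three prompt lines quoted above.
GRILL_ME_INSTRUCTIONS = (
    "Interview me relentlessly about every aspect of this plan until we "
    "reach a shared understanding. "
    "Walk down each branch of the decision tree, resolving dependencies "
    "one by one. "
    "For each question, provide your recommended answer."
)


def grill_me_prompt(brief: str) -> str:
    """Combine the fixed instructions with the brief you pasted in."""
    brief = brief.strip()
    if not brief:
        raise ValueError("brief must not be empty")
    return f"{GRILL_ME_INSTRUCTIONS}\n\nBrief:\n{brief}"


if __name__ == "__main__":
    print(grill_me_prompt("PM wants gamification features for the course app."))
```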

The AI will ask about things you and your PM probably didn’t think through:

Should points be retroactive? Do we backfill old lesson completions?
What actions earn points, and how many?
Where should the gamification UI live in the app?
How do streaks interact with points?

This can easily become a 40–100-question conversation. That’s a feature, not a bug: the output is a rich, shared mental model between you and the AI.

Use Sub-Agents for Exploration

Many modern tools support “sub-agents” or delegated calls. In this workflow, a sub-agent can:

Scan the repo in its own isolated context window.
Summarize relevant modules, services, and routes.
Return a concise report to the main agent that’s grilling you.

This keeps your main session lean while still giving the AI enough context to ask smart, repo-aware questions.
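
The shape of that delegation can be sketched as follows. It assumes a hypothetical `Agent` callable that takes a prompt and returns text; no real tool's API is implied. The point is that the bulky repo content exists only inside the sub-agent call, and only a bounded report comes back.

```python
from typing import Callable

# Hypothetical interface: a sub-agent takes a prompt and returns its reply text.
Agent = Callable[[str], str]


def explore_repo(
    files: dict[str, str],
    question: str,
    sub_agent: Agent,
    max_report_chars: int = 2_000,
) -> str:
    """Ask a sub-agent about the repo and return only a size-capped report."""
    # The large prompt (all file contents) is built and used only in this call.
    prompt = f"Question: {question}\n\n" + "\n\n".join(
        f"--- {path} ---\n{source}" for path, source in files.items()
    )
    report = sub_agent(prompt)
    # Only the bounded report is handed back to the main session.
    return report[:max_report_chars]
```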

Capture the Destination with a Lightweight PRD

Once you’ve grilled the idea thoroughly, you don’t want to keep all that context in a chat. You want a durable artifact that describes the destination: a Product Requirements Document (PRD).

What Goes Into the PRD

The PRD doesn’t need to be perfect or formal. It just needs to accurately summarize the shared understanding you reached during the Grill Me session. A simple structure works well:

Problem statement – What user problem are we solving?
Solution overview – High-level description of the feature.
User stories – “As a student, when I complete a lesson, I earn X points…”
Implementation decisions – Key choices like point thresholds, retroactivity rules, and where data lives.
Testing decisions – What needs to be covered by automated tests.
Out of scope – Explicitly list what this PRD will not cover.

You can have the AI generate this PRD directly from the grilling conversation using a “Write a PRD” skill. Because LLMs are strong summarizers, you often don’t need to manually edit the PRD unless something feels obviously off.
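
To make the structure concrete, here is a sketch of the six sections as a data class that renders to a markdown file. The class name `PRD`, its field names, and the markdown layout are assumptions for illustration; the workflow itself does not prescribe a format.

```python
from dataclasses import dataclass, field


@dataclass
class PRD:
    """Hypothetical container mirroring the six PRD sections listed above."""

    title: str
    problem_statement: str
    solution_overview: str
    user_stories: list[str] = field(default_factory=list)
    implementation_decisions: list[str] = field(default_factory=list)
    testing_decisions: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the PRD as markdown, one heading per section."""

        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        sections = [
            ("Problem statement", self.problem_statement),
            ("Solution overview", self.solution_overview),
            ("User stories", bullets(self.user_stories)),
            ("Implementation decisions", bullets(self.implementation_decisions)),
            ("Testing decisions", bullets(self.testing_decisions)),
            ("Out of scope", bullets(self.out_of_scope)),
        ]
        body = "\n\n".join(f"## {heading}\n{text}" for heading, text in sections)
        return f"# {self.title}\n\n{body}\n"
```

Rendering an explicit "(none)" for empty sections makes gaps visible, which is useful when you skim the generated PRD.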

Don’t Over-Invest in Perfect Specs

It’s tempting to keep iterating on the PRD until it looks flawless. That’s usually wasted effort. The PRD is a directional artifact, not a contract. The real quality control will come later from tests, feedback loops, and QA.

Once the PRD is good enough to describe the destination, move on to planning the journey.

Turn the PRD into a Kanban Board of Vertical Slices

Now you need to break the PRD into work units that:

Fit inside the model’s smart zone.
Can be picked up by agents independently.
Deliver end-to-end value so you can get fast feedback.

Instead of a linear “Phase 1, Phase 2, Phase 3” plan, you want a Kanban-style set of issues with clear dependencies.

Why Vertical Slices Beat Horizontal Phases

LLMs naturally tend to plan horizontally: first all database changes, then all API work, then all UI. That’s a problem because you don’t get real feedback until the end. You want vertical slices (also called “tracer bullets”): thin cuts that go all the way through the stack.

For example, instead of:

Phase 1: Create gamification database tables.
Phase 2: Implement gamification service.
Phase 3: Build dashboard UI.

You’d prefer something like:

Issue 1: Award points for lesson completion and show total points on the dashboard.
Issue 2: Add streak tracking and display streaks on the dashboard.
Issue 3: Retroactively backfill points for existing lesson completions.

Each issue crosses all relevant layers (schema, service, UI) and can be tested end-to-end.

Representing Dependencies as a DAG

Once the AI proposes issues from the PRD, you review and adjust them into a directed acyclic graph (DAG) of dependencies:

Some tasks are unblocked and can be done first (e.g., create base gamification service and schema).
Others depend on those (e.g., wiring points into lesson completion).
Some depend on multiple earlier tasks (e.g., a summary dashboard that shows points and streaks).

This DAG is your Kanban board. It enables parallelization: once the foundational issue is done, multiple independent issues can be worked on at the same time by different agents or loops.
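
As a sketch of how the DAG yields parallel work, Python's standard-library `graphlib.TopologicalSorter` can group issues into batches where everything in a batch is unblocked at the same time. The issue names below are hypothetical, loosely based on the examples above.

```python
from graphlib import TopologicalSorter

# Maps each issue to the set of issues it depends on (hypothetical names).
DEPENDENCIES = {
    "base-service-and-schema": set(),
    "points-on-lesson-completion": {"base-service-and-schema"},
    "streak-tracking": {"base-service-and-schema"},
    "summary-dashboard": {"points-on-lesson-completion", "streak-tracking"},
}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group issues into batches; issues within a batch can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything currently unblocked
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unblock dependents
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(DEPENDENCIES), start=1):
        print(number, batch)
```

Here the two middle issues land in the same batch, so two agents could pick them up at once after the foundational issue is done.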

Delegate Implementation to AFK Agents

Up to this point, humans are firmly in the loop: clarifying requirements, shaping architecture, and reviewing the issue breakdown. Now you can hand off implementation to an “AFK” (away-from-keyboard) agent loop.

The Ralph-Style Implementer Loop

The core idea is a simple loop that repeatedly:

Loads the backlog of issues (from local markdown files or GitHub issues).
Chooses the next AFK-eligible issue based on priority and dependencies.
Explores the repo to understand the current state.
Implements the issue using test-driven development (TDD).
Runs feedback loops (tests, type checks, linters).
Commits the changes with a clear summary.

You can run this once in an interactive shell to tune the prompt, then wrap it in a script or orchestrator to run repeatedly.
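
The control flow of such a loop might look like the sketch below. Everything here is hypothetical: the `Issue` fields, the `implement`, `checks_pass`, and `commit` callables (which stand in for the agent call, your test and lint commands, and the commit step), and the choice to stop on the first failed check so a human can look.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Issue:
    id: str
    priority: int  # lower number means more urgent
    depends_on: set[str] = field(default_factory=set)
    afk_eligible: bool = True  # False means a human must handle it


def pick_next(issues: list[Issue], done: set[str]) -> Optional[Issue]:
    """Choose the most urgent AFK-eligible issue whose dependencies are done."""
    candidates = [
        i for i in issues
        if i.id not in done and i.afk_eligible and i.depends_on <= done
    ]
    return min(candidates, key=lambda i: i.priority, default=None)


def run_loop(
    issues: list[Issue],
    implement: Callable[[Issue], None],
    checks_pass: Callable[[], bool],
    commit: Callable[[Issue], None],
    max_iterations: int = 20,
) -> list[str]:
    """Run the implementer loop and return the ids of committed issues."""
    done: set[str] = set()
    completed: list[str] = []
    for _ in range(max_iterations):
        issue = pick_next(issues, done)
        if issue is None:
            break  # nothing left that an agent is allowed to pick up
        implement(issue)
        if not checks_pass():
            break  # stop and leave the failure for a human to inspect
        commit(issue)
        done.add(issue.id)
        completed.append(issue.id)
    return completed
```

The `max_iterations` cap is a deliberate guard so an unattended loop cannot run forever.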

For a more structured, TypeScript-based approach to these loops, you can look at tools like Sandcastle, which let you:

Create isolated work trees (e.g., Git branches) in Docker sandboxes.
Run implementer agents per issue.
Run reviewer agents on the resulting commits.
Merge branches back into main with automated conflict resolution.

Always Use TDD with Agents

Test-driven development (TDD) is especially powerful with AI agents. The pattern is:

Red: Write a failing test that expresses the desired behavior.
Green: Implement the minimal code to make the test pass.
Refactor: Clean up the implementation while keeping tests green.

When you instruct the AI to follow red–green–refactor, you get:

Better tests, because the test is written before the implementation.
Less cheating, because the model can’t just mirror implementation details into the test.
More robust feedback loops: tests become a reliable safety net.

In many codebases, the quality of your tests and feedback loops is the ceiling on how good your AI-generated code can be. If tests are weak or missing, the agent is effectively coding blind.
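
A tiny red–green example using Python's built-in `unittest`, as a sketch. The `award_points` function and the value of 10 points per lesson are hypothetical; the real rules would come from your PRD.

```python
import unittest

POINTS_PER_LESSON = 10  # hypothetical value; the PRD would define the real one


# Step 1 (red): written first, these tests fail until award_points exists.
class AwardPointsTest(unittest.TestCase):
    def test_completing_lessons_adds_points(self):
        self.assertEqual(award_points(current_total=5, lessons_completed=2), 25)

    def test_negative_lesson_count_is_rejected(self):
        with self.assertRaises(ValueError):
            award_points(current_total=0, lessons_completed=-1)


# Step 2 (green): the minimal implementation that makes the tests pass.
def award_points(current_total: int, lessons_completed: int) -> int:
    """Return the new point total after completing some lessons."""
    if lessons_completed < 0:
        raise ValueError("lessons_completed must be non-negative")
    return current_total + lessons_completed * POINTS_PER_LESSON
```

Saved in a file such as `test_points.py` (a hypothetical name), this runs with `python -m unittest test_points`. The tests describe behavior through the public function only, which is what keeps them valid through the refactor step.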

Separate Implementers and Reviewers

One subtle but important trick: don’t use the same context to both implement and review. If you implement a feature in a long conversation and then ask the same agent to review it, the review happens in the dumb zone.

A better pattern:

Let the implementer finish and commit.
Clear context and start a fresh reviewer agent in a new session.
Give the reviewer the diff, relevant files, and your coding standards.

Because the reviewer starts in the smart zone with a focused context, it can catch more subtle issues.

This is also where you “push” your coding standards: pass style guides, architecture rules, and security constraints directly to the reviewer so it can enforce them against the new code.
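
A sketch of what the reviewer's starting prompt could be built from: the diff plus a numbered list of standards. The function names and prompt wording are assumptions, and the branch name `main` is a guess at your setup; `git diff main...HEAD` is the real Git command for the changes on the current branch since it diverged from `main`.

```python
import subprocess


def get_diff(base: str = "main") -> str:
    """Return the changes on HEAD since it diverged from `base`."""
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return result.stdout


def build_review_prompt(diff: str, standards: list[str]) -> str:
    """Assemble a fresh reviewer prompt from the diff and coding standards."""
    rules = "\n".join(f"{n}. {rule}" for n, rule in enumerate(standards, start=1))
    return (
        "You are reviewing a change you did not write.\n"
        "Check the diff against every standard and list violations.\n\n"
        f"Standards:\n{rules}\n\n"
        f"Diff:\n{diff.strip()}\n"
    )
```

Numbering the standards lets the reviewer cite which rule a finding relates to.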

Keep Humans in the Loop for QA and Taste

Even with strong tests and automated reviews, you still need human QA and code review. Especially for user-facing features, AI can’t yet judge taste, UX quality, or product fit.

After an AFK loop completes an issue, a human should:

Run the app and manually exercise the new flow.
Check that the UI feels coherent and not “AI-generated weird.”
Review tests first (are they meaningful?), then the implementation.
File follow-up issues for bugs, polish, or refactors.

Those follow-up issues go back onto the Kanban board with appropriate dependencies, and the AFK loop can pick them up later. QA isn’t the end of the workflow – it’s a generator of new, better-defined work.

Design Your Codebase for AI: Deep Modules, Not Spaghetti

AI agents perform much better in codebases that are designed with clear, deep modules and strong boundaries. If your repo is a tangle of tiny, shallow modules, both humans and AI will struggle.

Shallow vs. Deep Modules

Borrowing from John Ousterhout’s “A Philosophy of Software Design”:

Shallow modules expose a lot of surface area (many functions, many small files) but don’t hide much complexity. The caller has to understand a lot of internal details.
Deep modules have a small, simple interface but encapsulate a lot of behavior and complexity inside.

In a shallow codebase, AI tends to:

Write tests around tiny functions in isolation.
Miss important interactions between modules.
Get lost in dependency graphs when trying to reason about behavior.

In a deep-module codebase, you can:

Wrap tests around a single, meaningful service (e.g., GamificationService) and cover a lot of behavior at once.
Give the AI a clear target: “implement this method on this service and update its tests.”
Think of modules as gray boxes: you know their contract and behavior, but you don’t need to micromanage internal details.
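
For illustration, here is a sketch of what a deep `GamificationService` might look like: one public method, with point and streak rules hidden behind it. The in-memory storage, the 10-points rule, and the streak logic are all assumptions made for the example, not a description of any real service.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Progress:
    points: int
    streak_days: int


class GamificationService:
    """Deep module: one public method, with the rules hidden inside."""

    POINTS_PER_LESSON = 10  # hypothetical rule

    def __init__(self) -> None:
        # In-memory state for the sketch; a real service would use a database.
        self._points: dict[str, int] = {}
        self._streaks: dict[str, int] = {}
        self._last_active: dict[str, date] = {}

    def complete_lesson(self, user_id: str, on: date) -> Progress:
        """Record a lesson completion and return the user's new progress."""
        self._points[user_id] = self._points.get(user_id, 0) + self.POINTS_PER_LESSON
        self._update_streak(user_id, on)
        self._last_active[user_id] = on
        return Progress(self._points[user_id], self._streaks[user_id])

    def _update_streak(self, user_id: str, on: date) -> None:
        last = self._last_active.get(user_id)
        if last == on:
            return  # a second lesson on the same day does not change the streak
        if last is not None and on - last == timedelta(days=1):
            self._streaks[user_id] += 1  # consecutive day extends the streak
        else:
            self._streaks[user_id] = 1  # first activity or a gap restarts it
```

A test only needs to call `complete_lesson` and inspect the returned `Progress`, so the internals can be refactored freely without touching the tests.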

Use an Architecture-Improvement Skill

If your current codebase feels hostile to AI, you can use an “Improve Codebase Architecture” skill to:

Scan for clusters of related modules that could be combined into deeper services.
Identify high-value areas with zero or weak tests (e.g., a quiz scoring service).
Propose refactorings that create clearer boundaries and better test seams.
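
One narrow piece of this, spotting modules with no tests at all, can be approximated mechanically. The sketch below assumes a Python repo where tests follow a `test_<module>.py` naming convention; that convention and the function name `untested_modules` are assumptions, and a real skill would look much deeper than file names.

```python
from pathlib import PurePosixPath


def untested_modules(paths: list[str]) -> list[str]:
    """Return source files with no matching test_<name>.py anywhere in paths."""
    files = [PurePosixPath(p) for p in paths if p.endswith(".py")]
    test_names = {f.name for f in files if f.name.startswith("test_")}
    return sorted(
        str(f)
        for f in files
        if not f.name.startswith("test_")
        and f.name != "__init__.py"
        and f"test_{f.name}" not in test_names
    )


if __name__ == "__main__":
    repo_files = [  # hypothetical file listing
        "app/quiz_scoring.py",
        "app/points.py",
        "tests/test_points.py",
    ]
    print(untested_modules(repo_files))  # ['app/quiz_scoring.py']
```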

You can then treat these architectural improvements as their own PRDs and Kanban issues, run them through the same workflow, and gradually transform your codebase into something AI (and humans) can work in effectively.

What to Keep, What to Throw Away

One subtle but important question is what artifacts you persist:

PRDs and planning docs: if you keep them in the repo, they can rot and mislead future agents. A closed GitHub issue is often a better home than a permanent markdown file in the codebase.
Migrations: these are deterministic history and usually worth keeping.
Grill Me transcripts: usually ephemeral. Their value is in the PRD and issues they produce.

The general rule: if an artifact will stay in sync with reality (like migrations or tests), keep it. If it’s likely to drift (like early specs), consider archiving it outside the main code tree or marking it clearly as historical.

Putting It All Together

This workflow might feel heavier than just “ask the AI to build the feature,” but it scales much better to real products and teams. In summary:

Respect the smart zone and reset context often.
Use a Grill Me phase to reach genuine alignment with the AI.
Capture the destination in a lightweight PRD.
Break work into vertical slices on a Kanban-style board with dependencies.
Delegate implementation to AFK agents using TDD and strong feedback loops.
Separate implementer and reviewer agents, and push coding standards into the review step.
Keep humans in the loop for QA, taste, and high-level architecture.
Continuously improve your architecture toward deep, testable modules.

If you’re just getting started with AI coding tools, you may also find it helpful to walk through a full beginner-friendly setup in a dedicated guide like this Claude tutorial from first prompt to coding. And if you want a more tool-specific view of running this style of workflow inside a desktop environment, check out this full Claude Code workflow guide.