External Reasoning: Using Coding Agents with an LLM

A follow-up to “Stochastic Illusion: LLM Reasoning” exploring how external structures can augment LLM capabilities in software development

Introduction: Beyond the Stochastic Illusion

In “Stochastic Illusion: LLM Reasoning”, I explored how Large Language Models create an impressive but ultimately limited form of reasoning. LLMs generate plausible sequences through “limited capacity stochastic construction” rather than true causal reasoning, creating what I call an “Agentic Stream of Consciousness” that appears to understand but fundamentally operates through pattern matching. Very sophisticated pattern matching!

This limitation becomes particularly problematic when developers deploy LLMs as coding agents. Software development requires sustained logical consistency, complex architectural decisions, and systematic validation—exactly the kind of multi-step causal reasoning where LLMs typically “give up” or produce inconsistent results.

But what if developers could augment LLM capabilities with external reasoning structures? What if instead of asking the LLM to maintain complex logic internally, developers provided external frameworks that guide, constrain, and validate its output?

Clearly this approach adds value. Claude Code, and OpenHands with Mistral, are showing what is possible. I have been using Claude Code on a new project, applying it to an area I have not developed in before: both a new language (TypeScript) and a new environment (a VS Code extension).

This post explores how adding context and tools, effectively external reasoning, transforms LLMs from impressive but unreliable assistants into systematic, reliable development partners.

The Challenge: Stochastic Illusion in Code

When LLMs act as coding agents, the stochastic illusion manifests in predictable ways:

Architecture Drift

LLMs struggle to maintain consistent architectural decisions across a large codebase. They might implement a feature using one pattern, then switch to a completely different approach for similar functionality, creating inconsistent and unmaintainable code.

Requirement Degradation

As conversations grow longer, LLMs gradually lose track of original requirements. They might implement a feature that works but violates earlier constraints, or abandon edge cases that were specified early in the conversation.

Test Coverage Gaps

LLMs are notoriously poor at systematic testing. They might write tests for the happy path but miss error conditions, or write comprehensive tests for one module while ignoring others entirely.

The “Good Enough” Trap

Perhaps most dangerously, LLMs tend to produce code that works immediately but fails under more complex conditions. They optimize for quick success rather than long-term maintainability.

Complexity Abandonment

When faced with complex refactoring or systematic improvements, LLMs often propose shortcuts or incomplete solutions rather than working through the full complexity.

These failures aren’t random; they follow from the fundamental limitations of stochastic generation. The LLM can maintain local coherence but struggles with global consistency, systematic validation, and sustained logical reasoning.

External Reasoning Structures: The Solution

The key insight is that I don’t need to solve the stochastic illusion problem; I need to work with it. Instead of asking LLMs to maintain complex reasoning internally, I can provide external structures that guide their output toward systematic, consistent results.

This isn’t just theoretical; here’s a concrete example: the development of a VS Code extension for micro.blog integration, built using LLMs but guided by external reasoning structures grounded in development best practices.

Domain-Driven Design as External Logic

Domain-Driven Design (DDD) provides a crucial external reasoning framework. Instead of letting the LLM make ad-hoc architectural decisions, I established clear domain boundaries:

src/
├── domain/                   # Pure business logic (no dependencies)
│   ├── Blog.ts              # Blog entity with domain validation
│   ├── Post.ts              # Post entity with content parsing
│   ├── LocalPost.ts         # Local post entity with frontmatter
│   └── Credentials.ts       # Authentication value object
├── services/                # Application services
│   ├── MicroblogService.ts  # Main orchestration
│   ├── ApiClient.ts         # HTTP client
│   └── PublishingService.ts # Publishing workflow
└── providers/               # VS Code integration
    ├── TreeProvider.ts      # Content tree view
    └── ContentProvider.ts   # Read-only content viewer

This structure acts as external reasoning that prevents architectural drift. Voilà—no more spaghetti code! When the LLM needs to add new functionality, the DDD boundaries force it to consider:

  • Is this domain logic or application logic?
  • Should this be a new entity or extend an existing one?
  • What are the dependencies and how do they flow?

The external structure provides the logical framework the LLM follows, compensating for its inability to maintain architectural consistency internally.
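
To make that concrete, here is a minimal sketch of what a domain entity looks like under these boundaries. The entity name matches the tree above, but the fields and validation rules are illustrative assumptions rather than the extension’s actual code:

// src/domain/Post.ts (illustrative sketch, not the extension's actual implementation)
// Domain entities depend on nothing outside the domain layer.
export class Post {
  private constructor(
    public readonly title: string,
    public readonly content: string,
    public readonly publishedAt?: Date
  ) {}

  // Domain validation lives with the entity, not in a service or provider.
  static create(title: string, content: string, publishedAt?: Date): Post {
    if (!title.trim()) {
      throw new Error('Post title cannot be empty');
    }
    return new Post(title.trim(), content, publishedAt);
  }

  get isPublished(): boolean {
    return this.publishedAt !== undefined;
  }
}

Because the entity imports nothing, any attempt to reach into VS Code APIs or HTTP clients from the domain layer is immediately visible in review.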

Test-Driven Development as External Validation

Acceptance Test-Driven Development (ATDD) provides external validation of LLM reasoning. Instead of trusting the LLM to write correct code, I established a systematic testing workflow:

// External validation: tests written before implementation
import * as assert from 'assert';
import * as path from 'path';
// executeCommand, getTreeViewItems, fileExists, and workspacePath are helpers from the project's test harness.

suite('Content Creation Phase', () => {
  test('user can create new local post', async () => {
    const title = 'My Test Post';
    await executeCommand('microblog.newPost', title);

    // External validation: verify the complete workflow
    const localDrafts = await getTreeViewItems('📝 Local Drafts');
    assert.ok(localDrafts.includes(title));

    const filePath = path.join(workspacePath, 'content', `${title.toLowerCase().replace(/\s+/g, '-')}.md`);
    assert.ok(await fileExists(filePath));
  });
});

This external validation catches LLM reasoning failures that would otherwise go unnoticed:

  • Did the LLM implement the complete workflow?
  • Does the feature work under edge conditions?
  • Are all the integration points correctly connected? (Trust me, integration bugs are where good intentions go to die).

This systematic validation approach echoes the concept of TDD as a reward function, where external metrics guide development decisions and provide quantifiable feedback on code quality beyond simple test passage.

Documentation as External Memory

LLMs have no persistent memory across conversations. To compensate, I use CLAUDE.md as external memory that preserves architectural decisions, development protocols, and lessons learned:

## Architecture (DDD within VS Code Structure)
- Domain entities have no external dependencies
- Services orchestrate between domain and infrastructure
- Providers handle VS Code integration only

## Development Commands
- `npm run compile` - Build TypeScript
- `npm test` - Run all tests (114 passing)
- `npm run lint` - ESLint validation

## Quality Gates (Non-Negotiable)
1. Run `npm run compile` - ensure TypeScript compiles cleanly
2. Run `npm test` - ensure all tests pass
3. Run `npm run lint` - ensure code style compliance

This external memory prevents the LLM from making decisions that contradict earlier architectural choices or forgetting established development practices. This documentation-first approach builds on intention-based development principles, where clear documentation and systematic processes guide AI-assisted development toward more reliable outcomes.
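
Because the quality gates are just npm scripts, they can also be enforced mechanically rather than left to the LLM’s discretion. Here is a minimal sketch of a gate runner, assuming only the three commands listed above (the script itself is illustrative, not part of the project):

// scripts/quality-gates.ts (illustrative sketch)
// Runs each gate in order and stops at the first failure.
import { execSync } from 'child_process';

const gates = ['npm run compile', 'npm test', 'npm run lint'];

for (const gate of gates) {
  console.log(`Running gate: ${gate}`);
  try {
    execSync(gate, { stdio: 'inherit' });
  } catch {
    console.error(`Gate failed: ${gate}`);
    process.exit(1);
  }
}
console.log('All quality gates passed');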

Incremental Development as External Control

Complex software development requires breaking large problems into manageable pieces. Claude Code’s TodoWrite tool provides external task management that prevents the LLM from attempting to solve everything at once:

// External task management prevents complexity abandonment
[
  {"content": "Create LocalPost domain entity", "status": "completed"},
  {"content": "Implement file operations in FileManager", "status": "in_progress"},
  {"content": "Add publishing workflow", "status": "pending"},
  {"content": "Integrate with VS Code tree view", "status": "pending"}
]

The todo list acts as an external control structure that forces the LLM to:

  • Work systematically through complex problems
  • Complete each step before moving to the next
  • Maintain progress visibility
  • Prevent the “complexity abandonment” behavior on complex tasks

The tool is specifically designed to address the LLM’s tendency to give up on complex implementations. By externalizing task management, Claude Code transforms the stochastic generation process into a systematic workflow where each step must be completed before proceeding to the next.
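
The invariant being maintained is simple enough to sketch as data. The shape below is an assumption based on the excerpt above, not Claude Code’s actual schema:

// Illustrative sketch of the task-list invariant, not Claude Code's actual schema.
type TodoStatus = 'pending' | 'in_progress' | 'completed';

interface TodoItem {
  content: string;
  status: TodoStatus;
}

// At most one task may be in progress; nothing new starts
// until the current task is completed.
function canStartNext(todos: TodoItem[]): boolean {
  return !todos.some(todo => todo.status === 'in_progress');
}

The point is not the code itself but that the “one step at a time” rule lives outside the model, where it cannot drift.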

Regression Test Protection as External Validation

Perhaps most importantly, I implemented regression test protection that prevents the LLM from accidentally breaking existing functionality:

## REGRESSION TEST PROTECTION (CRITICAL)
The regression test suite is PROTECTED and should NEVER be modified during feature development.
- READ-ONLY during feature development
- Only modify when explicitly requested
- Validates completed functionality

This external validation system ensures that as the LLM adds new features, it doesn’t inadvertently break existing functionality—a common failure mode when stochastic generation loses track of global constraints.
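
One way to make the READ-ONLY rule enforceable rather than aspirational is a pre-commit check that rejects changes touching the regression suite. A sketch, assuming the suite lives under src/test/regression/ (both the path and the git-based check are assumptions, not the project’s actual setup):

// scripts/check-regression-protection.ts (illustrative sketch)
// Fails if the staged change set touches the protected regression suite.
import { execSync } from 'child_process';

const PROTECTED_PATH = 'src/test/regression/';

const changedFiles = execSync('git diff --cached --name-only', { encoding: 'utf8' })
  .split('\n')
  .filter(Boolean);

const violations = changedFiles.filter(file => file.startsWith(PROTECTED_PATH));

if (violations.length > 0) {
  console.error('Regression tests are protected during feature development:');
  violations.forEach(file => console.error(`  ${file}`));
  process.exit(1);
}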

Real-World Implementation: The Results

The external reasoning approach produced measurable results in the VS Code extension project:

Quantitative Success

  • 114 passing tests with comprehensive coverage
  • Clean architecture maintained across 3 development phases
  • Zero regression failures during feature development
  • Successful feature delivery of all planned functionality

Qualitative Improvements

  • Consistent code patterns across the entire codebase
  • Proper separation of concerns between domain, services, and providers
  • Systematic error handling throughout all modules
  • Comprehensive documentation maintained throughout development

LLM Behavior Changes

Most importantly, the external reasoning structures changed how the LLM behaved:

  • Systematic rather than ad-hoc development approach
  • Consistent architectural decisions across features
  • Complete feature implementation rather than partial solutions
  • Proactive quality validation rather than reactive bug fixing

It’s like the difference between having a good editor and frantically proofreading at 2 AM. (Guess which one I prefer?)

Guidelines for External Reasoning Design

Based on this experience, here are principles for designing effective external reasoning structures:

1. Compensate for Specific LLM Limitations

  • Memory: Provide persistent documentation and context
  • Consistency: Establish clear architectural boundaries
  • Validation: Implement systematic testing workflows
  • Complexity: Break large problems into manageable pieces

2. Make Constraints Explicit

  • Quality gates that must be passed before proceeding
  • Architectural boundaries that cannot be violated
  • Testing requirements that must be met
  • Development protocols that must be followed

3. Provide Systematic Feedback

  • Automated testing that catches reasoning failures
  • Progress tracking that prevents abandonment
  • Quality metrics that measure systematic success
  • Documentation that preserves lessons learned

4. Balance Creativity with Structure

  • Domain boundaries that allow creativity within constraints
  • Test-driven development that validates creative solutions
  • Incremental development that allows exploration within limits
  • Quality gates that ensure systematic validation

5. Make External Reasoning Visible

  • Todo lists that show systematic progress
  • Test results that demonstrate validation
  • Documentation that explains architectural decisions
  • Quality metrics that measure systematic success

Conclusion: Hybrid Intelligence

The future of AI-assisted development isn’t about building better LLMs—it’s about building better external reasoning systems that augment LLM capabilities.

LLMs excel at pattern recognition, code generation, and creative problem-solving. They struggle with sustained logical reasoning, systematic validation, and architectural consistency. By providing external structures that handle these systematic aspects, we can create hybrid intelligence systems that combine the best of both approaches.

That’s no different for humans, and certainly for me: projects are complex, and we all benefit from these sorts of external structures.

Tools like Claude Code and OpenHands are already demonstrating this hybrid approach in practice, showing how external reasoning structures can transform LLMs into reliable development partners.

The Hybrid Approach

  • LLMs provide creativity, pattern recognition, and rapid implementation
  • External reasoning provides consistency, validation, and systematic control

This hybrid approach integrates naturally with other systematic development practices—TDD as a reward function provides the quantitative validation framework, while intention-based development offers the structured workflow that makes external reasoning practical in real-world development scenarios.

Practical Next Steps

  1. Identify LLM failure modes in your development process
  2. Design external structures that compensate for these limitations
  3. Implement systematic validation that catches reasoning failures
  4. Create feedback loops that improve the external reasoning over time
  5. Document the approach so it can be refined and shared

The stochastic illusion problem isn’t going away—but it doesn’t have to limit what developers can achieve with LLMs. By designing thoughtful external reasoning structures, developers can transform unreliable coding assistants into systematic, reliable development partners.

The key is recognizing that the problem isn’t the LLM’s reasoning—it’s our expectation that the LLM should handle all reasoning internally. When developers provide external structures that guide, constrain, and validate LLM output, they create something more powerful than either could achieve alone: true hybrid intelligence.


This post is part of a series exploring AI-assisted development practices.

This post is based on real experience developing a VS Code extension using LLMs guided by external reasoning structures. The complete project, including all external reasoning tools and documentation, is available as a case study in systematic AI-assisted development.