Building AI Agents from Scratch: Architectures, Frameworks, and Best Practices

Agent Design Patterns
Before you write a single line of code, it is essential to understand the fundamental design patterns for AI agents. These patterns define how an agent reasons, plans, and executes tasks, and the pattern you choose determines the capabilities and limitations of your agent system.
The ReAct Pattern (Reasoning + Acting)
ReAct is the most widely adopted pattern for LLM-based agents. It interleaves reasoning (thinking) with acting (tool use) in a continuous loop. At each step, the agent:
- Observes the current state (user input, previous tool results)
- Thinks about what to do next (chain-of-thought reasoning)
- Acts by calling a tool or providing a response
- Observes the result and repeats
The ReAct pattern is elegant because it leverages the LLM's natural ability to reason in text while grounding that reasoning in real-world observations. It excels at tasks where the next step depends on the outcome of the previous one, such as research, debugging, or data analysis.
Plan-and-Execute
The Plan-and-Execute pattern separates planning from execution into distinct phases. First, a planning agent analyses the task and produces a structured plan with ordered steps. Then, an execution agent carries out each step, reporting results back to the planner. The planner can revise the plan based on outcomes.
This pattern is well suited to complex, multi-step tasks where upfront planning reduces errors and wasted effort. It also provides better transparency, as the plan can be reviewed by humans before execution begins.
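The plan-then-execute control flow can be sketched in a few lines. Here `plan` and `execute_step` are illustrative stubs standing in for the planning and execution LLM calls; the names and plan format are assumptions, not a real API.

```python
def plan(task):
    """Produce an ordered list of steps for the task (stubbed LLM call)."""
    return [f"research: {task}", f"analyse: {task}", f"summarise: {task}"]

def execute_step(step):
    """Carry out one step and return its result (stubbed LLM call)."""
    return f"done({step})"

def plan_and_execute(task):
    steps = plan(task)
    results = []
    for step in steps:
        results.append(execute_step(step))
        # A real planner could inspect results here and revise the
        # remaining steps before continuing.
    return results
```

The key design choice is that the step list exists as data before any execution happens, which is what makes human review of the plan possible.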
Tree of Thoughts
Tree of Thoughts extends chain-of-thought reasoning by exploring multiple reasoning paths simultaneously. The agent generates several candidate approaches, evaluates each, and pursues the most promising paths while pruning dead ends. This pattern is valuable for creative problem-solving and tasks with multiple valid approaches.
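The explore-evaluate-prune cycle is essentially a beam search over reasoning states. A minimal sketch, assuming caller-supplied `expand` (candidate generation, e.g. LLM-sampled thoughts) and `score` (state evaluation) functions:

```python
def tree_of_thoughts(root, expand, score, beam_width=2, depth=3):
    """Explore multiple reasoning paths, keeping only the best candidates.

    expand(state) -> list of successor states (candidate next thoughts)
    score(state)  -> float, higher means more promising
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [s for state in frontier for s in expand(state)]
        if not candidates:
            break
        # Prune: keep only the most promising paths (the "beam").
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```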
Reflexion
The Reflexion pattern adds a self-evaluation step to the agent loop. After completing a task or sub-task, the agent critiques its own output, identifies weaknesses or errors, and iterates to improve the result. This is particularly effective for writing, code generation, and analysis tasks where quality matters more than speed.
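The draft-critique-revise loop can be sketched as follows; `generate` and `critique` are hypothetical callables standing in for LLM calls, with `critique` returning `None` when it finds no problems.

```python
def reflexion_loop(task, generate, critique, max_rounds=3):
    """Draft, self-critique, and revise until the critique passes."""
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # the critique found no remaining problems
            break
        draft = generate(task, feedback=feedback)  # revise using the critique
    return draft
```

Capping the rounds matters: without `max_rounds`, a critique that never passes would loop forever and burn tokens.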
Building Blocks of an AI Agent
Every AI agent system requires four foundational components that work together to enable autonomous behaviour.
LLM Backbone
The large language model serves as the reasoning engine. The choice of LLM significantly impacts agent capabilities:
- Claude (Anthropic): Excels at complex reasoning, tool use, and following detailed instructions. Strong safety properties make it suitable for production agents handling sensitive operations
- GPT-4 (OpenAI): Robust tool calling, extensive third-party ecosystem, and strong performance across diverse tasks
- Open-source models (Llama, Mistral): Provide full control over deployment and costs, suitable for agents that need to run on-premises or at high volume
Tool Integration
Tools extend the agent beyond text generation into real-world interaction. A well-designed tool interface includes:
- Clear, descriptive tool names and documentation that help the LLM understand when to use each tool
- Structured input schemas with validation to prevent malformed calls
- Error handling that returns informative messages the LLM can act on
- Timeouts and rate limits to prevent runaway operations
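A wrapper enforcing the properties above might look like the sketch below. `run_tool` and its minimal schema format are assumptions for illustration; real systems typically use a JSON Schema validator, and note that a timed-out worker thread may keep running in the background.

```python
import concurrent.futures

def run_tool(fn, tool_input, schema, timeout_s=10):
    """Validate input against a minimal schema, run fn with a timeout,
    and always return a string the LLM can act on."""
    missing = [k for k in schema.get("required", []) if k not in tool_input]
    if missing:
        return f"Error: missing required fields: {', '.join(missing)}"
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return str(pool.submit(fn, **tool_input).result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        return f"Error: tool timed out after {timeout_s}s"
    except Exception as e:
        # Informative error strings let the LLM retry or change approach.
        return f"Error: {e}"
    finally:
        pool.shutdown(wait=False)
```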
Memory System
For agents that operate across sessions or handle long tasks, a memory system is essential. Implement memory using:
- Conversation buffer: Keep the full conversation history in the LLM context (simple but limited by context window)
- Summary memory: Periodically summarise older conversation turns to compress context
- Vector store: Embed and store important information for semantic retrieval when relevant
- Persistent files: Write key decisions and facts to files that persist across sessions
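The first two strategies can be combined in a small class: keep recent turns verbatim and compress older ones into a running summary. `summarise` is a stub here; in practice it would be an LLM call.

```python
def summarise(turns):
    """Stub for an LLM summarisation call."""
    return f"(summary of {len(turns)} earlier turn(s))"

class BufferMemory:
    def __init__(self, max_turns=6):
        self.max_turns = max_turns
        self.summary = ""
        self.turns = []

    def add(self, role, text):
        self.turns.append((role, text))
        # Compress the oldest half of the buffer once it overflows.
        if len(self.turns) > self.max_turns:
            half = self.max_turns // 2
            old, self.turns = self.turns[:half], self.turns[half:]
            self.summary = (self.summary + " " + summarise(old)).strip()

    def context(self):
        """Summary (if any) followed by the verbatim recent turns."""
        prefix = [("system", self.summary)] if self.summary else []
        return prefix + self.turns
```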
Orchestration Layer
The orchestration layer manages the agent loop, tool execution, error recovery, and interaction with external systems. It handles:
- Message formatting and context management for the LLM
- Tool call parsing, execution, and result injection
- Retry logic and error recovery
- Logging, monitoring, and observability
- Human-in-the-loop approvals for sensitive actions
Step-by-Step: Building a Basic Agent in Python
Let us build a simple ReAct agent from scratch using Python and the Anthropic API. This example demonstrates the core concepts without framework dependencies.
Step 1: Define Your Tools
import anthropic
import json

# Define tools the agent can use
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information on a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform a mathematical calculation",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }
]

Step 2: Implement Tool Execution
def execute_tool(tool_name, tool_input):
    """Execute a tool and return the result."""
    if tool_name == "search_web":
        # Replace with actual search API integration
        return search_api(tool_input["query"])
    elif tool_name == "calculate":
        try:
            result = eval(tool_input["expression"])  # Use a safe expression evaluator in production
            return str(result)
        except Exception as e:
            return f"Error: {str(e)}"
    return "Unknown tool"

Step 3: Build the Agent Loop
def run_agent(user_message, max_iterations=10):
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]
    system_prompt = """You are a helpful AI agent. Use the available tools
to research and answer questions accurately. Think step by step."""

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages
        )

        # Check if the agent wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool calls from response
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            # Add assistant response and tool results to messages
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Agent is done, return final response
            return response.content[0].text

    return "Agent reached maximum iterations"

Multi-Agent Orchestration Patterns
For complex workflows, single agents often fall short. Multi-agent systems distribute work across specialised agents that collaborate to solve problems.
Sequential Pipeline
Agents process tasks in a linear sequence, each passing its output to the next. For example: Research Agent gathers information, Analysis Agent processes it, Writing Agent produces the final report. This pattern is simple and predictable but lacks parallelism.
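The research → analysis → writing pipeline above is just function composition over agents. A minimal sketch, with each stage stubbed in place of a specialised agent:

```python
def research(topic):
    return f"facts about {topic}"

def analyse(facts):
    return f"insights from {facts}"

def write(insights):
    return f"report: {insights}"

def pipeline(topic, stages=(research, analyse, write)):
    out = topic
    for stage in stages:
        out = stage(out)  # each agent's output feeds the next
    return out
```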
Supervisor Pattern
A supervisor agent manages a team of worker agents, delegating tasks and aggregating results. The supervisor decides which worker to engage based on the current task requirements. This pattern provides strong control and is well-suited for workflows with clear role boundaries.
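A minimal supervisor sketch: route each task to a worker, then aggregate the results. The routing rule here is a trivial keyword check for illustration; a real supervisor would delegate the routing decision to an LLM.

```python
WORKERS = {
    "maths": lambda task: f"maths-worker handled: {task}",
    "search": lambda task: f"search-worker handled: {task}",
}

def supervisor(tasks):
    """Delegate each task to a worker and aggregate the results."""
    results = []
    for task in tasks:
        # Stub routing rule: digits suggest arithmetic, otherwise search.
        name = "maths" if any(c.isdigit() for c in task) else "search"
        results.append(WORKERS[name](task))
    return results
```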
Collaborative Discussion
Multiple agents engage in a structured discussion, each contributing their specialised perspective. A moderator agent manages the conversation flow and synthesises the final output. This pattern excels at tasks requiring diverse expertise, such as code review or strategic planning.
Hierarchical Delegation
Agents are organised in a hierarchy where high-level agents break down tasks and delegate to lower-level specialists. Each level can further decompose and delegate, creating a tree of coordinated work. This pattern scales well for large, complex projects.
Evaluation and Testing AI Agents
Testing AI agents is fundamentally different from testing traditional software due to the non-deterministic nature of LLM outputs.
Unit Testing Tools
Test each tool in isolation with known inputs and expected outputs. Tool tests should be deterministic and cover edge cases including error conditions, malformed inputs, and timeout scenarios.
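For instance, the `calculate` tool from Step 2 can be unit tested deterministically. The relevant branch of `execute_tool` is repeated below so the tests run standalone; in a real project you would import it and run these with pytest.

```python
def execute_tool(tool_name, tool_input):
    """Mirror of Step 2's calculate branch, for standalone testing."""
    if tool_name == "calculate":
        try:
            return str(eval(tool_input["expression"]))
        except Exception as e:
            return f"Error: {str(e)}"
    return "Unknown tool"

def test_calculate_happy_path():
    assert execute_tool("calculate", {"expression": "2 + 3 * 4"}) == "14"

def test_calculate_malformed_input():
    # A broken expression must yield an informative error, not a crash.
    assert execute_tool("calculate", {"expression": "2 +"}).startswith("Error")

def test_unknown_tool():
    assert execute_tool("frobnicate", {}) == "Unknown tool"
```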
Trajectory Testing
Record and evaluate the sequence of actions an agent takes to complete a task. Good trajectory tests verify that the agent follows a reasonable path, not just that it reaches the correct answer. Compare trajectories against expert-defined golden paths.
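One simple trajectory check is an in-order subsequence match: the golden steps must appear in order in the recorded trajectory, while extra harmless steps are tolerated. A sketch:

```python
def follows_golden_path(trajectory, golden):
    """True if the golden steps appear, in order, within the trajectory."""
    it = iter(trajectory)
    # Each `in` test advances the iterator, so order is enforced.
    return all(step in it for step in golden)
```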
End-to-End Evaluation
Run the full agent against a benchmark suite of tasks with known correct outcomes. Measure success rate, average number of steps, tool call efficiency, and cost per task. Track these metrics over time to detect regressions.
Adversarial Testing
Test agents with adversarial inputs including prompt injection attempts, ambiguous instructions, impossible tasks, and conflicting requirements. Verify that the agent handles these gracefully without producing harmful outputs or entering infinite loops.
Production Deployment Considerations
Moving AI agents from prototype to production requires addressing several critical concerns.
Cost Management
LLM API calls are the primary cost driver for AI agents. Control costs by:
- Setting maximum iteration limits per task
- Using smaller models for routine reasoning and larger models for complex decisions
- Implementing caching for repeated tool calls and LLM queries
- Monitoring per-task costs and setting budget alerts
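The caching point can be as simple as memoising deterministic tool calls, so identical inputs are served from memory instead of re-running a paid API call. A sketch using the standard library:

```python
from functools import lru_cache

CALL_COUNT = {"n": 0}  # instrumentation to show the cache working

@lru_cache(maxsize=1024)
def cached_search(query):
    CALL_COUNT["n"] += 1  # stands in for a paid search/LLM API call
    return f"results for {query}"
```

Note that `lru_cache` requires hashable arguments, and caching is only safe for tools whose results do not change between calls within the cache's lifetime.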
Reliability and Error Recovery
Production agents must handle failures gracefully:
- Implement retry logic with exponential backoff for transient API failures
- Design fallback strategies when primary tools are unavailable
- Add circuit breakers to prevent cascading failures
- Log all agent actions for debugging and audit trails
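The first bullet can be sketched as a small helper. Delays double each attempt (1s, 2s, 4s, ... by default) with jitter to avoid synchronised retry storms; the `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import random
import time

def with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

In production you would catch only transient error types (timeouts, 429s, 5xx) rather than bare `Exception`.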
Safety and Guardrails
Deploy safety mechanisms to prevent agents from taking harmful actions:
- Require human approval for high-impact operations (financial transactions, data deletion, external communications)
- Implement allowlists for permitted tool actions and blocklists for forbidden operations
- Add output filtering to catch sensitive information before it leaves the system
- Set up monitoring alerts for anomalous agent behaviour
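The allowlist and human-approval mechanisms can be combined in one gate in front of tool execution. The tool names and the `approve` callback below are hypothetical; in production `approve` would block on a human decision.

```python
ALLOWED_TOOLS = {"search_web", "calculate"}        # permitted without approval
NEEDS_APPROVAL = {"send_email", "delete_records"}  # high-impact operations

def guard_tool_call(tool_name, approve=lambda name: False):
    """Return 'allowed' or a 'blocked: ...' reason for a proposed tool call."""
    if tool_name not in ALLOWED_TOOLS | NEEDS_APPROVAL:
        return "blocked: tool not on allowlist"
    if tool_name in NEEDS_APPROVAL and not approve(tool_name):
        return "blocked: awaiting human approval"
    return "allowed"
```

Defaulting `approve` to "deny" means a missing or broken approval hook fails safe.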
Observability
Comprehensive observability is essential for understanding and improving agent performance:
- Trace every agent execution including all reasoning steps, tool calls, and decisions
- Track latency, token usage, and cost per interaction
- Monitor success rates and failure modes across different task types
- Build dashboards that give operations teams real-time visibility into agent health
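Tracing can start as simply as wrapping every tool so each execution is recorded with its name, duration, and outcome. The in-memory `TRACE` list is a stand-in for a real tracing backend.

```python
import time

TRACE = []  # stand-in for a real tracing/observability backend

def traced(name, fn):
    """Wrap fn so every call is recorded with name, outcome, and duration."""
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            TRACE.append({"tool": name, "ok": True,
                          "seconds": time.monotonic() - start})
            return result
        except Exception:
            TRACE.append({"tool": name, "ok": False,
                          "seconds": time.monotonic() - start})
            raise
    return wrapper
```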
How Workstation Builds Custom AI Agent Solutions
At Workstation, we have deep expertise in designing and deploying production AI agent systems for businesses across industries. Our approach includes:
- Discovery and design: We work with your team to identify high-value automation opportunities and design agent architectures that integrate with your existing infrastructure
- Rapid prototyping: We build working agent prototypes quickly, allowing you to evaluate capabilities and provide feedback before committing to full development
- Production engineering: Our team handles the complexities of production deployment including reliability, security, cost optimisation, and monitoring
- Custom tool development: We build bespoke tools that connect your AI agents to internal systems, databases, APIs, and third-party services
- Ongoing optimisation: We continuously monitor and improve your agent systems, reducing costs, increasing accuracy, and expanding capabilities over time
Ready to build AI agents that transform your business operations? Contact us at info@workstation.co.uk to discuss your requirements and see how our team can bring your AI agent vision to life.