Building AI Agents from Scratch: Architectures, Frameworks, and Best Practices

Agent Design Patterns
Before you write a single line of code, it is essential to understand the fundamental design patterns for AI agents. These patterns define how an agent reasons, plans, and executes tasks, and the pattern you choose determines the capabilities and limitations of your agent system.
The ReAct Pattern (Reasoning + Acting)
ReAct is the most widely adopted pattern for LLM-based agents. It interleaves reasoning (thinking) with acting (tool use) in a continuous loop. At each step, the agent:
- Observes the current state (user input, previous tool results)
- Thinks about what to do next (chain-of-thought reasoning)
- Acts by calling a tool or providing a response
- Observes the result and repeats
The ReAct pattern is elegant because it leverages the LLM's natural ability to reason in text while grounding that reasoning in real-world observations. It excels at tasks where the next step depends on the outcome of the previous one, such as research, debugging, or data analysis.
Plan-and-Execute
The Plan-and-Execute pattern separates planning from execution into distinct phases. First, a planning agent analyses the task and produces a structured plan with ordered steps. Then, an execution agent carries out each step, reporting results back to the planner. The planner can revise the plan based on outcomes.
This pattern is well suited to complex, multi-step tasks where upfront planning reduces errors and wasted effort. It also provides better transparency, as the plan can be reviewed by humans before execution begins.
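The plan-then-execute control flow can be sketched in a few lines. Here `plan` and `execute_step` are illustrative stubs standing in for the planning and execution LLM calls; the names and plan format are assumptions, not a real API.

```python
def plan(task):
    """Produce an ordered list of steps for the task (stubbed LLM call)."""
    return [f"research: {task}", f"analyse: {task}", f"summarise: {task}"]

def execute_step(step):
    """Carry out one step and return its result (stubbed LLM call)."""
    return f"done({step})"

def plan_and_execute(task):
    steps = plan(task)
    results = []
    for step in steps:
        results.append(execute_step(step))
        # A real planner could inspect results here and revise the
        # remaining steps before continuing.
    return results
```

The key design choice is that the step list exists as data before any execution happens, which is what makes human review of the plan possible.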
Tree of Thoughts
Tree of Thoughts extends chain-of-thought reasoning by exploring multiple reasoning paths simultaneously. The agent generates several candidate approaches, evaluates each, and pursues the most promising paths while pruning dead ends. This pattern is valuable for creative problem-solving and tasks with multiple valid approaches.
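The explore-evaluate-prune cycle is essentially a beam search over reasoning states. A minimal sketch, assuming caller-supplied `expand` (candidate generation, e.g. LLM-sampled thoughts) and `score` (state evaluation) functions:

```python
def tree_of_thoughts(root, expand, score, beam_width=2, depth=3):
    """Explore multiple reasoning paths, keeping only the best candidates.

    expand(state) -> list of successor states (candidate next thoughts)
    score(state)  -> float, higher means more promising
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [s for state in frontier for s in expand(state)]
        if not candidates:
            break
        # Prune: keep only the most promising paths (the "beam").
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```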
Reflexion
The Reflexion pattern adds a self-evaluation step to the agent loop. After completing a task or sub-task, the agent critiques its own output, identifies weaknesses or errors, and iterates to improve the result. This is particularly effective for writing, code generation, and analysis tasks where quality matters more than speed.
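The draft-critique-revise loop can be sketched as follows; `generate` and `critique` are hypothetical callables standing in for LLM calls, with `critique` returning `None` when it finds no problems.

```python
def reflexion_loop(task, generate, critique, max_rounds=3):
    """Draft, self-critique, and revise until the critique passes."""
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # the critique found no remaining problems
            break
        draft = generate(task, feedback=feedback)  # revise using the critique
    return draft
```

Capping the rounds matters: without `max_rounds`, a critique that never passes would loop forever and burn tokens.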
Building Blocks of an AI Agent
Every AI agent system requires four foundational components that work together to enable autonomous behaviour.
LLM Backbone
The large language model serves as the reasoning engine. The choice of LLM significantly impacts agent capabilities:
- Claude (Anthropic): Excels at complex reasoning, tool use, and following detailed instructions. Strong safety properties make it suitable for production agents handling sensitive operations
- GPT-4 (OpenAI): Robust tool calling, extensive third-party ecosystem, and strong performance across diverse tasks
- Open-source models (Llama, Mistral): Provide full control over deployment and costs, suitable for agents that need to run on-premises or at high volume
Tool Integration
Tools extend the agent beyond text generation into real-world interaction. A well-designed tool interface includes:
- Clear, descriptive tool names and documentation that help the LLM understand when to use each tool
- Structured input schemas with validation to prevent malformed calls
- Error handling that returns informative messages the LLM can act on
- Timeouts and rate limits to prevent runaway operations
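A wrapper enforcing the properties above might look like the sketch below. `run_tool` and its minimal schema format are assumptions for illustration; real systems typically use a JSON Schema validator, and note that a timed-out worker thread may keep running in the background.

```python
import concurrent.futures

def run_tool(fn, tool_input, schema, timeout_s=10):
    """Validate input against a minimal schema, run fn with a timeout,
    and always return a string the LLM can act on."""
    missing = [k for k in schema.get("required", []) if k not in tool_input]
    if missing:
        return f"Error: missing required fields: {', '.join(missing)}"
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return str(pool.submit(fn, **tool_input).result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        return f"Error: tool timed out after {timeout_s}s"
    except Exception as e:
        # Informative error strings let the LLM retry or change approach.
        return f"Error: {e}"
    finally:
        pool.shutdown(wait=False)
```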
Memory System
For agents that operate across sessions or handle long tasks, a memory system is essential. Implement memory using:
- Conversation buffer: Keep the full conversation history in the LLM context (simple but limited by context window)
- Summary memory: Periodically summarise older conversation turns to compress context
- Vector store: Embed and store important information for semantic retrieval when relevant
- Persistent files: Write key decisions and facts to files that persist across sessions
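The first two strategies can be combined in a small class: keep recent turns verbatim and compress older ones into a running summary. `summarise` is a stub here; in practice it would be an LLM call.

```python
def summarise(turns):
    """Stub for an LLM summarisation call."""
    return f"(summary of {len(turns)} earlier turn(s))"

class BufferMemory:
    def __init__(self, max_turns=6):
        self.max_turns = max_turns
        self.summary = ""
        self.turns = []

    def add(self, role, text):
        self.turns.append((role, text))
        # Compress the oldest half of the buffer once it overflows.
        if len(self.turns) > self.max_turns:
            half = self.max_turns // 2
            old, self.turns = self.turns[:half], self.turns[half:]
            self.summary = (self.summary + " " + summarise(old)).strip()

    def context(self):
        """Summary (if any) followed by the verbatim recent turns."""
        prefix = [("system", self.summary)] if self.summary else []
        return prefix + self.turns
```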
Orchestration Layer
The orchestration layer manages the agent loop, tool execution, error recovery, and interaction with external systems. It handles:
- Message formatting and context management for the LLM
- Tool call parsing, execution, and result injection
- Retry logic and error recovery
- Logging, monitoring, and observability
- Human-in-the-loop approvals for sensitive actions
Step-by-Step: Building a Basic Agent in Python
Let us build a simple ReAct agent from scratch using Python and the Anthropic API. This example demonstrates the core concepts without framework dependencies.
Step 1: Define Your Tools
import anthropic
import json

# Define tools the agent can use
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information on a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform a mathematical calculation",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }
]

Step 2: Implement Tool Execution
def execute_tool(tool_name, tool_input):
    """Execute a tool and return the result."""
    if tool_name == "search_web":
        # Replace with actual search API integration
        return search_api(tool_input["query"])
    elif tool_name == "calculate":
        try:
            result = eval(tool_input["expression"])  # Use a safe expression evaluator in production
            return str(result)
        except Exception as e:
            return f"Error: {str(e)}"
    return "Unknown tool"

Step 3: Build the Agent Loop
def run_agent(user_message, max_iterations=10):
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]
    system_prompt = """You are a helpful AI agent. Use the available tools
to research and answer questions accurately. Think step by step."""

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages
        )

        # Check if the agent wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool calls from response
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            # Add assistant response and tool results to messages
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Agent is done, return final response
            return response.content[0].text

    return "Agent reached maximum iterations"

Multi-Agent Orchestration Patterns
For complex workflows, single agents often fall short. Multi-agent systems distribute work across specialised agents that collaborate to solve problems.
Sequential Pipeline
Agents process tasks in a linear sequence, each passing its output to the next. For example: Research Agent gathers information, Analysis Agent processes it, Writing Agent produces the final report. This pattern is simple and predictable but lacks parallelism.
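The research → analysis → writing pipeline above is just function composition over agents. A minimal sketch, with each stage stubbed in place of a specialised agent:

```python
def research(topic):
    return f"facts about {topic}"

def analyse(facts):
    return f"insights from {facts}"

def write(insights):
    return f"report: {insights}"

def pipeline(topic, stages=(research, analyse, write)):
    out = topic
    for stage in stages:
        out = stage(out)  # each agent's output feeds the next
    return out
```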
Supervisor Pattern
A supervisor agent manages a team of worker agents, delegating tasks and aggregating results. The supervisor decides which worker to engage based on the current task requirements. This pattern provides strong control and is well-suited for workflows with clear role boundaries.
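A minimal supervisor sketch: route each task to a worker, then aggregate the results. The routing rule here is a trivial keyword check for illustration; a real supervisor would delegate the routing decision to an LLM.

```python
WORKERS = {
    "maths": lambda task: f"maths-worker handled: {task}",
    "search": lambda task: f"search-worker handled: {task}",
}

def supervisor(tasks):
    """Delegate each task to a worker and aggregate the results."""
    results = []
    for task in tasks:
        # Stub routing rule: digits suggest arithmetic, otherwise search.
        name = "maths" if any(c.isdigit() for c in task) else "search"
        results.append(WORKERS[name](task))
    return results
```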
Collaborative Discussion
Multiple agents engage in a structured discussion, each contributing their specialised perspective. A moderator agent manages the conversation flow and synthesises the final output. This pattern excels at tasks requiring diverse expertise, such as code review or strategic planning.
Hierarchical Delegation
Agents are organised in a hierarchy where high-level agents break down tasks and delegate to lower-level specialists. Each level can further decompose and delegate, creating a tree of coordinated work. This pattern scales well for large, complex projects.
Evaluation and Testing AI Agents
Testing AI agents is fundamentally different from testing traditional software due to the non-deterministic nature of LLM outputs.
Unit Testing Tools
Test each tool in isolation with known inputs and expected outputs. Tool tests should be deterministic and cover edge cases including error conditions, malformed inputs, and timeout scenarios.
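For instance, the `calculate` tool from Step 2 can be unit tested deterministically. The relevant branch of `execute_tool` is repeated below so the tests run standalone; in a real project you would import it and run these with pytest.

```python
def execute_tool(tool_name, tool_input):
    """Mirror of Step 2's calculate branch, for standalone testing."""
    if tool_name == "calculate":
        try:
            return str(eval(tool_input["expression"]))
        except Exception as e:
            return f"Error: {str(e)}"
    return "Unknown tool"

def test_calculate_happy_path():
    assert execute_tool("calculate", {"expression": "2 + 3 * 4"}) == "14"

def test_calculate_malformed_input():
    # A broken expression must yield an informative error, not a crash.
    assert execute_tool("calculate", {"expression": "2 +"}).startswith("Error")

def test_unknown_tool():
    assert execute_tool("frobnicate", {}) == "Unknown tool"
```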
Trajectory Testing
Record and evaluate the sequence of actions an agent takes to complete a task. Good trajectory tests verify that the agent follows a reasonable path, not just that it reaches the correct answer. Compare trajectories against expert-defined golden paths.
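One simple trajectory check is an in-order subsequence match: the golden steps must appear in order in the recorded trajectory, while extra harmless steps are tolerated. A sketch:

```python
def follows_golden_path(trajectory, golden):
    """True if the golden steps appear, in order, within the trajectory."""
    it = iter(trajectory)
    # Each `in` test advances the iterator, so order is enforced.
    return all(step in it for step in golden)
```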
End-to-End Evaluation
Run the full agent against a benchmark suite of tasks with known correct outcomes. Measure success rate, average number of steps, tool call efficiency, and cost per task. Track these metrics over time to detect regressions.
Adversarial Testing
Test agents with adversarial inputs including prompt injection attempts, ambiguous instructions, impossible tasks, and conflicting requirements. Verify that the agent handles these gracefully without producing harmful outputs or entering infinite loops.
Production Deployment Considerations
Moving AI agents from prototype to production requires addressing several critical concerns.
Cost Management
LLM API calls are the primary cost driver for AI agents. Control costs by:
- Setting maximum iteration limits per task
- Using smaller models for routine reasoning and larger models for complex decisions
- Implementing caching for repeated tool calls and LLM queries
- Monitoring per-task costs and setting budget alerts
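The caching point can be as simple as memoising deterministic tool calls, so identical inputs are served from memory instead of re-running a paid API call. A sketch using the standard library:

```python
from functools import lru_cache

CALL_COUNT = {"n": 0}  # instrumentation to show the cache working

@lru_cache(maxsize=1024)
def cached_search(query):
    CALL_COUNT["n"] += 1  # stands in for a paid search/LLM API call
    return f"results for {query}"
```

Note that `lru_cache` requires hashable arguments, and caching is only safe for tools whose results do not change between calls within the cache's lifetime.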
Reliability and Error Recovery
Production agents must handle failures gracefully:
- Implement retry logic with exponential backoff for transient API failures
- Design fallback strategies when primary tools are unavailable
- Add circuit breakers to prevent cascading failures
- Log all agent actions for debugging and audit trails
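The first bullet can be sketched as a small helper. Delays double each attempt (1s, 2s, 4s, ... by default) with jitter to avoid synchronised retry storms; the `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import random
import time

def with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

In production you would catch only transient error types (timeouts, 429s, 5xx) rather than bare `Exception`.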
Safety and Guardrails
Deploy safety mechanisms to prevent agents from taking harmful actions:
- Require human approval for high-impact operations (financial transactions, data deletion, external communications)
- Implement allowlists for permitted tool actions and blocklists for forbidden operations
- Add output filtering to catch sensitive information before it leaves the system
- Set up monitoring alerts for anomalous agent behaviour
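The allowlist and human-approval mechanisms can be combined in one gate in front of tool execution. The tool names and the `approve` callback below are hypothetical; in production `approve` would block on a human decision.

```python
ALLOWED_TOOLS = {"search_web", "calculate"}        # permitted without approval
NEEDS_APPROVAL = {"send_email", "delete_records"}  # high-impact operations

def guard_tool_call(tool_name, approve=lambda name: False):
    """Return 'allowed' or a 'blocked: ...' reason for a proposed tool call."""
    if tool_name not in ALLOWED_TOOLS | NEEDS_APPROVAL:
        return "blocked: tool not on allowlist"
    if tool_name in NEEDS_APPROVAL and not approve(tool_name):
        return "blocked: awaiting human approval"
    return "allowed"
```

Defaulting `approve` to "deny" means a missing or broken approval hook fails safe.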
Observability
Comprehensive observability is essential for understanding and improving agent performance:
- Trace every agent execution including all reasoning steps, tool calls, and decisions
- Track latency, token usage, and cost per interaction
- Monitor success rates and failure modes across different task types
- Build dashboards that give operations teams real-time visibility into agent health
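Tracing can start as simply as wrapping every tool so each execution is recorded with its name, duration, and outcome. The in-memory `TRACE` list is a stand-in for a real tracing backend.

```python
import time

TRACE = []  # stand-in for a real tracing/observability backend

def traced(name, fn):
    """Wrap fn so every call is recorded with name, outcome, and duration."""
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            TRACE.append({"tool": name, "ok": True,
                          "seconds": time.monotonic() - start})
            return result
        except Exception:
            TRACE.append({"tool": name, "ok": False,
                          "seconds": time.monotonic() - start})
            raise
    return wrapper
```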
How Workstation Builds Custom AI Agent Solutions
At Workstation, we have deep expertise in designing and deploying production AI agent systems for businesses across industries. Our approach includes:
- Discovery and design: We work with your team to identify high-value automation opportunities and design agent architectures that integrate with your existing infrastructure
- Rapid prototyping: We build working agent prototypes quickly, allowing you to evaluate capabilities and provide feedback before committing to full development
- Production engineering: Our team handles the complexities of production deployment including reliability, security, cost optimisation, and monitoring
- Custom tool development: We build bespoke tools that connect your AI agents to internal systems, databases, APIs, and third-party services
- Ongoing optimisation: We continuously monitor and improve your agent systems, reducing costs, increasing accuracy, and expanding capabilities over time
Ready to build AI agents that transform your business operations? Contact us at info@workstation.co.uk to discuss your requirements and see how our team can bring your AI agent vision to life.