AI Agent Setup: A Complete Guide to Deploying AI Agents in Production
Learn how to set up and deploy AI agents with this step-by-step guide. Compare cloud, self-hosted, and serverless options for AI agent hosting.
You have built an AI agent that works perfectly on your laptop. It responds to prompts, calls tools, and even handles multi-step reasoning. But when you try to deploy it, everything falls apart. Environment variables go missing. Dependencies conflict. Scaling becomes a nightmare. The deployment gap between a working prototype and a production-ready AI agent is where most projects die.
This guide closes that gap. You will learn the three main approaches to AI agent hosting, walk through a complete deployment example, and discover when to build versus when to use a platform like Nexus.
AI Agent Deployment Options
Choosing where to host your AI agent is the first major decision. Each option has tradeoffs in cost, control, and complexity.
Cloud Platforms (AWS, GCP, Azure)
Cloud providers offer managed services like AWS ECS, Google Cloud Run, and Azure Container Apps. These platforms handle infrastructure, load balancing, and auto-scaling for you.
Pros:
- Managed infrastructure reduces operational burden
- Auto-scaling handles traffic spikes
- Built-in monitoring and logging
- Global deployment regions
Cons:
- Vendor lock-in increases switching costs
- Complex pricing that surprises at scale
- Steep learning curve for each platform
- Cold starts can add latency
Best for: Teams with DevOps expertise, applications needing global scale, or organizations already invested in a cloud ecosystem.
Self-Hosted (VPS, Dedicated Servers)
Running your AI agent on a Virtual Private Server or dedicated machine gives you complete control. You install the OS, manage dependencies, and configure everything yourself.
Pros:
- Full control over the environment
- Predictable monthly costs
- No cold start latency
- Data stays on your infrastructure
Cons:
- You are responsible for security patches
- Manual scaling requires intervention
- No built-in redundancy
- Maintenance burden falls entirely on you
Best for: Privacy-sensitive applications, predictable workloads, or teams wanting to avoid cloud complexity.
Serverless (Lambda, Cloud Functions, Edge)
Serverless platforms run your code in response to events without managing servers. You pay per execution rather than for provisioned capacity.
Pros:
- Zero server management
- Cost-effective for sporadic workloads
- Automatic scaling to zero and back
- Fast deployment cycles
Cons:
- Execution time limits (usually 15 minutes or less)
- Cold starts add latency
- Limited environment customization
- Debugging distributed logs is harder
Best for: Event-driven agents, webhook handlers, or workloads with unpredictable traffic patterns.
Deployment Options Comparison
| Factor | Cloud | Self-Hosted | Serverless |
|--------|-------|-------------|------------|
| Setup Complexity | Medium | High | Low |
| Operational Burden | Medium | High | Low |
| Scaling | Auto | Manual | Auto |
| Cost Predictability | Low | High | Medium |
| Cold Start Latency | Medium | None | High |
| Customization | Medium | High | Low |
| Best For | Scale, teams | Control, privacy | Events, sporadic |
Step-by-Step Tutorial: Deploying a Python AI Agent
This tutorial deploys a simple AI agent using FastAPI. The agent accepts HTTP requests, processes them with an LLM, and returns responses. By the end, you will have a production-ready deployment.
Prerequisites
You will need:
- Python 3.11 or higher installed
- An OpenAI API key (or another LLM provider)
- A server or cloud account for deployment
- Basic familiarity with command line tools
Step 1: Create the Project Structure
Start by creating a directory for your agent:
mkdir ai-agent-api
cd ai-agent-api
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Create the required files:
touch main.py requirements.txt Dockerfile .env.example
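The .env.example file documents which secrets the service expects without committing real values. A minimal version (the key name matches what main.py reads) might look like:

```
# .env.example — copy to .env and fill in real values; never commit .env
OPENAI_API_KEY=sk-your-key-here
```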
Step 2: Build the FastAPI Application
Open main.py and add the following code:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import openai
import os
from typing import Optional

app = FastAPI(title="AI Agent API", version="1.0.0")

# Configure OpenAI client
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

class AgentRequest(BaseModel):
    prompt: str
    context: Optional[str] = None
    max_tokens: Optional[int] = 500

class AgentResponse(BaseModel):
    response: str
    tokens_used: int
    model: str

@app.get("/health")
async def health_check():
    return {"status": "healthy", "service": "ai-agent-api"}

@app.post("/agent/ask", response_model=AgentResponse)
async def ask_agent(request: AgentRequest):
    try:
        # Build messages
        messages = [{"role": "user", "content": request.prompt}]
        if request.context:
            messages.insert(0, {
                "role": "system",
                "content": request.context
            })

        # Call LLM
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            max_tokens=request.max_tokens
        )

        return AgentResponse(
            response=completion.choices[0].message.content,
            tokens_used=completion.usage.total_tokens,
            model="gpt-4o-mini"
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
Step 3: Define Dependencies
Add the following to requirements.txt:
fastapi==0.109.0
uvicorn[standard]==0.27.0
openai==1.12.0
pydantic==2.6.0
python-dotenv==1.0.0
Step 4: Containerize with Docker
Create a Dockerfile:
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY main.py .
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
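For local development, a docker-compose file keeps the API key out of your shell history by loading it from the .env file created in Step 1 (the service name below is illustrative):

```
# docker-compose.yml — local development sketch, not a production config
services:
  agent-api:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env          # contains OPENAI_API_KEY=...; never commit this file
    restart: unless-stopped
```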
Step 5: Deploy to Production
Build and run locally first:
docker build -t ai-agent-api .
docker run -p 8000:8000 -e OPENAI_API_KEY=your_key_here ai-agent-api
Test the deployment:
curl -X POST http://localhost:8000/agent/ask \
-H "Content-Type: application/json" \
-d '{"prompt": "What is the capital of France?"}'
For cloud deployment, push to a container registry and deploy to your chosen platform. The Dockerfile works with AWS ECS, Google Cloud Run, Azure Container Apps, or any Docker-compatible host.
Common Mistakes When Deploying AI Agents
Even experienced developers hit these pitfalls. Here are the top five mistakes and how to avoid them.
Mistake 1: Hardcoding API Keys
The Problem: Embedding API keys directly in code creates security risks and makes rotation difficult.
The Fix: Use environment variables for all secrets. In Python, load them with os.getenv() or python-dotenv for local development. Never commit .env files to version control.
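A minimal fail-fast loader makes a missing secret a startup error rather than a runtime surprise. This sketch assumes the key name used in the tutorial; in local development, python-dotenv can populate the environment first.

```python
import os

def require_env(name: str) -> str:
    """Return a required secret from the environment, failing fast if absent."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Fails loudly at import time instead of on the first LLM call:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```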
Mistake 2: Ignoring Rate Limits
The Problem: LLM APIs have rate limits. Exceeding them causes failed requests and degraded user experience.
The Fix: Implement exponential backoff with libraries like tenacity. Track your token usage and set up alerts before hitting limits. Consider request queuing for high-traffic applications.
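The core retry pattern looks like this; a library like tenacity packages the same idea as a decorator. This is a stdlib-only sketch, and in practice you would catch the provider's specific rate-limit exception (e.g. openai.RateLimitError) rather than bare Exception.

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn with exponential backoff plus jitter on transient errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # in practice: catch the provider's rate-limit error
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Double the delay each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```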
Mistake 3: No Health Checks
The Problem: Deployments without health checks fail silently. Load balancers cannot detect unhealthy instances.
The Fix: Add a /health endpoint that validates critical dependencies, including your LLM connection. Configure your orchestrator to use this endpoint for health checks and auto-restart failed containers.
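A dependency-aware health check can be sketched as a plain function that runs each probe and reports per-check status; in the tutorial app, the /health handler would return its result (the probe names here are illustrative).

```python
def check_health(checks: dict) -> dict:
    """Run each named dependency probe and report per-check status."""
    results = {}
    for name, probe in checks.items():
        try:
            probe()  # any exception marks this dependency unhealthy
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"error: {exc}"
    healthy = all(v == "ok" for v in results.values())
    return {"status": "healthy" if healthy else "degraded", "checks": results}
```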
Mistake 4: Missing Input Validation
The Problem: Unvalidated inputs lead to injection attacks, excessive token usage, or application crashes.
The Fix: Use Pydantic models to validate all incoming requests. Set maximum token limits, sanitize user inputs, and reject malformed requests before they reach your LLM provider.
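Tightening the tutorial's request model is mostly a matter of adding constraints. This sketch (Pydantic v2 syntax; the limits are illustrative) caps prompt length and token spend, and strips control characters before the prompt reaches the provider.

```python
from pydantic import BaseModel, Field, field_validator

class SafeAgentRequest(BaseModel):
    # Bounded prompt length caps token usage and blocks empty requests
    prompt: str = Field(min_length=1, max_length=4000)
    # Hard ceiling on tokens regardless of what the caller asks for
    max_tokens: int = Field(default=500, ge=1, le=2000)

    @field_validator("prompt")
    @classmethod
    def strip_control_chars(cls, v: str) -> str:
        # Drop non-printable characters that can confuse downstream tooling
        return "".join(ch for ch in v if ch.isprintable() or ch in "\n\t")
```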
Mistake 5: No Observability
The Problem: Without logging and metrics, debugging production issues becomes guesswork.
The Fix: Instrument your code with structured logging. Track request latency, token usage, and error rates. Tools like Prometheus, Grafana, or cloud-native solutions provide visibility into your agent's behavior.
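One-JSON-object-per-line logs are the simplest structured format most log aggregators ingest. A stdlib-only sketch (the extra field names are illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line for machine-readable logs."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Attach structured extras such as latency and token counts
        for key in ("latency_ms", "tokens_used", "route"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)
```

Handlers use it like any formatter, and request handlers attach fields via `extra`, e.g. `logger.info("agent request", extra={"latency_ms": 84, "tokens_used": 123, "route": "/agent/ask"})`.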
When to Use a Platform Like Nexus
Building from scratch makes sense for learning or highly specialized use cases. But most teams eventually hit limitations that platforms solve better than custom code.
Sign 1: You Are Managing Multiple Agents
When you have more than three agents in production, coordination becomes complex. You need shared memory, inter-agent communication, and unified deployment pipelines. Platforms provide these out of the box.
Sign 2: Security Requirements Are Strict
Enterprise deployments need audit logs, role-based access control, and compliance certifications. Building these features takes months. Platforms like Nexus include enterprise security from day one.
Sign 3: You Need Hybrid Deployment
Some agents run in the cloud. Others need to stay on-premise for data privacy. Managing hybrid infrastructure with custom scripts is fragile. A platform abstracts deployment targets so you ship code, not infrastructure.
Sign 4: Scaling Is Unpredictable
Traffic spikes break poorly architected systems. Auto-scaling groups, load balancers, and database connection pools require expertise. Platforms handle scaling automatically, including zero-downtime deployments.
Sign 5: Tool Integration Is Slow
Connecting agents to Slack, email, databases, and APIs involves writing dozens of integrations. Platforms come with pre-built connectors and standardized tool interfaces. You configure, not code.
Conclusion
Deploying AI agents to production requires more than working code. You need to choose the right hosting strategy, implement proper validation and observability, and avoid common pitfalls like hardcoded secrets and missing health checks.
For simple prototypes, a self-hosted FastAPI application on a VPS works well. As requirements grow, cloud platforms provide managed infrastructure. But when you are managing multiple agents, need enterprise security, or require hybrid deployment, a platform like Nexus eliminates the infrastructure burden so you can focus on building agent capabilities.
Start with the tutorial in this guide. Deploy your first agent. Then evaluate whether building custom infrastructure or adopting a platform makes sense for your roadmap. The best deployment strategy is the one that lets you ship faster and sleep better.
Need help setting up your AI agents?
We configure production AI workflows so you can skip the weeks of trial and error.
Get Started with Nexus