Agent Lifecycle

Understanding how agents process requests helps you build more effective AI workers.

Request Flow

When a message reaches an agent, here's what happens:

   User Message
        │
        ▼
┌─────────────────┐
│   1. RECEIVE    │  Validate input, check permissions
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   2. CONTEXT    │  Load memory, fetch relevant context
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   3. PROCESS    │  AI model generates response
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   4. TOOLS      │  Execute any tool calls (may loop)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   5. RESPOND    │  Return final response to user
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   6. PERSIST    │  Save memory, log interaction
└─────────────────┘

Stage Details

1. Receive

The agent receives and validates the request.

What happens:

  • Authenticate the request (API key, session)
  • Validate input format and length
  • Check rate limits
  • Verify agent is active

Possible outcomes:

  • ✅ Proceed to Context
  • ❌ Return 401/403 (auth error)
  • ❌ Return 429 (rate limited)
  • ❌ Return 400 (invalid input)

// Request validation
{
  agentId: 'agent-123',       // Must exist and be active
  message: 'Hello',           // Required, max 32K chars
  conversationId: 'conv-1',   // Optional, for context
  userId: 'user-456'          // Optional, for memory
}
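
The checks above can be expressed as a plain validation function. This is an illustrative sketch, not the platform's actual implementation: the field names and the 32K character cap come from the request shape above; everything else is an assumption.

```javascript
// Illustrative RECEIVE-stage validator. Field names mirror the request
// shape above; rules beyond the documented ones are assumptions.
function validateRequest(req) {
  const errors = [];
  if (!req.agentId) errors.push('agentId is required');
  if (!req.message) errors.push('message is required');
  else if (req.message.length > 32000) errors.push('message exceeds 32K characters');
  // conversationId and userId are optional, so they need no checks here.
  return { ok: errors.length === 0, errors };
}
```

When `ok` is false, the caller would map the errors to the 400 outcome listed above.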

2. Context

The agent gathers relevant context for the request.

What's loaded:

  • Conversation history (if continuing)
  • User-specific memory
  • Team/app-level knowledge
  • System prompt

Context assembly:

┌─────────────────────────────────────────┐
│ CONTEXT WINDOW                          │
├─────────────────────────────────────────┤
│ System Prompt                           │
│ ─────────────────────────────────────── │
│ You are an IT Helpdesk assistant...     │
├─────────────────────────────────────────┤
│ Memory / Knowledge                      │
│ ─────────────────────────────────────── │
│ User's laptop: ThinkPad X1 Carbon       │
│ Previous issue: VPN connection (solved) │
├─────────────────────────────────────────┤
│ Conversation History                    │
│ ─────────────────────────────────────── │
│ User: My email isn't working            │
│ Agent: I can help with that...          │
│ User: It says "connection failed"       │
├─────────────────────────────────────────┤
│ Current Message                         │
│ ─────────────────────────────────────── │
│ User: Is Google down?                   │
└─────────────────────────────────────────┘
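
As a minimal sketch, that assembly order looks like this. The section names come from the diagram; the plain-text join is an assumption (real context is typically structured messages, not one string):

```javascript
// Sketch of CONTEXT-stage assembly: sections are concatenated in the
// order shown in the diagram, with the current message last.
function assembleContext({ systemPrompt, memory = [], history = [], message }) {
  return [
    systemPrompt,
    ...memory,          // user-specific memory and knowledge
    ...history,         // prior conversation turns
    `User: ${message}`  // the current message goes last
  ].join('\n');
}
```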

3. Process

The AI model generates a response.

What happens:

  • Full context sent to model
  • Model decides: respond directly OR use tools
  • If using tools, generate tool call(s)
  • If responding, generate text response

Model decision flow:

             Context
                │
                ▼
       ┌────────────────┐
       │    AI Model    │
       └────────┬───────┘
                │
       ┌────────┴────────┐
       │                 │
       ▼                 ▼
┌─────────────┐   ┌─────────────┐
│  Text Reply │   │  Tool Call  │
└─────────────┘   └─────────────┘
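
The branch can be written as a small router. The shape of `turn` here is an assumption for illustration, not the SDK's actual model-output format:

```javascript
// Sketch of the PROCESS-stage decision: a model turn either carries
// tool calls (go to the TOOLS stage) or text (go to RESPOND).
function routeModelTurn(turn) {
  if (Array.isArray(turn.toolCalls) && turn.toolCalls.length > 0) {
    return { next: 'tools', calls: turn.toolCalls };
  }
  return { next: 'respond', text: turn.text };
}
```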

4. Tools (Loop)

If the model requests tools, the agent executes them and feeds the results back to the model.

Tool execution loop:

Model: "I should check Google's status"
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Tool: http_request                      │
│ Input: { url: "status.google.com" }     │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Tool Result                             │
│ { status: "all_services_normal" }       │
└────────────────────┬────────────────────┘
                     │
                     ▼
              ┌─────────────┐
              │  AI Model   │ ◄── Decides: more tools or respond?
              └──────┬──────┘
                     │
          ┌──────────┴──────────┐
          │                     │
          ▼                     ▼
  Another tool call      Generate response
  (loop continues)       (exit loop)

Tool limits:

  • Max 10 tool calls per turn (configurable)
  • Total timeout: 30 seconds
  • Individual tool timeout: 10 seconds
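
Putting the loop and its limits together, a sketch might look like this. `callModel` and `runTool` are hypothetical stand-ins, not SDK functions; only the 10-call cap comes from the limits above:

```javascript
// Sketch of the TOOLS loop: re-invoke the model after each tool result
// until it answers with text or the per-turn call cap is reached.
async function toolLoop(callModel, runTool, context, maxCalls = 10) {
  let calls = 0;
  let turn = await callModel(context);
  while (turn.toolCall) {
    if (++calls > maxCalls) {
      throw new Error('tool call limit exceeded');
    }
    const output = await runTool(turn.toolCall);
    context = [...context, { role: 'tool', output }]; // feed result back
    turn = await callModel(context);
  }
  return turn.text; // exit loop: model produced a text reply
}
```

A production version would also enforce the per-tool and total timeouts listed above.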

5. Respond

The final response is returned to the user.

Response structure:

{
  id: 'msg-789',
  agentId: 'agent-123',
  conversationId: 'conv-456',
  content: 'Google services are currently operational...',
  toolCalls: [
    {
      tool: 'http_request',
      input: { url: 'status.google.com' },
      output: { status: 'all_services_normal' }
    }
  ],
  usage: {
    inputTokens: 1250,
    outputTokens: 89,
    totalTokens: 1339
  },
  createdAt: '2024-01-15T10:30:00Z'
}
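
Since every response carries a `usage` block, token accounting can aggregate across responses. A sketch (the field names come from the response structure above; the aggregation itself is an assumption about client-side usage):

```javascript
// Sketch: sum the usage blocks from a batch of responses, using the
// field names from the response structure above.
function totalUsage(responses) {
  return responses.reduce(
    (acc, r) => ({
      inputTokens: acc.inputTokens + r.usage.inputTokens,
      outputTokens: acc.outputTokens + r.usage.outputTokens,
      totalTokens: acc.totalTokens + r.usage.totalTokens
    }),
    { inputTokens: 0, outputTokens: 0, totalTokens: 0 }
  );
}
```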

6. Persist

After responding, data is saved.

What's persisted:

  • Conversation history (user message + agent response)
  • Memory updates (if any)
  • Tool results (for debugging)
  • Analytics/metrics

Async operations:

  • Webhook notifications
  • Analytics processing
  • Memory indexing
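
The key property is that persistence never delays the reply. A sketch of that fire-and-forget pattern (the task names mirror the lists above; the scheduling approach is an assumption):

```javascript
// Sketch of the PERSIST stage: return the response immediately and run
// bookkeeping tasks (history, memory, analytics) without awaiting them.
function respondThenPersist(response, tasks) {
  for (const task of tasks) {
    Promise.resolve()
      .then(() => task(response))
      .catch((err) => console.error('persist task failed:', err));
  }
  return response; // the caller is never blocked on persistence
}
```

Errors in a persistence task are logged rather than surfaced, since the user already has their response.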

Agent States

Agents can be in different states:

State         Description            Can Receive Requests?
Active        Normal operation       ✅ Yes
Paused        Temporarily disabled   ❌ No
Maintenance   Being updated          ❌ No
Archived      Soft deleted           ❌ No

// Check agent state
const agent = await deeployd.agents.get('agent-123');
console.log(agent.state); // 'active'

// Pause an agent
await deeployd.agents.pause('agent-123');

// Resume an agent
await deeployd.agents.resume('agent-123');
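
Per the table, only active agents accept requests, so a client can guard calls up front. The state strings match the table; the guard itself is an assumption about client-side usage:

```javascript
// Sketch: map agent states from the table to whether they can receive
// requests. Only 'active' agents accept traffic.
const RECEIVABLE_STATES = new Set(['active']);

function canReceiveRequests(agent) {
  return RECEIVABLE_STATES.has(agent.state);
}
```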

Conversation Lifecycle

Conversations also have a lifecycle:

┌─────────┐    ┌─────────┐    ┌─────────┐    ┌──────────┐
│   New   │───▶│ Active  │───▶│  Idle   │───▶│ Archived │
└─────────┘    └─────────┘    └─────────┘    └──────────┘
                    ▲              │
                    │              │
                    └──────────────┘
                     (new message)

State      Description            Duration
New        Just created           Until first message
Active     Currently in use       During conversation
Idle       No recent activity     After 30 min of inactivity
Archived   Stored for reference   After 90 days
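
Because the idle and archive transitions are time-based, the state can be derived from a last-activity timestamp. A sketch using the thresholds in the table (the field name `lastMessageAt` is an assumption):

```javascript
// Sketch: derive conversation state from its last-activity time,
// using the 30-minute idle and 90-day archive thresholds above.
const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

function conversationState(conv, now = Date.now()) {
  if (!conv.lastMessageAt) return 'new'; // no messages yet
  const inactive = now - conv.lastMessageAt;
  if (inactive >= 90 * DAY_MS) return 'archived';
  if (inactive >= 30 * MINUTE_MS) return 'idle';
  return 'active';
}
```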

Error Handling

Errors can occur at any stage:

Common Errors

Error                 Stage     Cause                   Resolution
AuthenticationError   Receive   Invalid API key         Check credentials
RateLimitError        Receive   Too many requests       Wait or upgrade
ContextTooLong        Context   History exceeds limit   Summarize or start new
ModelError            Process   AI model failure        Retry or fall back
ToolError             Tools     Tool execution failed   Check tool config
TimeoutError          Any       Request took too long   Optimize or increase limit

Error Recovery

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function chatWithRecovery(message) {
  try {
    return await deeployd.agents.chat({
      agentId: 'agent-123',
      message
    });
  } catch (error) {
    if (error.code === 'RATE_LIMITED') {
      // Wait for the suggested delay, then retry the same request
      await sleep(error.retryAfter);
      return chatWithRecovery(message);
    }
    if (error.code === 'CONTEXT_TOO_LONG') {
      // Start a fresh conversation
      return deeployd.agents.chat({
        agentId: 'agent-123',
        message,
        conversationId: null // New conversation
      });
    }
    throw error;
  }
}

Performance Considerations

Latency Breakdown

Typical request latency:

Stage     Typical Time   Range
Receive   10ms           5-50ms
Context   50ms           20-200ms
Process   500ms          200ms-5s
Tools     0-2000ms       Per tool call
Respond   10ms           5-50ms
Persist   Async          N/A

Total: roughly 500ms to 7s, depending on complexity
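
The typical values in the table can be turned into a rough per-request estimate. This is a back-of-the-envelope sketch; only the table's numbers come from the doc, and the assumed per-tool time is the midpoint-to-upper range shown:

```javascript
// Sketch: estimate end-to-end latency from the typical per-stage times
// in the table above. Persist runs async, so it adds nothing here.
function estimateLatencyMs({ toolCalls = 0, processMs = 500, perToolMs = 1000 } = {}) {
  const receiveMs = 10;
  const contextMs = 50;
  const respondMs = 10;
  return receiveMs + contextMs + processMs + toolCalls * perToolMs + respondMs;
}
```

With no tools this gives 570ms, close to the table's typical path; each tool call adds its own execution time.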

Optimization Tips

  1. Reduce context size - Summarize long conversations
  2. Limit tools - Enable only the tools the agent needs
  3. Use streaming - Reduce time to first byte
  4. Cache when possible - Avoid redundant lookups

Next: Learn about the Execution Model to understand how tasks and workflows run.