Observability & Telemetry

MeetLoyd uses OpenTelemetry (OTel) to instrument AI agent executions. It emits distributed traces and metrics that follow the OpenTelemetry semantic conventions for generative AI.

Overview

The telemetry system provides:

Distributed Tracing: Track requests across agent executions, LLM calls, and tool invocations
Metrics Collection: Counters and histograms for executions, tokens, latency, and costs
Gen AI Semantic Conventions: Industry-standard gen_ai.* attributes for LLM observability
OTLP Export: Works with any backend that accepts OTLP

Architecture

The telemetry pipeline flows from the MeetLoyd platform (agent execution spans, LLM call spans, tool execution spans) through the OpenTelemetry SDK (BatchSpanProcessor, MetricReader) to the OTLP/HTTP Exporter, which delivers data to your observability backend (Grafana, Honeycomb, Datadog, Jaeger, etc.).

Traced Operations

Agent Execution Spans

Root spans created for each agent execution:

Attribute	Description
gen_ai.agent.id	Unique agent identifier
gen_ai.agent.name	Human-readable agent name
gen_ai.request.model	LLM model used
gen_ai.provider.name	Provider (anthropic, openai, etc.)
meetloyd.tenant.id	Tenant identifier
meetloyd.user.id	User who triggered execution

LLM Call Spans

Child spans for each LLM API call:

Attribute	Description
gen_ai.operation.name	Operation type (chat, completion)
gen_ai.request.model	Model requested
gen_ai.request.max_tokens	Max tokens parameter
gen_ai.request.temperature	Temperature parameter
gen_ai.usage.input_tokens	Input tokens consumed
gen_ai.usage.output_tokens	Output tokens generated
gen_ai.response.id	Provider response ID
gen_ai.response.finish_reasons	Why generation stopped
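
To make the attribute table concrete, here is a minimal sketch of how the attributes of one LLM call span could be assembled. The LlmRequest and LlmResponse shapes and the llmCallAttributes function are hypothetical names used for illustration, not MeetLoyd's internal API, and the operation name is fixed to "chat" as an assumption. Only the attribute keys come from the table above.

```typescript
// Hypothetical request/response shapes, for illustration only.
interface LlmRequest {
  model: string;
  maxTokens?: number;
  temperature?: number;
}

interface LlmResponse {
  id: string;
  inputTokens: number;
  outputTokens: number;
  finishReasons: string[];
}

type AttrValue = string | number | string[];

// Map one LLM call onto the gen_ai.* attribute keys listed in the table.
// Prompt and response text are deliberately not part of the result.
function llmCallAttributes(req: LlmRequest, res: LlmResponse): Record<string, AttrValue> {
  const attrs: Record<string, AttrValue> = {
    "gen_ai.operation.name": "chat", // assumption: a chat-style call
    "gen_ai.request.model": req.model,
    "gen_ai.usage.input_tokens": res.inputTokens,
    "gen_ai.usage.output_tokens": res.outputTokens,
    "gen_ai.response.id": res.id,
    "gen_ai.response.finish_reasons": res.finishReasons,
  };
  // Optional request parameters are only recorded when they were set.
  if (req.maxTokens !== undefined) attrs["gen_ai.request.max_tokens"] = req.maxTokens;
  if (req.temperature !== undefined) attrs["gen_ai.request.temperature"] = req.temperature;
  return attrs;
}
```

Leaving unset parameters out, rather than recording a placeholder, keeps queries such as "all calls with an explicit temperature" meaningful in the backend.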

Tool Execution Spans

Child spans for each tool invocation:

Attribute	Description
gen_ai.tool.name	Tool identifier
gen_ai.tool.call.id	Unique call identifier
gen_ai.tool.type	Tool type (builtin, custom)

Metrics

Counters

Metric	Description
gen_ai.agent.executions	Total agent executions
gen_ai.llm.calls	Total LLM API calls
gen_ai.tool.calls	Total tool invocations
gen_ai.tokens.input	Total input tokens
gen_ai.tokens.output	Total output tokens
gen_ai.errors	Total errors by type

Histograms

Metric	Description
gen_ai.agent.duration	Execution duration (ms)
gen_ai.llm.latency	LLM call latency (ms)
gen_ai.tool.latency	Tool execution latency (ms)
gen_ai.cost.usd	Execution cost (USD)
gen_ai.tokens.per_execution	Tokens per execution
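
The difference between the two metric kinds can be sketched as follows: counters accumulate increments, while histograms record one observation per event. The LlmCallUsage type and the summarizeExecution function are hypothetical, and treating gen_ai.tokens.per_execution as input plus output tokens is an assumption; the source does not define the formula.

```typescript
// Hypothetical per-call usage record, for illustration only.
interface LlmCallUsage {
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

interface ExecutionMetrics {
  counters: Record<string, number>;     // amounts to add to each counter
  histograms: Record<string, number[]>; // observations to record
}

// Derive the metric updates that a single agent execution would produce.
function summarizeExecution(calls: LlmCallUsage[], durationMs: number): ExecutionMetrics {
  const inputTotal = calls.reduce((sum, c) => sum + c.inputTokens, 0);
  const outputTotal = calls.reduce((sum, c) => sum + c.outputTokens, 0);
  return {
    counters: {
      "gen_ai.agent.executions": 1,
      "gen_ai.llm.calls": calls.length,
      "gen_ai.tokens.input": inputTotal,
      "gen_ai.tokens.output": outputTotal,
    },
    histograms: {
      "gen_ai.agent.duration": [durationMs],
      "gen_ai.llm.latency": calls.map((c) => c.latencyMs),
      // Assumption: tokens per execution = input + output tokens.
      "gen_ai.tokens.per_execution": [inputTotal + outputTotal],
    },
  };
}
```

For an execution with two LLM calls, the call counter grows by two and the latency histogram receives two observations, while the duration histogram receives exactly one.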

Supported Backends

MeetLoyd's OTLP export is compatible with:

Backend	Notes
Grafana Cloud (Recommended)	Fully managed, generous free tier, cloud-agnostic
Honeycomb	Trace-first observability
Datadog	Full-stack monitoring
Axiom	Log analytics
AWS X-Ray	Via OpenTelemetry Collector
Azure Monitor	Via Azure Monitor Exporter
Google Cloud Trace	Via GCP exporter
Any OTLP-compatible collector	Universal

Configuration

OpenTelemetry is disabled by default. To enable it, set these environment variables:

Variable	Description	Required
OTEL_ENABLED	Set to "true" to enable telemetry	Yes
OTEL_EXPORTER_OTLP_ENDPOINT	Your backend's OTLP endpoint URL	Yes (when enabled)
OTEL_EXPORTER_OTLP_HEADERS	Authentication headers (e.g., "Authorization=Basic ...")	Depends on backend
OTEL_SERVICE_NAME	Service name (default: "meetloyd")	No
OTEL_SERVICE_VERSION	Service version	No
OTEL_SAMPLE_RATE	Sampling rate 0.0-1.0 (default: 1.0, use 0.1 for high-volume production)	No
OTEL_CONSOLE_EXPORT	Set to "true" for console debugging (development only)	No
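
As an illustration of how these variables fit together, the sketch below parses them into a settings object. The TelemetrySettings type and the loadTelemetrySettings function are hypothetical, not MeetLoyd's actual loader. Two behaviors are assumptions: out-of-range sample rates are clamped to 0.0-1.0, and telemetry falls back to disabled when no endpoint is set. Headers are parsed as comma-separated key=value pairs, the format OpenTelemetry uses for OTEL_EXPORTER_OTLP_HEADERS.

```typescript
// Hypothetical settings shape, for illustration only.
interface TelemetrySettings {
  enabled: boolean;
  endpoint?: string;
  headers: Record<string, string>;
  serviceName: string;
  serviceVersion?: string;
  sampleRate: number;
  consoleExport: boolean;
  warnings: string[];
}

type EnvVars = Record<string, string | undefined>;

// Parse "k1=v1,k2=v2". Only the first "=" splits a pair, so Base64
// padding inside a value (e.g. "Basic abc==") is preserved.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip malformed entries
    headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return headers;
}

function loadTelemetrySettings(env: EnvVars): TelemetrySettings {
  const warnings: string[] = [];
  let enabled = env.OTEL_ENABLED === "true";
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT;
  if (enabled && !endpoint) {
    warnings.push("OTEL_ENABLED is true but OTEL_EXPORTER_OTLP_ENDPOINT is not set");
    enabled = false; // assumption: fall back to disabled
  }
  // Default to 1.0; clamp anything outside 0.0-1.0 (assumption).
  const rawRate = env.OTEL_SAMPLE_RATE;
  const parsedRate = rawRate ? Number(rawRate) : 1.0;
  const sampleRate = Number.isFinite(parsedRate) ? Math.min(1, Math.max(0, parsedRate)) : 1.0;
  return {
    enabled,
    endpoint,
    headers: parseOtlpHeaders(env.OTEL_EXPORTER_OTLP_HEADERS),
    serviceName: env.OTEL_SERVICE_NAME || "meetloyd",
    serviceVersion: env.OTEL_SERVICE_VERSION,
    sampleRate,
    consoleExport: env.OTEL_CONSOLE_EXPORT === "true",
    warnings,
  };
}
```

Passing the environment in as a plain object, rather than reading a global, keeps the function easy to test with different combinations of variables.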

Enterprise Cloud Deployment

Grafana Cloud (Recommended)

Create a Grafana Cloud account at grafana.com
Navigate to Connections > Add new connection > OpenTelemetry (OTLP)
Generate an API token with metrics:write and traces:write scopes
Set OTEL_EXPORTER_OTLP_ENDPOINT to your Grafana OTLP gateway URL
Set OTEL_EXPORTER_OTLP_HEADERS to an "Authorization=Basic ..." header containing your Base64-encoded credentials
Features: Trace explorer with gen_ai attribute filtering, pre-built LLM dashboards, alerting, 50 GB free tier
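
The header value from the last setup step can be produced with a few lines of code. This sketch assumes the credentials take the usual HTTP Basic form of instance ID and API token joined by a colon; confirm the exact format on the Grafana Cloud connection page. The grafanaOtlpHeaders function name is hypothetical.

```typescript
// Build the value for OTEL_EXPORTER_OTLP_HEADERS from an instance ID and
// an API token. Assumption: HTTP Basic credentials are "instanceId:token".
function grafanaOtlpHeaders(instanceId: string, apiToken: string): string {
  // btoa is a global in browsers and in Node.js 16+. It expects Latin-1
  // input, which covers typical numeric IDs and ASCII tokens.
  const credentials = btoa(`${instanceId}:${apiToken}`);
  return `Authorization=Basic ${credentials}`;
}

// Example with placeholder values:
// grafanaOtlpHeaders("user", "pass") returns "Authorization=Basic dXNlcjpwYXNz"
```

Keep the real token in a secret store and inject the resulting header through the environment; avoid committing it to source control.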

AWS

Option 1: ADOT + X-Ray -- Deploy the AWS Distro for OpenTelemetry Collector as a sidecar or daemon. Point OTEL_EXPORTER_OTLP_ENDPOINT to the local collector. Traces go to X-Ray, metrics to CloudWatch.

Option 2: Direct to Grafana Cloud -- Point OTEL_EXPORTER_OTLP_ENDPOINT directly to Grafana Cloud from your AWS deployment.

Azure

Option 1: Azure Monitor Application Insights -- Azure Monitor supports OTLP ingestion directly. Use the Application Insights endpoint and instrumentation key.

Option 2: OpenTelemetry Collector -- Deploy the OTel Collector with Azure Monitor exporter in AKS or Container Apps.

Google Cloud Platform

Option 1: Cloud Trace Direct Export -- Point OTEL_EXPORTER_OTLP_ENDPOINT to the Cloud Trace endpoint and authenticate with service account credentials.

Option 2: OpenTelemetry Collector -- Deploy on GKE or Cloud Run with the Google Cloud exporter.

Multi-Cloud / Hybrid

For multi-cloud deployments, use Grafana Cloud as the centralized backend and deploy an OpenTelemetry Collector in each cloud. The same OTEL_* environment variables apply to every deployment.

Security Considerations

What's Captured

Telemetry data includes agent and user identifiers, model names and parameters, token counts and costs, and execution timing.

What's NOT Captured

Telemetry does not contain prompt content or responses, user messages, tool input/output data, or PII/sensitive business data.

Network Security

All OTLP exports use HTTPS. Authentication headers are supported. No data is exported when telemetry is disabled.

Compliance

The telemetry data can support compliance work for SOC 2 (audit trail of AI operations), ISO 27001 (monitoring and incident detection), and GDPR (no PII in telemetry data by design).

Graceful Degradation

When telemetry is disabled or misconfigured, all tracing functions become no-ops, so instrumentation adds negligible overhead and the application continues normally. A warning is logged if telemetry is enabled without an endpoint.
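
The no-op pattern can be sketched as follows. SpanLike, TracerLike, selectTracer, and withSpan are hypothetical names chosen for this example; they mirror the general shape of a tracing API but are not MeetLoyd's or OpenTelemetry's actual interfaces.

```typescript
// Minimal tracing interfaces, for illustration only.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  end(): void;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

// A single shared span whose methods do nothing, so disabled telemetry
// allocates no objects per call.
const NOOP_SPAN: SpanLike = { setAttribute() {}, end() {} };
const NOOP_TRACER: TracerLike = { startSpan: () => NOOP_SPAN };

// Choose the tracer once at startup. Callers never check whether
// telemetry is on; they always get something safe to call.
function selectTracer(
  enabled: boolean,
  endpoint: string | undefined,
  real: TracerLike,
  warn: (message: string) => void
): TracerLike {
  if (!enabled) return NOOP_TRACER;
  if (!endpoint) {
    warn("Telemetry is enabled but no OTLP endpoint is configured");
    return NOOP_TRACER;
  }
  return real;
}

// Run a function inside a span and always end the span, even on error.
function withSpan<T>(tracer: TracerLike, name: string, fn: (span: SpanLike) => T): T {
  const span = tracer.startSpan(name);
  try {
    return fn(span);
  } finally {
    span.end();
  }
}
```

Because the decision is made once in selectTracer, instrumented code paths look identical whether telemetry is on or off, and a missing endpoint cannot cause a runtime failure.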

Additional Observability

Beyond OpenTelemetry, MeetLoyd provides:

Structured Logging: Pino-based JSON logs with correlation IDs
Chain of Thought Logging: Full reasoning capture in database
Agent Run Tracking: Execution history in the agent runs table
Audit Logs: Security-relevant events for compliance

See Audit Logs and SIEM Integration for more.
