Sovereign Moderation

MeetLoyd includes a content moderation layer as part of its enterprise governance. Every agent interaction -- input and output -- passes through configurable moderation that works consistently across all LLM providers.

Why Moderation Is Part of Governance

Models from LLM providers (Anthropic, OpenAI, Google) and open-source models ship with their own built-in safety training, so your agents are never running unfiltered models.

MeetLoyd's moderation adds enterprise governance on top of model-provided safety:

Configurable thresholds -- set per governance pack (for example, the HIPAA pack uses stricter thresholds than the default)
Audit trail -- every moderation decision is logged with category scores, filterable by compliance teams
Vendor-agnostic -- same moderation policy applies whether the agent uses Claude, GPT, Gemini, DeepSeek, or any other provider
Admin control -- tenant administrators configure mode, thresholds, and overrides
Transparency -- customers see every score and every decision (unlike vendor black-box filters)
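To make the audit-trail and transparency points concrete, here is a minimal sketch of what a filterable moderation record could look like. The `ModerationRecord` schema and the `filter_records` helper are hypothetical names chosen for illustration; they are not MeetLoyd's actual log format or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModerationRecord:
    """One audit-log entry (hypothetical schema, for illustration only)."""
    tenant_id: str
    provider: str        # e.g. "anthropic" or "openai"; the same policy applies to all
    direction: str       # "input" or "output"
    decision: str        # "allow" or "block"
    category_scores: dict[str, float]
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def filter_records(records, *, decision=None, category=None, min_score=0.0):
    """Return records that match a decision and/or a minimum score in one category."""
    matched = []
    for record in records:
        if decision is not None and record.decision != decision:
            continue
        if category is not None and record.category_scores.get(category, 0.0) < min_score:
            continue
        matched.append(record)
    return matched
```

A compliance reviewer could then call, for example, `filter_records(records, decision="block", category="harassment", min_score=0.8)` to list only high-scoring blocked items.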

Two Modes

Standard

Standard mode uses the OpenAI Moderation API, which is free for API users with no per-call charge. Content is sent to OpenAI for classification. It is not used for training (since March 2023), but it is retained in abuse monitoring logs for up to 30 days by default.

Each tenant provides their own OpenAI API key (BYOK pattern). MeetLoyd does not use a platform-level key.
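As a rough illustration of the BYOK pattern, the sketch below builds a request to the OpenAI moderation endpoint using a tenant-supplied key and reads the per-category scores from a response. The helper names are hypothetical, and how MeetLoyd actually stores and injects tenant keys is not described here; the code only prepares the request and does not send it.

```python
import json
import urllib.request

OPENAI_MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, tenant_api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a moderation request signed with the tenant's own key."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_MODERATION_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {tenant_api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_moderation_response(payload: dict) -> tuple[bool, dict[str, float]]:
    """Extract the overall flag and the per-category scores from the first result."""
    result = payload["results"][0]
    return result["flagged"], result["category_scores"]
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON body before handing it to `parse_moderation_response`.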

Best for: teams that prioritize speed and cost, US-based organizations, and non-regulated industries.

Sovereign

Self-hosted content moderation that never sends content to external APIs. Runs on CPU -- no GPU required, near-zero marginal cost, minimal carbon footprint. No external API key needed.

Best for: EU enterprises, regulated industries, and data-sensitive organizations, as well as any team with data residency requirements or policies that prohibit sending content to external services.

Sovereign mode evaluates multiple content safety categories independently, with thresholds tuned for business content to minimize false positives on legitimate language (sales negotiations, legal terminology, medical discussions).
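Independent per-category evaluation with pack-specific thresholds can be sketched as follows. The category names, the threshold values, and the `hipaa` override table are illustrative assumptions, not MeetLoyd's shipped defaults.

```python
# Illustrative defaults and overrides; the real categories and values are not documented here.
DEFAULT_THRESHOLDS = {"harassment": 0.80, "violence": 0.85, "self_harm": 0.70}
PACK_OVERRIDES = {"hipaa": {"harassment": 0.70, "self_harm": 0.50}}


def thresholds_for(pack=None):
    """Merge a governance pack's stricter overrides onto the default thresholds."""
    merged = dict(DEFAULT_THRESHOLDS)
    merged.update(PACK_OVERRIDES.get(pack, {}))
    return merged


def evaluate(scores, thresholds):
    """Check each category on its own and return the ones at or above their threshold."""
    return sorted(
        category
        for category, threshold in thresholds.items()
        if scores.get(category, 0.0) >= threshold
    )
```

With these assumed values, a harassment score of 0.75 passes under the default thresholds but is flagged under the stricter `hipaa` pack, which is the kind of difference a per-pack configuration is meant to express.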

LLM Escalation (Sovereign Upgrade)

Sovereign mode includes an optional accuracy upgrade for borderline content. When the base classifier flags content as borderline (suspicious but below the block threshold), a self-hosted LLM re-classifies it with contextual understanding -- distinguishing "aggressive negotiation strategy" from an actual threat.

Key properties:

Nothing leaves your infrastructure -- the escalation LLM is fully self-hosted
Token-metered -- usage is deducted from your prepaid account. If the account is empty, moderation falls back gracefully to base classification plus audit logging
Configurable fallback chain -- administrators define a resilience chain of self-hosted endpoints. If every endpoint is unavailable, content is never blocked because of an LLM infrastructure failure
Business continuity -- the base classifier (CPU) continues to operate independently of any LLM availability
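The escalation flow described in this section can be sketched as below. This is one reading of the behavior, not the actual implementation: the `moderate` function, the `review_threshold` and `block_threshold` values, and the endpoint signature are all assumptions. In particular, it assumes that borderline content which cannot be escalated keeps the base classifier's verdict (below the block threshold, so allowed) and is tagged for audit.

```python
from typing import Callable, Sequence

# Hypothetical endpoint signature: takes text, returns "allow" or "block",
# and raises ConnectionError or TimeoutError when the endpoint is unreachable.
Endpoint = Callable[[str], str]


def moderate(
    text: str,
    base_score: float,
    endpoints: Sequence[Endpoint],
    token_balance: int,
    review_threshold: float = 0.5,   # assumed lower bound of the borderline band
    block_threshold: float = 0.9,    # assumed block threshold
) -> tuple[str, str]:
    """Return (decision, source); source records who decided, for the audit log."""
    if base_score >= block_threshold:
        return "block", "base"
    if base_score < review_threshold:
        return "allow", "base"
    # Borderline content: escalate only if the prepaid account has a balance.
    if token_balance <= 0:
        return "allow", "base:no-balance"
    for endpoint in endpoints:
        try:
            # A real system would also deduct the tokens consumed by this call.
            return endpoint(text), "llm"
        except (ConnectionError, TimeoutError):
            continue  # try the next endpoint in the resilience chain
    # No endpoint answered: never block because of LLM infrastructure failures.
    return "allow", "base:endpoints-unavailable"
```

Returning the source alongside the decision keeps the fallback path visible in the audit trail, so reviewers can tell an LLM verdict from a base-classifier fallback.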

Explainable Safety (Optional)

For compliance-heavy environments, MeetLoyd offers an asynchronous explanation layer. After a moderation decision is made, a detailed, human-readable explanation of why the content was flagged or allowed is generated post hoc and attached to the audit log entry.

This runs asynchronously in batches -- it does not add latency to agent responses. Explanations are available in the audit trail for compliance review, typically within minutes.
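A minimal sketch of the batching idea: decisions are queued without waiting, and a separate worker later attaches explanations in bounded batches. `ExplanationQueue` and its `explain` callable are hypothetical; the actual explanation generator and scheduling are not specified in this document.

```python
from collections import deque


class ExplanationQueue:
    """Queue audit entries now, attach explanations later in batches."""

    def __init__(self, explain, batch_size=50):
        self._explain = explain          # hypothetical callable: audit entry -> explanation text
        self._batch_size = batch_size
        self._pending = deque()

    def enqueue(self, audit_entry: dict) -> None:
        """Called on the request path; only appends, so it adds no meaningful latency."""
        self._pending.append(audit_entry)

    def flush_batch(self) -> int:
        """Called by a background worker; explains up to batch_size entries."""
        processed = 0
        while self._pending and processed < self._batch_size:
            entry = self._pending.popleft()
            entry["explanation"] = self._explain(entry)
            processed += 1
        return processed
```

The request path only ever calls `enqueue`, while a scheduler or background job calls `flush_batch` periodically, which is why explanations can appear in the audit trail some time after the decision itself.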

EU AI Act Compliance

Both modes satisfy the human oversight requirements of Article 14:

Requirement	How MeetLoyd Satisfies It
Understand system capacities	Category thresholds visible in admin UI, documented
Monitor operation	Full audit log with per-category scores, filterable
Intervene	Change thresholds, override decisions, toggle modes
Stop the system	Disable moderation or switch modes at any time

Carbon Footprint

Mode	Carbon Impact
Standard	Depends on OpenAI's infrastructure
Sovereign (base)	Near-zero -- CPU-only, ~0.1W per classification
Sovereign + LLM escalation	Low -- GPU used only for borderline content (~15-20% of calls)

Configuration

Set your moderation mode in Settings > Security > Content Moderation:

Mode: Select Standard or Sovereign
LLM Escalation: Enable or disable (Sovereign mode only)
Fallback endpoints: Configure your resilience chain (enterprise)
Thresholds: Adjust per-category sensitivity (optional)

For Standard mode, ensure your tenant has an OpenAI API key configured in the BYOK settings.

For Sovereign mode, no external key is needed. The base classifier runs on CPU within your deployment. If you enable LLM escalation, configure the self-hosted endpoints and ensure your prepaid account has a balance.
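The dependencies between these settings can be summarized as a small validation sketch. The `ModerationSettings` fields and the `validate` function are hypothetical stand-ins for the admin UI options above, not an actual MeetLoyd configuration schema.

```python
from dataclasses import dataclass, field


@dataclass
class ModerationSettings:
    """Hypothetical mirror of the Settings > Security > Content Moderation options."""
    mode: str = "standard"                       # "standard" or "sovereign"
    llm_escalation: bool = False
    fallback_endpoints: list[str] = field(default_factory=list)
    has_openai_key: bool = False                 # tenant BYOK key configured?
    prepaid_balance: int = 0                     # tokens available for escalation


def validate(settings: ModerationSettings) -> list[str]:
    """Return a list of human-readable problems; an empty list means the settings are consistent."""
    problems = []
    if settings.mode not in ("standard", "sovereign"):
        problems.append("mode must be 'standard' or 'sovereign'")
    if settings.mode == "standard" and not settings.has_openai_key:
        problems.append("standard mode needs a tenant OpenAI API key (BYOK)")
    if settings.llm_escalation:
        if settings.mode != "sovereign":
            problems.append("LLM escalation is available in sovereign mode only")
        if not settings.fallback_endpoints:
            problems.append("LLM escalation needs at least one self-hosted endpoint")
        if settings.prepaid_balance <= 0:
            problems.append("LLM escalation needs a prepaid balance")
    return problems
```

For example, `validate(ModerationSettings(mode="sovereign"))` returns an empty list, since Sovereign mode without escalation needs neither a key nor a balance.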

Pricing

Component	Starter	Growth	Enterprise
Standard moderation (requires OpenAI key)	Included	Included	Included
Sovereign moderation (no key needed)	--	Add-on	Included
LLM escalation (real-time, token-metered)	--	--	Included (prepaid)
Explainable safety (async)	--	--	Add-on
