Sovereign Moderation
MeetLoyd includes a content moderation layer as part of its enterprise governance. Every agent interaction -- input and output -- passes through configurable moderation that works consistently across all LLM providers.
Why Moderation Is Part of Governance
LLM providers (Anthropic, OpenAI, Google, open-source models) include their own built-in safety training. Your agents are not running unfiltered models.
MeetLoyd's moderation adds enterprise governance on top of model-provided safety:
- Configurable thresholds -- set per governance pack (for example, the HIPAA pack uses stricter thresholds than the default)
- Audit trail -- every moderation decision is logged with category scores, filterable by compliance teams
- Vendor-agnostic -- same moderation policy applies whether the agent uses Claude, GPT, Gemini, DeepSeek, or any other provider
- Admin control -- tenant administrators configure mode, thresholds, and overrides
- Transparency -- customers see every score and every decision (unlike vendor black-box filters)
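As a rough illustration of how per-pack thresholds and a filterable audit trail could fit together, here is a minimal sketch. The category names, threshold values, field names, and the functions `resolve_thresholds`, `audit_entry`, and `filter_entries` are all hypothetical; they are not MeetLoyd's actual schema or API.

```python
from datetime import datetime, timezone

# Hypothetical categories and values, for illustration only.
DEFAULT_THRESHOLDS = {"harassment": 0.80, "violence": 0.80, "self_harm": 0.70}
PACK_OVERRIDES = {"hipaa": {"violence": 0.60, "self_harm": 0.50}}


def resolve_thresholds(pack=None, tenant_overrides=None):
    """Layer governance-pack and tenant overrides on top of the defaults."""
    thresholds = dict(DEFAULT_THRESHOLDS)
    thresholds.update(PACK_OVERRIDES.get(pack, {}))
    thresholds.update(tenant_overrides or {})
    return thresholds


def audit_entry(tenant_id, provider, scores, decision):
    """Build one audit-log entry that keeps every category score."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "provider": provider,  # recorded only; the policy does not depend on it
        "scores": dict(scores),
        "decision": decision,
    }


def filter_entries(entries, category, min_score):
    """Return entries whose score for `category` is at least `min_score`."""
    return [e for e in entries if e["scores"].get(category, 0.0) >= min_score]


if __name__ == "__main__":
    print(resolve_thresholds(pack="hipaa"))
    log = [
        audit_entry("tenant-123", "anthropic", {"violence": 0.65}, "block"),
        audit_entry("tenant-123", "openai", {"violence": 0.10}, "allow"),
    ]
    print(len(filter_entries(log, "violence", 0.5)))
```

Because the provider is only recorded and never consulted when resolving thresholds, the same policy applies regardless of which model produced the content.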
Two Modes
Standard
Uses the OpenAI Moderation API, which is free for API users with no per-call charge. Content is sent to OpenAI for classification. It is not used for training (a policy in place since March 2023), but it is retained in abuse monitoring logs for up to 30 days by default.
Each tenant provides their own OpenAI API key (BYOK pattern). MeetLoyd does not use a platform-level key.
Best for: teams that prioritize speed and cost, US-based organizations, and non-regulated industries.
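To make the Standard-mode data flow concrete, the sketch below builds a request to the OpenAI Moderation endpoint with a tenant-supplied key and reads the per-category scores from the response. The helper names (`build_request`, `parse_response`, `moderate`) are illustrative, and the choice of the `omni-moderation-latest` model is an assumption; the source does not say which moderation model MeetLoyd selects or how it stores tenant keys.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_request(text, tenant_api_key, model="omni-moderation-latest"):
    """Build (but do not send) a moderation request using the tenant's own key."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {tenant_api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_response(payload):
    """Extract the overall flag and per-category scores from the first result."""
    result = payload["results"][0]
    return result["flagged"], result["category_scores"]


def moderate(text, tenant_api_key):
    """Send the request and return (flagged, category_scores)."""
    request = build_request(text, tenant_api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_response(json.load(response))
```

Calling `moderate("some text", tenant_key)` sends the content to OpenAI, which is exactly the step Sovereign mode avoids.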
Sovereign
Self-hosted content moderation that never sends content to external APIs. Runs on CPU -- no GPU required, near-zero marginal cost, minimal carbon footprint. No external API key needed.
Best for: EU enterprises, regulated industries, data-sensitive organizations, and any team with data residency requirements or policies that prohibit sending content to external services.
Sovereign mode evaluates multiple content safety categories independently, with thresholds tuned for business content to minimize false positives on legitimate language (sales negotiations, legal terminology, medical discussions).
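Independent per-category evaluation can be pictured as follows. This is only a sketch: the category names, scores, and thresholds are invented for illustration, and the real classifier and its tuning are not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryResult:
    category: str
    score: float
    threshold: float

    @property
    def blocked(self):
        # Each category is judged only against its own threshold.
        return self.score >= self.threshold


def evaluate(scores, thresholds):
    """Compare every category score with that category's threshold."""
    return [CategoryResult(c, s, thresholds[c]) for c, s in sorted(scores.items())]


def is_blocked(results):
    """Content is blocked if any single category crosses its threshold."""
    return any(r.blocked for r in results)


if __name__ == "__main__":
    # Hypothetical scores for a hard-nosed sales email: elevated on
    # "threat", but below a threshold tuned for business language.
    thresholds = {"threat": 0.85, "harassment": 0.80, "self_harm": 0.70}
    scores = {"threat": 0.55, "harassment": 0.20, "self_harm": 0.01}
    results = evaluate(scores, thresholds)
    print(is_blocked(results))
```

Scores are never averaged across categories, so a high score in one category cannot be diluted by low scores in the others.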
LLM Escalation (Sovereign Upgrade)
Sovereign mode includes an optional accuracy upgrade for borderline content. When the base classifier flags content as borderline (suspicious but below the block threshold), a self-hosted LLM re-classifies it with contextual understanding -- distinguishing "aggressive negotiation strategy" from an actual threat.
Key properties:
- Nothing leaves your infrastructure -- the escalation LLM is fully self-hosted
- Token-metered -- usage decrements your prepaid account. When the account is empty, moderation falls back gracefully to base classification plus an audit entry
- Configurable fallback chain -- administrators define a resilience chain of self-hosted endpoints. If every endpoint is unavailable, moderation relies on the base classifier alone; an LLM infrastructure failure never causes content to be blocked
- Business continuity -- the base classifier (CPU) continues to operate independently of any LLM availability
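The escalation flow described above can be sketched as a small decision function. Everything here is an assumption made for illustration: the band boundaries `BORDERLINE` and `BLOCK`, the `Account` and `EndpointUnavailable` types, and the endpoint signature are hypothetical, and "fall back to base classification" is interpreted as allowing borderline content while marking it for audit.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical band: base scores in [BORDERLINE, BLOCK) are escalated.
BORDERLINE = 0.50
BLOCK = 0.85


class EndpointUnavailable(Exception):
    """Raised by a self-hosted endpoint that cannot serve the request."""


@dataclass
class Account:
    tokens: int  # prepaid balance


# An endpoint takes the text and returns (is_violation, tokens_used).
Endpoint = Callable[[str], Tuple[bool, int]]


def moderate(text: str, base_score: float, account: Account,
             endpoints: Sequence[Endpoint]) -> Tuple[str, str]:
    """Return (decision, path), where path records how the decision was made."""
    if base_score >= BLOCK:
        return "block", "base"
    if base_score < BORDERLINE:
        return "allow", "base"
    if account.tokens <= 0:
        # Empty account: base classification stands, flagged for audit.
        return "allow", "base+audit"
    for endpoint in endpoints:  # walk the fallback chain in order
        try:
            violation, used = endpoint(text)
        except EndpointUnavailable:
            continue
        account.tokens = max(0, account.tokens - used)
        return ("block" if violation else "allow"), "llm"
    # Every endpoint failed: never block because of infrastructure.
    return "allow", "base+audit"
```

The base score is computed before any LLM is consulted, so the CPU classifier's result is always available even when the whole chain is down.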
Explainable Safety (Optional)
For compliance-heavy environments, MeetLoyd offers an asynchronous explanation layer. After a moderation decision is made, a detailed, human-readable explanation of why the content was flagged or allowed is generated post hoc and attached to the audit log entry.
This runs asynchronously in batches -- it does not add latency to agent responses. Explanations are available in the audit trail for compliance review, typically within minutes.
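One way to picture this decoupling is a background worker that drains a queue in batches while the request path only enqueues. The sketch below uses `asyncio` with a placeholder explainer; the function names, entry fields, and batch size are hypothetical, and a real explainer would call a model rather than format a string.

```python
import asyncio


async def explain_batch(entries):
    """Placeholder explainer: names the highest-scoring category per entry."""
    return {
        e["id"]: f"{e['decision']}: top category was "
                 f"{max(e['scores'], key=e['scores'].get)}"
        for e in entries
    }


async def explanation_worker(queue, audit_log, batch_size=8):
    """Drain the queue in batches and attach explanations to audit entries."""
    while True:
        batch = [await queue.get()]
        while len(batch) < batch_size and not queue.empty():
            batch.append(queue.get_nowait())
        explanations = await explain_batch(batch)
        for entry in batch:
            audit_log[entry["id"]]["explanation"] = explanations[entry["id"]]
            queue.task_done()


async def main():
    queue = asyncio.Queue()
    audit_log = {}
    worker = asyncio.create_task(explanation_worker(queue, audit_log))
    # Request path: record the decision and enqueue; never await the explainer.
    for i, decision in enumerate(["allow", "block"]):
        entry = {"id": i, "decision": decision,
                 "scores": {"violence": 0.1 + 0.8 * i, "harassment": 0.05}}
        audit_log[i] = entry
        queue.put_nowait(entry)
    await queue.join()  # in this demo only, wait so the result can be inspected
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return audit_log


if __name__ == "__main__":
    for entry in asyncio.run(main()).values():
        print(entry["id"], entry["explanation"])
```

The agent response is produced as soon as the decision is recorded; the explanation appears on the audit entry later, whenever the worker processes its batch.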
EU AI Act Compliance
Both modes satisfy the human oversight requirements of Article 14:
| Requirement | How MeetLoyd Satisfies It |
|---|---|
| Understand system capacities | Category thresholds visible in admin UI, documented |
| Monitor operation | Full audit log with per-category scores, filterable |
| Intervene | Change thresholds, override decisions, toggle modes |
| Stop the system | Disable moderation or switch modes at any time |
Carbon Footprint
| Mode | Carbon Impact |
|---|---|
| Standard | Depends on OpenAI's infrastructure |
| Sovereign (base) | Near-zero -- CPU-only, ~0.1W per classification |
| Sovereign + LLM escalation | Low -- GPU used only for borderline content (~15-20% of calls) |
Pricing
| Component | Starter | Growth | Enterprise |
|---|---|---|---|
| Standard moderation (requires OpenAI key) | Included | Included | Included |
| Sovereign moderation (no key needed) | -- | Add-on | Included |
| LLM escalation (real-time, token-metered) | -- | -- | Included (prepaid) |
| Explainable safety (async) | -- | -- | Add-on |