Ian Rudd. AI Safety Model MoE: A Mixture-of-Experts Architecture for Certifiable, Risk-Aware Generation


Natural Sciences / Computer Science / Artificial intelligence

Submitted on: Nov 03, 2025, 14:57:22

Description: Mixture-of-Experts (MoE) models scale efficiently by sending each input to just a few specialists ("experts"). Most safety controls today sit after the model generates text (filters and classifiers), which means unsafe content can still be produced internally and only blocked at the end. This paper proposes Safety-MoE, a design that pushes safety into the architecture: a risk-aware router that uses safety signals to choose experts; an auditor that watches the output as it is being written and can switch to safer experts or stop; and a final gate that either approves the answer or abstains and offers a safe alternative. We formalize safety as a risk functional ℛ(x, y), train with a utility-vs-risk objective, and show how to calibrate the abstention threshold so the overall system's risk stays under a target budget τ. A color-coded figure illustrates the architecture and the safety-utility trade-off.
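The three mechanisms named in the description can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names, the linear risk penalty on the router logits, and the fixed budget value are all assumptions introduced here for clarity.

```python
import numpy as np

def risk_aware_route(gate_logits, expert_risk, lam=1.0, top_k=2):
    """Risk-aware router (illustrative): rank experts by utility logits
    penalized by a per-expert risk signal, then keep the top-k."""
    scores = gate_logits - lam * expert_risk  # utility-vs-risk trade-off
    return np.argsort(scores)[::-1][:top_k]

def final_gate(estimated_risk, tau=0.05):
    """Final gate (illustrative): approve the answer only if the estimated
    risk stays under the calibrated budget tau; otherwise abstain."""
    return "approve" if estimated_risk <= tau else "abstain"

# Hypothetical example: four experts, where expert 2 has the highest
# utility logit but also the highest risk signal, so it is routed around.
logits = np.array([0.2, 1.0, 2.5, 0.5])
risk = np.array([0.1, 0.2, 2.0, 0.1])
print(risk_aware_route(logits, risk))  # risky expert 2 loses its lead
print(final_gate(0.03))                # within budget -> "approve"
print(final_gate(0.20))                # exceeds budget -> "abstain"
```

In this sketch the auditor would correspond to re-running `risk_aware_route` with updated risk signals during generation; the calibration of τ on held-out data is left out for brevity.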

The Library of Congress (USA) reference page: http://lccn.loc.gov/cn2013300046.

To read the article posted on Intellectual Archive web site please click the link below.

AI Safety MoE paper - Dr. IAN RUDD - v 2.0.pdf



© Shiny World Corp., 2011-2025. All rights reserved. To reach us please send an e-mail to support@IntellectualArchive.com