Ian Rudd. AI Safety Model MoE: A Mixture-of-Experts Architecture for Certifiable, Risk-Aware Generation


Natural Sciences / Computer Science / Artificial intelligence

Submitted on: Nov 03, 2025, 14:57:22

Description: Mixture-of-Experts (MoE) models scale efficiently by sending each input to just a few specialists ("experts"). Most safety controls today sit after the model generates text (filters and classifiers), which means unsafe content can still be produced internally and only blocked at the end. This paper proposes Safety-MoE, a design that pushes safety into the architecture: a risk-aware router that uses safety signals to choose experts; an auditor that watches the output as it is being written and can switch to safer experts or stop; and a final gate that either approves the answer or abstains and offers a safe alternative. We formalize safety as a risk functional ℛ(x, y), train with a utility-vs-risk objective, and show how to calibrate the abstention threshold so the overall system's risk stays under a target budget τ. A color-coded figure illustrates the architecture and the safety-utility trade-off.
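The three mechanisms named in the description can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names, the linear risk penalty on the router logits, and the fixed budget value are all assumptions introduced here for clarity.

```python
import numpy as np

def risk_aware_route(gate_logits, expert_risk, lam=1.0, top_k=2):
    """Risk-aware router (illustrative): rank experts by utility logits
    penalized by a per-expert risk signal, then keep the top-k."""
    scores = gate_logits - lam * expert_risk  # utility-vs-risk trade-off
    return np.argsort(scores)[::-1][:top_k]

def final_gate(estimated_risk, tau=0.05):
    """Final gate (illustrative): approve the answer only if the estimated
    risk stays under the calibrated budget tau; otherwise abstain."""
    return "approve" if estimated_risk <= tau else "abstain"

# Hypothetical example: four experts, where expert 2 has the highest
# utility logit but also the highest risk signal, so it is routed around.
logits = np.array([0.2, 1.0, 2.5, 0.5])
risk = np.array([0.1, 0.2, 2.0, 0.1])
print(risk_aware_route(logits, risk))  # risky expert 2 loses its lead
print(final_gate(0.03))                # within budget -> "approve"
print(final_gate(0.20))                # exceeds budget -> "abstain"
```

In this sketch the auditor would correspond to re-running `risk_aware_route` with updated risk signals during generation; the calibration of τ on held-out data is left out for brevity.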

The Library of Congress (USA) reference page: http://lccn.loc.gov/cn2013300046.

To read the article posted on Intellectual Archive web site please click the link below.

AI Safety MoE paper - Dr. IAN RUDD - v 2.0.pdf



© Shiny World Corp., 2011-2025. All rights reserved. To reach us please send an e-mail to support@IntellectualArchive.com