Mixture-of-Experts-and-Depths: A Hierarchical Dynamic Compute Architecture for Extreme-Scale Efficiency
  • Author(s): Kalyan Chakravarthy Kodela
  • Paper ID: 1710302
  • Pages: 987-999
  • Published Date: 29-08-2025
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 9 Issue 2 August-2025
Abstract

The pursuit of larger, more capable Large Language Models (LLMs) is constrained by the computational cost of training and inference. The Mixture-of-Experts (MoE) paradigm decouples parameter count from computational cost by dynamically scaling network width, but it leaves depth fixed: every token passes through the same, often wasteful, computational graph. This paper introduces Mixture-of-Experts-and-Depths (MoED), an architectural framework that unifies dynamic width and depth scaling. MoED employs a hierarchical routing mechanism in which a meta-controller at each layer makes a joint decision on both expert selection and the token's subsequent computational path: whether to exit, proceed, or skip ahead. Each token is therefore processed by its own input-adaptive sub-network, so compute is allocated where the input needs it. The proposed architecture represents a shift towards more efficient and scalable LLMs and, in theory, enables better performance and lower latency while managing the activation-memory bottlenecks that affect traditional trillion-parameter models.
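
The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the controller's form, so the linear scoring, the residual update, the greedy (argmax) action choice, the `skip_stride` parameter, and all names (`meta_controller`, `route_token`, `expert_w`, `action_w`) are assumptions made here for illustration only.

```python
import math

# Hypothetical action labels for the token's subsequent path.
EXIT, PROCEED, SKIP = "exit", "proceed", "skip"
ACTIONS = (EXIT, PROCEED, SKIP)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def meta_controller(x, expert_w, action_w, top_k=1):
    """Jointly pick the top-k experts (width) and a path action (depth).

    Assumption: both decisions come from linear scores on the token state.
    """
    expert_probs = softmax(matvec(expert_w, x))
    action_probs = softmax(matvec(action_w, x))
    ranked = sorted(range(len(expert_probs)), key=lambda i: -expert_probs[i])
    chosen = ranked[:top_k]
    best = max(range(len(ACTIONS)), key=lambda i: action_probs[i])
    return chosen, [expert_probs[i] for i in chosen], ACTIONS[best]


def route_token(x, layers, top_k=1, skip_stride=2):
    """Send one token through the stack, following the controller.

    Returns the final state and a trace of (layer index, experts, action),
    i.e. the input-adaptive sub-network this token actually used.
    """
    trace = []
    i = 0
    while i < len(layers):
        layer = layers[i]
        chosen, gates, action = meta_controller(
            x, layer["expert_w"], layer["action_w"], top_k
        )
        norm = sum(gates)
        mixed = [0.0] * len(x)
        for idx, gate in zip(chosen, gates):
            out = layer["experts"][idx](x)
            mixed = [m + (gate / norm) * o for m, o in zip(mixed, out)]
        x = [a + b for a, b in zip(x, mixed)]  # residual update (assumed)
        trace.append((i, chosen, action))
        if action == EXIT:
            break  # early exit: remaining layers are never computed
        i += skip_stride if action == SKIP else 1  # SKIP jumps over layers
    return x, trace
```

Calling `route_token` with a list of layer dictionaries, each holding `experts` (a list of callables), `expert_w`, and `action_w`, returns both the token's final state and the path it took. A trained system would also need a differentiable or otherwise learnable version of these discrete decisions, which this sketch does not attempt.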

Keywords

Large Language Models, Mixture of Experts, Adaptive Computation, Dynamic Networks, Hierarchical Routing, Efficient Inference

Citations

IRE Journals:
Kalyan Chakravarthy Kodela, "Mixture-of-Experts-and-Depths: A Hierarchical Dynamic Compute Architecture for Extreme-Scale Efficiency," Iconic Research And Engineering Journals, Volume 9, Issue 2, 2025, Pages 987-999.

IEEE:
K. C. Kodela, "Mixture-of-Experts-and-Depths: A Hierarchical Dynamic Compute Architecture for Extreme-Scale Efficiency," Iconic Research And Engineering Journals, vol. 9, no. 2, pp. 987-999, 2025.