The pursuit of larger, more capable Large Language Models (LLMs) is fundamentally constrained by the immense computational cost of their training and inference. While the Mixture-of-Experts (MoE) paradigm successfully decouples parameter count from per-token computational cost by dynamically scaling network width, it neglects the critical dimension of depth, enforcing a uniform and often wasteful computational graph for every token. This paper introduces Mixture-of-Experts-and-Depths (MoED), a novel architectural framework that unifies dynamic width and depth scaling. MoED employs a hierarchical routing mechanism in which a meta-controller at each layer makes a joint decision on expert selection and on a token's subsequent computational path: exit the network early, proceed to the next layer, or skip ahead. This approach creates a unique, input-adaptive sub-network for every token, optimizing the allocation of compute. The proposed architecture represents a fundamental shift toward more efficient and scalable LLMs, in principle enabling superior performance and reduced latency while managing the activation-memory bottlenecks that plague traditional trillion-parameter models.
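As a concrete illustration of the hierarchical routing described above, the following is a minimal PyTorch sketch of a single MoED-style layer. It assumes top-1 expert routing and a three-way depth head (exit, proceed, skip); the module and variable names (MoEDLayer, expert_router, depth_router) are illustrative assumptions, not taken from the paper's implementation.

```python
# Minimal sketch of a single MoED-style layer (illustrative, not the paper's code).
# Assumes top-1 expert routing and a three-way depth decision per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

EXIT, PROCEED, SKIP = 0, 1, 2  # depth actions the meta-controller can choose


class MoEDLayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # Standard MoE expert pool: independent feed-forward blocks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # Hierarchical meta-controller: one head scores experts (width),
        # the other scores depth actions (exit / proceed / skip).
        self.expert_router = nn.Linear(d_model, num_experts)
        self.depth_router = nn.Linear(d_model, 3)

    def forward(self, tokens: torch.Tensor):
        # tokens: [num_tokens, d_model]
        expert_probs = F.softmax(self.expert_router(tokens), dim=-1)
        gate, expert_idx = expert_probs.max(dim=-1)            # top-1 expert per token
        depth_action = self.depth_router(tokens).argmax(dim=-1)  # 0=EXIT, 1=PROCEED, 2=SKIP

        out = tokens.clone()
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                # Residual update scaled by the gate value, applied only to routed tokens.
                out[mask] = tokens[mask] + gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out, depth_action


if __name__ == "__main__":
    layer = MoEDLayer(d_model=64, d_ff=256, num_experts=4)
    x = torch.randn(10, 64)        # 10 tokens
    y, actions = layer(x)
    print(y.shape, actions.tolist())
```

In a full model, tokens whose depth action is EXIT would bypass all remaining layers and SKIP tokens would rejoin the computation some layers downstream; the hard argmax decisions shown here would typically be replaced by a differentiable or learned routing policy during training.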
Keywords: Large Language Models, Mixture of Experts, Adaptive Computation, Dynamic Networks, Hierarchical Routing, Efficient Inference
IRE Journals:
Kalyan Chakravarthy Kodela, "Mixture-of-Experts-and-Depths: A Hierarchical Dynamic Compute Architecture for Extreme-Scale Efficiency," Iconic Research And Engineering Journals, Volume 9, Issue 2, 2025, Pages 987-999.
IEEE:
K. C. Kodela, "Mixture-of-Experts-and-Depths: A Hierarchical Dynamic Compute Architecture for Extreme-Scale Efficiency," Iconic Research And Engineering Journals, vol. 9, no. 2, pp. 987-999, 2025.