We present MRT (Modular Reinforced Transformers), a production-oriented LLM architecture that achieves very high accuracy on a large number of domains (N) at low latency and low cost by combining: (i) small open-weight base models (7–15B parameters) selected for strong baseline accuracy and efficiency; (ii) modular domain specialists trained with LoRA/QLoRA (the v1 series); (iii) reinforced-thinking upgrades (RLHF/RLAIF + deliberative decoding) for further accuracy gains (the rt1 series); and (iv) dynamic thinking that adapts reasoning depth per query (the x1 series). A lightweight router selects the best specialist per query. We instantiate MRT with N domains, partitioned into K sets (ds domains each). For our specific implementation, we use N=500, K=50, and ds=10. For each set we fine-tune one of five strong 7–15B base models, producing K specialists (mrt-v1-1 … mrt-v1-K) that each achieve ≥80% accuracy on their assigned ds domains. We then upgrade each v1 specialist with reinforced thinking to obtain mrt-rt1-1 … mrt-rt1-K, targeting ≥92%. Finally, we introduce mrt-x1-k specialists that dynamically decide "how much to think" at inference time, preserving low latency on easy queries while invoking deeper multi-step reasoning only when beneficial. We provide a full engineering blueprint, mathematical formulation, training recipes, routing/control flow, and a cost/accuracy accounting. Under realistic cloud pricing and data-prep assumptions, the total end-to-end budget for building the MRT stack described here (with K=50 specialists) is $199k–$205k, aligning with a target of ≈$200k. On our internal N-domain evaluation, MRT specialists are substantially more accurate and faster than a single monolithic generalist of similar or larger size; and, on their respective domains, rt1/x1 specialists meet or exceed the accuracy we observe from state-of-the-art closed generalists (when evaluated under the same domain-specific test distributions and latency budgets).* *External, proprietary leaderboards differ; we report domain-targeted internal results rather than global claims.
small LLMs (7–15B), LoRA/QLoRA, RLHF/RLAIF, modular routing, dynamic reasoning, cost–accuracy trade-off, domain specialization
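To make the control flow summarized above concrete, the following is a minimal, illustrative Python sketch of the router → specialist → dynamic-thinking pipeline. It is not the MRT implementation: the names (`Specialist`, `route`, `think_depth`) and the length-based difficulty heuristic are hypothetical stand-ins; only the N=500, K=50, ds=10 partition comes from the abstract.

```python
# Illustrative sketch only. All identifiers here are hypothetical; a real
# mrt-x1 specialist would be a fine-tuned 7-15B model and the router a
# learned domain classifier, neither of which is shown.

from dataclasses import dataclass

N_DOMAINS = 500                                   # N: total domains covered
K_SPECIALISTS = 50                                # K: number of specialists
DS_PER_SPECIALIST = N_DOMAINS // K_SPECIALISTS    # ds = 10 domains each


@dataclass
class Specialist:
    """One mrt-x1-k model serving a fixed set of ds domains."""
    index: int
    domains: set[str]

    def answer(self, query: str, depth: int) -> str:
        # Placeholder for the actual model call; `depth` stands in for how
        # many deliberative reasoning steps the x1 decoder may take.
        return f"[mrt-x1-{self.index} | depth={depth}] answer to: {query}"


def route(query_domain: str, specialists: list[Specialist]) -> Specialist:
    """Lightweight router: map the query's predicted domain to its specialist."""
    for s in specialists:
        if query_domain in s.domains:
            return s
    raise ValueError(f"no specialist covers domain {query_domain!r}")


def think_depth(query: str, max_depth: int = 8) -> int:
    """Dynamic-thinking stub: shallow reasoning for easy queries, deep for hard.
    A real x1 model estimates difficulty itself; query length is used here
    purely as an illustrative signal."""
    return 1 if len(query.split()) < 20 else max_depth


if __name__ == "__main__":
    specialists = [
        Specialist(index=k, domains={f"domain-{k}-{d}" for d in range(DS_PER_SPECIALIST)})
        for k in range(1, K_SPECIALISTS + 1)
    ]
    q = "How do I amortize a 15-year loan?"
    s = route("domain-3-7", specialists)  # domain label assumed to come from a classifier
    print(s.answer(q, depth=think_depth(q)))
```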