World Journal of Advanced Research and Reviews

eISSN: 2581-9615 || CODEN: WJARAI || Impact Factor 8.2 ||  CrossRef DOI

ViMoE: Vision Mixture of Experts with Multimodal Context Awareness

Adele Chinda *

Computer Science, Georgia State University, USA.

Research Article

World Journal of Advanced Research and Reviews, 2026, 29(01), 1886-1901

Article DOI: 10.30574/wjarr.2026.29.1.0242

DOI url: https://doi.org/10.30574/wjarr.2026.29.1.0242

Received on 22 December 2025; revised on 28 January 2026; accepted on 31 January 2026

Multimodal large language models (MLLMs) rely heavily on vision encoders to understand diverse image content. While recent approaches have explored combining multiple vision experts to address the limitations of single encoders, they typically perform image-level expert selection and fusion, ignoring the spatial heterogeneity within images where different regions may benefit from different experts. In this paper, we propose ViMoE (Vision Mixture of Experts with Multimodal Context Awareness), a novel MLLM that introduces three key innovations: (1) Token-Level Sparse Expert Activation (TLSEA) that enables different spatial tokens to utilize different expert combinations, allowing fine-grained, content-aware feature extraction; (2) Hierarchical Context Aggregation (HCA) that captures multi-scale visual context to guide expert routing at different granularities; and (3) Expert Confidence Calibration (ECC) that learns to estimate and calibrate expert contribution confidence to reduce noise from unreliable features. Through these innovations, ViMoE achieves more precise expert utilization by recognizing that a single image often contains diverse content requiring different visual expertise. Extensive experiments demonstrate that ViMoE achieves significant improvements over state-of-the-art methods across challenging multimodal benchmarks including MME, MMBench, and various VQA tasks, while maintaining computational efficiency through sparse activation patterns. Code is available at: https://arrel.github.io/vimoe/ 
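The abstract does not give the routing equations, so the following is only a minimal sketch of the general idea behind token-level sparse expert activation: each spatial token keeps its own top-k experts and fuses their features with renormalized gate weights. It uses generic top-k softmax gating as a stand-in and is not the paper's implementation; the function names `route_tokens` and `fuse`, the input layout, and the choice of k are all assumptions made for illustration. Hierarchical context aggregation and confidence calibration are not modeled here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(router_logits, k=2):
    """Pick the top-k experts per token and renormalize their gate weights.

    router_logits[t][e] is a (hypothetical) router score of expert e for token t.
    Returns one {expert_index: weight} dict per token; weights sum to 1.
    """
    routes = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        norm = sum(probs[i] for i in top)
        routes.append({i: probs[i] / norm for i in top})
    return routes


def fuse(expert_features, routes):
    """Weighted sum of the selected experts' features for each token.

    expert_features[e][t] is the feature vector expert e produces for token t.
    Experts that a token did not select contribute nothing (sparse activation).
    """
    fused = []
    for t, route in enumerate(routes):
        dim = len(expert_features[0][t])
        vec = [0.0] * dim
        for e, w in route.items():
            for d in range(dim):
                vec[d] += w * expert_features[e][t][d]
        fused.append(vec)
    return fused


# Two tokens, three experts: the two tokens prefer different experts.
router_logits = [[2.0, 0.1, -1.0], [-1.0, 0.1, 2.0]]
routes = route_tokens(router_logits, k=2)
print(routes)
```

In this toy input the first token routes to experts 0 and 1 while the second routes to experts 2 and 1, which is the contrast with image-level selection that the abstract draws: under image-level selection both tokens would share one expert combination.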

Vision Mixture of Experts; Token-level routing; Multimodal large language model; Hierarchical context aggregation; Confidence calibration; Sparse expert activation

https://wjarr.com/sites/default/files/fulltext_pdf/WJARR-2026-0242.pdf

Adele Chinda. ViMoE: Vision Mixture of Experts with Multimodal Context Awareness. World Journal of Advanced Research and Reviews, 2026, 29(1), 1886-1901. Article DOI: https://doi.org/10.30574/wjarr.2026.29.1.0242

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.


All statements, opinions, and data contained in this publication are solely those of the individual author(s) and contributor(s). The journal, editors, reviewers, and publisher disclaim any responsibility or liability for the content, including accuracy, completeness, or any consequences arising from its use.
