Boyi Li


B.S. in Computer Engineering @ ZJU - UIUC Institute

🗨 About Me

Hi, I'm Boyi Li. I am pursuing my dual B.S. degree in Computer Engineering at Zhejiang University and University of Illinois Urbana-Champaign.

I'm currently a research intern in the Rehg Lab at UIUC, deeply honored to be mentored by Xu Cao and Prof. Jim Rehg. Previously, I was very fortunate to be a research assistant in CVNext Lab at Zhejiang University, mentored by Zhonghan Zhao and Prof. Gaoang Wang. I am deeply grateful to them for their tremendous help and support in my early research journey and development.

Research Interests:
  • Multimodal LLM, mainly foucus on cognitive and abstract reasoning
  • Multi-Agent Systems, especailly Multi-Agent Communication
  • Computer Vision for Healthcare
📄 Here is my Resume

🌎 Service


📝 Selected Publications:

* Equal contribution. Project lead. corresponding/co-corresponding author.
Also see Publications Page and Google Scholar.

  • See and think: Embodied agent in virtual environment
    Zhonghan Zhao*, Wenhao Chai*,♡, Xuan Wang*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang
    [Paper] [Code] [Website]
    A comprehensive and visionary embodied agent in the Minecraft virtual environment comprises three key components: vision perception, language instruction, and code action.

    ECCV 2024

  • Adaptive Graph Pruning for Multi-Agent Communication
    Boyi Li*, Zhonghan Zhao*,♡, Der-Horng Lee, Gaoang Wang
    [Paper] [Code] [Website]
    A novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning), dynamically constructing optimized communication topologies tailored specifically to individual tasks.

    ECAI 2025

  • Toward Cognitive Supersensing in Multimodal Large Language Model
    Boyi Li*, Yifan Shen*,♡, Yuanzhe Liu*, Yifan Xu, Jiateng Liu, Xinzhuo Li, Zhengyuan Li, Jingyuan Zhu, Yunhan Zhong, Fangzhou Lan, Jianguo Cao, James M. Rehg, Heng Ji, Ismini Lourentzou, Xu Cao
    [Paper] [Code] [Website]
    A novel training paradigm that endows MLLMs with human-like visual imagery capabilities by integrating a Latent Visual Imagery Prediction (LVIP) head that jointly learns sequences of visual cognitive latent embeddings and aligns them with the answer, thereby forming vision-based internal reasoning chains. To evaluate the cognitive capabilities of MLLMs, we present CogSense-Bench, a comprehensive visual question answering (VQA) benchmark assessing five cognitive dimensions.
    Preprint

News:

  • Jul. 2024: Our paper See and think: Embodied agent in virtual environment is accepted by ECCV 2024.
  • Jul. 2025: Our paper Adaptive Graph Pruning for Multi-Agent Communication is accepted by ECAI 2025.
  • Jan. 2026: It is a great honor to serve as one of the organizers of CVPR 2026 Workshop on Computer Vision for Children (CV4CHL).