Publications

* Equal contribution. Project lead. Corresponding/co-corresponding author.
Also see my Google Scholar profile for the most up-to-date list.


2026

  • The 1st AI Children Challenge
    Boyi Li, Yifan Shen, Houze Yang, Xu Cao, Guojun Yun, Li Gao, Turong Chen, Long Xu, Jianguo Cao, Meihuan Huang
    The 1st AI Children Challenge aims to advance real-world applications of computer vision and AI in child healthcare, child education, and pediatrics. The 2026 CV4CHL edition featured the first track in this domain, Children Gait Visual Analysis, whose main goal is the fine-grained analysis of children's gait behaviors from keypoint sequences.
    CVPR 2026 Workshop on Computer Vision for Children

  • Toward Cognitive Supersensing in Multimodal Large Language Model
    Boyi Li*, Yifan Shen*,♡, Yuanzhe Liu*, Yifan Xu, Jiateng Liu, Xinzhuo Li, Zhengyuan Li, Jingyuan Zhu, Yunhan Zhong, Fangzhou Lan, Jianguo Cao, James M. Rehg, Heng Ji, Ismini Lourentzou, Xu Cao
    A novel training paradigm that endows MLLMs with human-like visual imagery capabilities by integrating a Latent Visual Imagery Prediction (LVIP) head that jointly learns sequences of visual cognitive latent embeddings and aligns them with the answer, thereby forming vision-based internal reasoning chains. To evaluate the cognitive capabilities of MLLMs, we present CogSense-Bench, a comprehensive visual question answering (VQA) benchmark assessing five cognitive dimensions.
    arXiv Preprint

2025

  • Adaptive Graph Pruning for Multi-Agent Communication
    Boyi Li*, Zhonghan Zhao*,♡, Der-Horng Lee, Gaoang Wang
    A novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning), dynamically constructing optimized communication topologies tailored specifically to individual tasks.
    ECAI 2025

2024

  • See and think: Embodied agent in virtual environment
    Zhonghan Zhao*, Wenhao Chai*,♡, Xuan Wang*, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang
    A comprehensive and visionary embodied agent in the Minecraft virtual environment, comprising three key components: vision perception, language instruction, and code action.
    ECCV 2024