Adaptive Graph Pruning for Multi-Agent Communication

Boyi Li1*, Zhonghan Zhao2*❤, Der-Horng Lee1, Gaoang Wang1✉
1 Zhejiang University - University of Illinois Urbana-Champaign Institute
2 Zhejiang University, College of Computer Science and Technology
ECAI 2025

*Equal contribution  ❤Project lead  ✉Corresponding author

Comparison between existing works and AGP. AGP introduces a dual-pruning method that generates task-adaptive communication topologies.

Abstract

Large Language Model (LLM) based multi-agent systems have shown remarkable performance in various tasks, especially when enhanced through collaborative communication. However, current methods often rely on a fixed number of agents and static communication structures, limiting their ability to adapt to varying task complexities. In this paper, we propose Adaptive Graph Pruning (AGP), a novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning). Specifically, our method employs a two-stage training strategy: first, independently training soft-pruning networks for different agent quantities to determine optimal agent-quantity-specific complete graphs and positional masks across specific tasks; and then jointly optimizing hard-pruning and soft-pruning within a maximum complete graph to dynamically configure the number of agents and their communication topologies per task. Extensive experiments demonstrate that our approach is: (1) High-performing, achieving state-of-the-art results across six benchmarks and generalizing consistently across multiple mainstream LLM architectures, with a performance increase of \(2.58\%\sim 9.84\%\); (2) Task-adaptive, dynamically constructing optimized communication topologies tailored to specific tasks and performing strongly in all three task categories (general reasoning, mathematical reasoning, and code generation); (3) Token-economical, requiring fewer training steps while cutting token consumption by more than \(90\%\); and (4) Training-efficient, achieving high performance with very few training steps compared with other methods, surpassing existing baselines after roughly ten training steps across all six benchmarks.
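As a concrete illustration of the dual-pruning idea described above, the minimal sketch below (our own, not from the released codebase) combines a hypothetical node mask (hard-pruning) with an edge-weight matrix over the maximum complete graph (soft-pruning) into a pruned communication topology via a simple threshold. The function name, the threshold value, and the toy values are illustrative assumptions.

```python
import numpy as np

def prune_topology(edge_weights: np.ndarray,
                   node_mask: np.ndarray,
                   edge_threshold: float = 0.5) -> np.ndarray:
    """Combine hard-pruning (node mask) and soft-pruning (edge weights)
    into a binary adjacency matrix over the retained agents.

    edge_weights : (N, N) soft-pruned edge weights of the maximum complete graph
    node_mask    : (N,) binary vector; 1 keeps an agent, 0 drops it
    """
    # Hard-pruning: zero out all edges touching dropped agents.
    keep = np.outer(node_mask, node_mask)
    # Soft-pruning: keep only edges whose weight clears the threshold.
    adjacency = (edge_weights * keep) >= edge_threshold
    np.fill_diagonal(adjacency, False)  # no self-communication edges
    return adjacency

# Toy example with a 4-agent pool: agent 3 is hard-pruned away.
w = np.array([[0.0, 0.9, 0.2, 0.7],
              [0.9, 0.0, 0.8, 0.1],
              [0.2, 0.8, 0.0, 0.6],
              [0.7, 0.1, 0.6, 0.0]])
mask = np.array([1, 1, 1, 0])
print(prune_topology(w, mask).astype(int))
```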


Framework

Adaptive Graph Pruning (termed AGP) first mines high-utility sub-graphs from a fixed pool of heterogeneous LLM agents and preserves their edge labels and node masks as supervision. The next section sets up notation, casts these ideas in graph-topological terms, and details the multi-agent communication protocol that AGP learns to instantiate for any new query.

  • Stage I mines near-optimal sub-graphs from a heterogeneous agent pool.
  • Stage II trains a joint soft-/hard-pruning network that instantiates an adaptive communication topology for any incoming query \(\mathcal{Q}\); a minimal sketch of such a pruning head follows this list.
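To make Stage II concrete, here is a minimal PyTorch sketch of a joint soft-/hard-pruning head that predicts an edge-weight matrix and per-agent keep probabilities from a query embedding and agent-profile embeddings over the maximum complete graph. The module name `JointPruner`, the layer sizes, and the dot-product scoring are illustrative assumptions, not the paper's exact architecture or training objective.

```python
import torch
import torch.nn as nn

class JointPruner(nn.Module):
    """Minimal sketch of a joint soft-/hard-pruning head (Stage II).

    Given a query embedding and per-agent profile embeddings, it predicts
    (i) an edge-weight matrix (soft-pruning) and (ii) a node-keep
    probability per agent (hard-pruning).
    """
    def __init__(self, dim: int = 64):
        super().__init__()
        self.agent_proj = nn.Linear(dim, dim)
        self.query_proj = nn.Linear(dim, dim)
        self.node_head = nn.Linear(dim, 1)

    def forward(self, query_emb: torch.Tensor, agent_embs: torch.Tensor):
        # Condition every agent representation on the incoming query.
        h = self.agent_proj(agent_embs) + self.query_proj(query_emb)
        # Soft-pruning: pairwise scores mapped to edge weights in [0, 1].
        edge_weights = torch.sigmoid(h @ h.t())
        # Hard-pruning: per-agent keep probability.
        node_keep = torch.sigmoid(self.node_head(h)).squeeze(-1)
        return edge_weights, node_keep

# Toy usage: 5-agent pool, 64-d embeddings.
pruner = JointPruner(dim=64)
query = torch.randn(64)
agents = torch.randn(5, 64)
edge_w, keep_prob = pruner(query, agents)
print(edge_w.shape, keep_prob.shape)  # torch.Size([5, 5]) torch.Size([5])
```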


Dataset Composition

Our supervision corpus from Stage I captures both micro and macro structure. Each labeled example is a triple (sketched as a simple data structure after the list below):

  • Task description: a natural-language prompt that an LLM team must solve (e.g., implement 'flip_case').
  • Edge-weight matrix: the edge-weight matrix of the maximum complete graph.
  • Node mask: the binary node-mask vector identifying the agents that actually participate in the task.
The corpus totals 460 supervision graphs, drawn from three domains:
  • General Reasoning (200 pairs): commonsense QA, humanities, social sciences; serves to anchor broad language understanding.
  • Mathematical Reasoning (100 pairs): arithmetic, algebra, and word-problem reasoning; stresses symbolic manipulation and multi-step deduction.
  • Code Generation (160 pairs): implementing functions from I/O specifications or docstrings; examines the ability to plan, execute, and self-verify algorithmic tasks.
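A minimal sketch of how one such supervision triple could be represented in code; the class name `SupervisionGraph`, the field names, and the toy values are illustrative assumptions rather than the released data format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SupervisionGraph:
    """One Stage-I supervision example (illustrative field names).

    task        : natural-language task description
    edge_weights: (N, N) edge-weight matrix of the maximum complete graph
    node_mask   : (N,) binary vector marking agents that participate
    domain      : one of {"general", "math", "code"}
    """
    task: str
    edge_weights: np.ndarray
    node_mask: np.ndarray
    domain: str

# Hypothetical example from the code-generation domain with a 4-agent pool.
example = SupervisionGraph(
    task="Implement 'flip_case': swap the case of every letter in a string.",
    edge_weights=np.array([[0.0, 0.8, 0.0, 0.3],
                           [0.8, 0.0, 0.9, 0.0],
                           [0.0, 0.9, 0.0, 0.0],
                           [0.3, 0.0, 0.0, 0.0]]),
    node_mask=np.array([1, 1, 1, 0]),
    domain="code",
)
print(int(example.node_mask.sum()), "agents retained")
```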


Experiments

High-performing

AGP delivers the strongest overall accuracy.

Performance comparison with single-agent approaches, multi-agent topologies, and AGP. The base LLM for all baselines is gpt-4o-mini. We bold the best results and underline the runner-ups. "Mul." and "Ada." indicate multi-agent support and task adaptivity, respectively. ×, △, and ✓ denote no, partial, and full support.

Training-efficient

AGP not only attains higher final accuracy but also achieves baseline-beating performance in fewer than ten optimization steps, evidencing markedly better sample- and compute-efficiency during training.

Performance curves of AGP and G-Designer on the MMLU, GSM8K, and HumanEval benchmarks as the number of training steps increases. Starting from the fifth step, an evaluation is run after each batch is trained.


Token-economical

AGP can provide more accurate and economical solutions in complex settings without the iterative overhead of full architecture searches.

Visualization of performance versus number of prompt tokens for different multi-agent communication methods across the MMLU, GSM8K, SVAMP, and HumanEval benchmarks.

Demo

General Reasoning

Mathematical Reasoning

Code Generation

BibTeX

If you find AGP helpful in your research, please consider citing:

@article{li2025adaptive,
  title={Adaptive Graph Pruning for Multi-Agent Communication},
  author={Li, Boyi and Zhao, Zhonghan and Lee, Der-Horng and Wang, Gaoang},
  journal={arXiv preprint arXiv:2506.02951},
  year={2025}
}