Paper: arXiv:2508.18227

Authors: Lianming Huang, Haibo Hu, Qiao Li, Xin He, Nan Guan, Chun Jason Xue

Motivation

Vision-Language Models (VLMs) excel at multimodal understanding, but the large parameter counts of their Transformer backbones lead to high inference latency and deployment cost. The key challenge is accelerating inference while preserving model performance.

Key Methods

GM-Skip is a metric-guided framework that accelerates VLM inference by strategically skipping redundant Transformer blocks. It has three components:

1. Metric-Guided Block Selection

A greedy algorithm evaluates each block’s importance by measuring the impact of its removal on task-specific metrics (accuracy, CIDEr, etc.), progressively identifying and removing the least impactful blocks.
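The greedy selection loop can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `evaluate` is a hypothetical callback that returns the task metric (e.g., accuracy) of the model with a given set of blocks bypassed, and the stopping rule via the hypothetical `max_skips` and `min_score` parameters is illustrative. The toy `toy_evaluate` treats block contributions as additive, which real models do not guarantee.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple


def greedy_block_skip(
    num_blocks: int,
    evaluate: Callable[[FrozenSet[int]], float],
    max_skips: int,
    min_score: float,
) -> Tuple[List[int], float]:
    """Greedily choose blocks to skip, one per round.

    evaluate(skipped) is a hypothetical callback that returns the task metric
    of the model with every block in `skipped` bypassed.
    Returns the removal order and the metric after the last accepted removal.
    """
    skipped: FrozenSet[int] = frozenset()
    order: List[int] = []
    score = evaluate(skipped)
    while len(order) < max_skips:
        best_block: Optional[int] = None
        best_score = float("-inf")
        # Try removing each remaining block and keep the least harmful one.
        for b in range(num_blocks):
            if b in skipped:
                continue
            s = evaluate(skipped | {b})
            if s > best_score:
                best_block, best_score = b, s
        # Stop when no block is left or the best removal drops below min_score.
        if best_block is None or best_score < min_score:
            break
        skipped = skipped | {best_block}
        order.append(best_block)
        score = best_score
    return order, score


# Toy stand-in for a real VLM evaluation (hypothetical, additive by design).
toy_importance = [0.30, 0.20, 0.10, 0.02, 0.01, 0.05]


def toy_evaluate(skipped: FrozenSet[int]) -> float:
    return 1.0 - sum(toy_importance[b] for b in skipped)


order, score = greedy_block_skip(6, toy_evaluate, max_skips=10, min_score=0.9)
print(order)  # [4, 3, 5]
```

Each round costs one evaluation per remaining block, so with a real model the metric would typically be computed on a small held-out calibration set to keep the search affordable.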

2. Reverse-Order Deletion Strategy

Blocks are deleted starting from later layers rather than early ones. Early foundational blocks are critical for vision-language alignment, and their removal causes catastrophic performance collapse.
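One way to realize reverse-order deletion is to scan candidate blocks from the last layer toward the first and never consider an early, protected prefix. In this sketch, `evaluate`, `tolerance`, and `protected_prefix` are hypothetical names chosen for illustration; the paper's exact acceptance criterion may differ.

```python
from typing import Callable, FrozenSet, List


def reverse_order_skip(
    num_blocks: int,
    evaluate: Callable[[FrozenSet[int]], float],
    tolerance: float,
    protected_prefix: int,
) -> List[int]:
    """Scan blocks from last to first; skip a block if the metric stays
    within `tolerance` of the full model's score.

    Blocks 0 .. protected_prefix - 1 (the early layers) are never considered.
    Returns the skipped block indices, highest first.
    """
    baseline = evaluate(frozenset())
    skipped: FrozenSet[int] = frozenset()
    # Walk backwards from the last block down to the first unprotected one.
    for b in range(num_blocks - 1, protected_prefix - 1, -1):
        candidate = skipped | {b}
        if baseline - evaluate(candidate) <= tolerance:
            skipped = candidate
    return sorted(skipped, reverse=True)
```

Because the scan starts at the output end and stops before the protected prefix, early blocks are kept even when a metric-only ranking would consider them cheap to drop.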

3. Tunable Sparsity-Performance Trade-off

A score-sparsity balance mechanism lets users tune the trade-off between computational efficiency and accuracy preservation, accommodating diverse deployment scenarios.
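The summary does not give the exact form of the balance. A minimal sketch, assuming a hypothetical linear objective J(k) = score(k) + lam * k / N, where score(k) is the task metric with k blocks skipped (for example, after each step of a greedy removal order), N is the total number of blocks, and `lam` is a hypothetical weight:

```python
from typing import Sequence, Tuple


def pick_config(
    scores: Sequence[float], num_blocks: int, lam: float
) -> Tuple[int, float]:
    """scores[k] is the task metric with k blocks skipped (scores[0] is the
    full model). Returns (k, objective) maximising score + lam * sparsity,
    where sparsity = k / num_blocks.
    """
    best_k, best_obj = 0, float("-inf")
    for k, s in enumerate(scores):
        obj = s + lam * (k / num_blocks)
        if obj > best_obj:
            best_k, best_obj = k, obj
    return best_k, best_obj
```

With `lam = 0` the full model always wins; increasing `lam` shifts the choice toward sparser configurations, which is the knob a latency-sensitive deployment would turn.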

Results

On COCO, single-object classification accuracy improved from 19.1% to 87.3% while skipping over 40% of Transformer blocks
Up to 45.4% latency reduction when deployed on Autoware.Universe, an autonomous driving platform
Practical acceleration for latency-sensitive applications such as autonomous driving
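As a quick sanity check on the latency figure: a latency reduction r corresponds to an end-to-end speedup of 1/(1 - r), so the reported 45.4% reduction is roughly a 1.83x speedup:

```latex
\text{speedup} = \frac{1}{1 - r} = \frac{1}{1 - 0.454} = \frac{1}{0.546} \approx 1.83\times
```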