Xin HE's Website

Share papers, techniques, and wonderful life

arXiv'26 | DySHARP: Half of MoE Communication Traffic Is Redundant, So Let NVSwitch Deduplicate It

3 min read · July 22, 2026

2026 · MoE · Distributed Training · NVLink · Paper Walkthrough
arXiv'26 | Cassandra: No Training, No Extra GPU Memory, Carving the Draft Model Out of the Target Model's Bits

4 min read · July 22, 2026

2026 · LLM · Speculative Decoding · Edge Inference · Paper Walkthrough
DAC'26 | SlideFormer: Full-Parameter Fine-Tuning of a 123B Model on a Single 4090

5 min read · July 21, 2026

2026 · LLM · Fine-Tuning · Offloading · Paper Walkthrough
arXiv'26 | StepAudio 2.5: One Base Model, Three Personas, with ASR Mode Pushing RTF Down to 0.0053

4 min read · July 15, 2026

2026 · LLM · ASR · MTP · Inference Acceleration · Paper Walkthrough
A Brief History of Self-Speculative Decoding: How Far Can a Model Get by Drafting for Itself, with No Extra Model?

4 min read · July 15, 2026

2026 · LLM · Inference Acceleration · Speculative Decoding · Self-Speculative Decoding · Survey