Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- AutoHete: An Automatic and Efficient Heterogeneous Training System for LLMs. arXiv preprint arXiv:2503.01890, 2025
- RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging. arXiv preprint arXiv:2508.01784, 2025
2024
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression. arXiv preprint arXiv:2410.12707, 2024
- Fault-Tolerant Hybrid-Parallel Training at Scale with Reliable and Efficient In-memory Checkpointing. arXiv, 2024
2023
2022
2021
2020
- Benchmarking the performance and energy efficiency of AI accelerators for AI training. In 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) Workshop, 2020