Empowering Large Language Models to Edge Intelligence: A Survey of Edge Efficient LLMs and Techniques

Authors: Rui Wang¹, Zhiyong Gao¹, Liuyang Zhang¹, Shuaibing Yue¹, Ziyi Gao¹
Affiliation:
¹ University of Science and Technology Beijing
Corresponding author: Rui Wang (email: wangrui@ustb.edu.cn)
Submitted: 2024-11-25 10:02:34

Abstract: Large language models (LLMs) have showcased exceptional capabilities across various natural language processing (NLP) tasks in recent years, such as machine translation, text summarization, and question answering. Despite their impressive performance, the deployment of these models on edge devices, such as mobile phones, IoT devices, and edge computing nodes, is significantly hindered by their substantial computational and memory requirements. This survey provides a comprehensive overview of the state-of-the-art techniques and strategies for enabling efficient inference of LLMs on edge devices. We explore approaches including the development of small language models (SLMs), model compression techniques, inference optimization strategies, and dedicated frameworks for edge deployment. Our goal is to highlight the advancements and ongoing challenges in this field, offering valuable insights for researchers and practitioners striving to bring the power of LLMs to edge environments.

Keywords: Large Language Model; Edge Intelligence; Small Language Model; Model Compression; Efficient Inference; On-device LLM

Submitted by: Rui Wang (王睿)
Subject: Computer Science >> Natural Language Understanding and Machine Translation
Contribution: Submitted to a journal
Cite as: ChinaXiv:202411.00258 (or, for this version, ChinaXiv:202411.00258V1)
DOI:10.12074/202411.00258
CSTR:32003.36.ChinaXiv.202411.00258
TXID: a436ec51-cd78-4432-bc14-e51c0ba61b2d
Recommended citation: Rui Wang, Zhiyong Gao, Liuyang Zhang, Shuaibing Yue, Ziyi Gao. Empowering Large Language Models to Edge Intelligence: A Survey of Edge Efficient LLMs and Techniques. [DOI:10.12074/202411.00258]

Version history: V1 (ChinaXiv:202411.00258V1), 2024-11-25 10:02:34
