Abstract:
Over the past two decades, language modeling (LM) has emerged as a primary methodology for language understanding and generation, becoming a cornerstone of natural language processing (NLP). At its core, LM trains models to predict the probability of the next word or token, thereby generating natural and fluent language. The advent of large language models (LLMs), such as BERT (Bidirectional Encoder Representations from Transformers) and GPT-3, marks a significant milestone in the evolution of LM. These LLMs have had a profound impact on the field of artificial intelligence (AI) while also paving the way for advances in other domains, illustrating how the rapid progress of LLMs has reshaped the landscape of AI research. This paper provides a comprehensive review of the evolution of LLMs, focusing on technical architecture, model scale, training methods, optimization techniques, and evaluation metrics. Language models have evolved significantly over time, starting from early statistical language models, moving on to neural network-based models, and now entering the era of large-scale pre-trained language models. As the scale of these models has grown, so has their performance in language understanding and generation, yielding notable results across sectors such as education, healthcare, finance, and industry. However, the application of LLMs also presents challenges, including data quality, model generalization, and computational resources. This paper delves into these issues, analyzing the strengths and limitations of LLMs. Furthermore, the rise of LLMs has sparked ethical, privacy, and security concerns: LLMs may generate discriminatory, false, or misleading content, infringe on personal privacy, or be exploited for malicious activities such as cyber-attacks. To address these issues, the paper examines relevant technical measures, including model interpretability, privacy protection, and security assessment. Finally, the paper outlines potential future research directions for LLMs. With continued improvements in model scale and efficiency, LLMs are expected to play an even greater role in multimodal processing and societal impact. For example, by integrating information from different modalities, such as images and sound, LLMs can better understand and generate language; they can also support societal impact assessment, informing policy formulation and decision-making. By thoroughly analyzing the current state of research and potential future directions, this paper aims to offer researchers valuable insights and inspiration regarding LLMs, thereby fostering further advancement in the field.