Method for Annotating Dialogue Value Priority Based on Zero-Shot Chain-of-Thought

  • Abstract: Value priority recognition aims to identify the value priority attributes implicit in text, so as to judge whether the text accords with specific values and value types. It is essential for detecting user language, evaluating content generated by large language models, and probing the ability of large language models to assess human value priorities. At present, owing to the lack of datasets for human value recognition in dialogue scenarios, research on modeling and recognizing human value priorities in dialogue remains untouched. Constructing a high-quality dialogue value priority recognition dataset is therefore the first task. However, annotating such a dataset requires annotators with considerable domain knowledge, so the barrier to annotation is high. This paper therefore uses large language models to annotate existing dialogue corpora, providing an annotation case study for a dialogue value priority recognition dataset and extending the application of LLM-based data annotation. Specifically, we design a Zero-Shot-CoT-based dialogue value annotation method that simulates human annotation results, and with this method we construct ValueCon, a large-scale dialogue value recognition dataset. Experimental results show that, compared with manual annotation, the proposed method alleviates the inconsistency and noise of manual annotation, and the ValueCon dataset built on it can effectively train dialogue value recognition models, verifying the practical value of the proposed annotation method.

     

    Abstract: Value priority recognition is a fundamental task in computational linguistics that aims to discern and categorize the implicit hierarchy of human values manifested in text. Its core objective is to determine whether textual content aligns with specific value types and to identify the relative precedence assigned to those values. This capability matters in several domains: it supports analysis of user language patterns in behavioral profiling, serves as a metric for evaluating the ethical alignment and value consistency of content generated by large language models, and provides a methodological foundation for investigating how well large language models comprehend, interpret, and evaluate the hierarchies inherent in human value systems. Dialogue, a primary and natural mode of human communication, is a potent vehicle for expressing value-driven judgments, preferences, and priorities; the turn-taking, argumentation, and negotiation that characterize conversation frequently reveal implicit value trade-offs and hierarchical relationships. Dialogue therefore presents an exceptionally fertile domain for modeling human value prioritization. Despite this suitability, research that systematically models human value priorities in interactive conversational settings remains underdeveloped and largely unexplored, primarily because no dedicated high-quality dataset exists for recognizing value priorities in authentic dialogue contexts. The lack of such resources substantially hinders empirical investigation and the development of effective computational models in this field.
Therefore, creating a meticulously annotated, large-scale dataset for dialogue value priority recognition is an essential foundational prerequisite for advancing scholarly understanding in this area. However, the annotation process required to construct such a dataset faces substantial, intrinsic challenges, arising principally from the cognitive complexity and profound subjectivity of human values. Values are deeply held, often abstract cognitive constructs that fundamentally guide decision making and behavior; reliably identifying and ordering them in text demands more than superficial linguistic analysis, requiring interpretative insight into underlying motivations, ethical frameworks, and contextual nuances. This cognitive dimension imposes rigorous requirements on human annotators, who must possess substantial expertise in relevant psychological theories, in the cognitive science of moral reasoning, and in sociolinguistics. Consistent, reliable, expert-level manual annotation is consequently a prohibitively high barrier, leading to persistent inter-annotator inconsistency, conceptual ambiguity in label application, and substantial noise in the resulting annotations, all of which can critically compromise dataset quality and the performance of models trained on it. To address these challenges, this study leverages the advanced capabilities of contemporary large language models. We propose a novel annotation method for dialogue value priority recognition over existing textual dialogue corpora, capitalizing on these models' exceptional natural language understanding, sophisticated reasoning abilities, and extensive internalized knowledge of psychology, ethics, philosophy, and the social sciences.
Based on this method, we constructed ValueCon, a large-scale, high-quality benchmark dataset specifically designed for value priority recognition in dialogue. Experimental results demonstrate that, compared with manual annotation, the proposed method alleviates the inconsistency and noise associated with human labeling. The ValueCon dataset built with this method can effectively train dialogue value recognition models, thereby validating the practical value of the proposed annotation method.
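As an illustrative sketch only (the paper does not disclose its exact prompts or label inventory here), a Zero-Shot-CoT annotation pass over a dialogue corpus can be reduced to two pieces: a prompt builder that appends a "let's think step by step" instruction so the model reasons before committing to a label, and a parser that extracts the final label from the model's free-form reasoning. The Schwartz-style value list, prompt wording, and function names below are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of Zero-Shot-CoT-style dialogue value annotation.
# The value inventory and prompt phrasing are illustrative assumptions.

SCHWARTZ_VALUES = [  # hypothetical label set; the paper's inventory may differ
    "Self-Direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Conformity", "Tradition", "Benevolence", "Universalism",
]

def build_zero_shot_cot_prompt(dialogue: str, values=SCHWARTZ_VALUES) -> str:
    """Compose a Zero-Shot-CoT annotation prompt for one dialogue."""
    value_list = ", ".join(values)
    return (
        "You are annotating the human values expressed in a dialogue.\n"
        f"Candidate values: {value_list}\n\n"
        f"Dialogue:\n{dialogue}\n\n"
        "Let's think step by step about which value each speaker "
        "prioritizes, then give your final label on the last line as:\n"
        "Answer: <value>"
    )

def parse_label(model_output: str) -> str:
    """Extract the final 'Answer: <value>' line from the model's reasoning."""
    for line in reversed(model_output.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return "UNKNOWN"  # no parseable label; flag for re-query or manual review
```

In a full pipeline, `build_zero_shot_cot_prompt` would be sent to an LLM API for each dialogue and `parse_label` applied to the response; dialogues yielding `UNKNOWN` would be re-queried or routed to human review, which is one way the step-by-step reasoning format helps reduce noisy labels.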

     

