Abstract:
To address the dual requirements of efficiency and professionalism in diabetes-related intelligent question-answering, this study presents DiaRAG, an innovative system that synergistically integrates knowledge graphs with retrieval-augmented generation (RAG) techniques. The proposed system is specifically tailored to the diabetes domain, in which both medical expertise and up-to-date knowledge are critical. DiaRAG introduces an autoprompt generation (APG) method that automatically synthesizes diabetes-specific prompt templates. These templates are used to extract structured information from diabetes literature and clinical data, thus facilitating the construction of a comprehensive diabetes knowledge graph and a dedicated retrieval knowledge base. By applying APG, the system generates candidate prompts that enhance the extraction of relevant knowledge triples, addressing the challenges posed by ambiguous or complex medical queries and ensuring that the subsequent retrieval process is grounded in an accurate, domain-specific context.
Furthermore, DiaRAG integrates a specialized text correction module based on PL-BART (Prompt Learning with Bidirectional and Auto-Regressive Transformers). This module is designed to correct semantic and syntactic errors in patient queries. By leveraging prompt-guided correction, PL-BART improves the clarity of input questions, thus enabling the retrieval module to perform more precise matching with the underlying diabetes knowledge graph.
In the retrieval phase, a fine-tuned re-ranker model is introduced to further optimize the ordering of the candidate community summaries. This re-ranker, built on a cross-encoder architecture that employs BERT, evaluates the relevance of the retrieved documents to the patient’s query. The secondary filtering provided by this module not only enhances the alignment between the query intent and the retrieved content but also mitigates the common issue of hallucinations in large language models (LLMs) by ensuring that only high-quality, domain-relevant information is passed to the generation stage.
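The re-rank-then-filter step described above can be illustrated with a minimal Python sketch. It is not DiaRAG's implementation: the names `rerank` and `overlap_score`, the `top_k` and `min_score` parameters, and the example passages are all hypothetical, and the token-overlap scorer is only a stand-in for the fine-tuned BERT cross-encoder, which would score each (query, summary) pair jointly.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Stand-in relevance score: fraction of query tokens found in the passage.

    Assumption: a real system would replace this with a cross-encoder that
    reads the query and passage together and outputs a relevance score.
    """
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
    min_score: float = 0.0,
) -> List[Tuple[float, str]]:
    """Score every candidate against the query, drop weak ones, keep the best top_k."""
    scored = [(score_fn(query, c), c) for c in candidates]
    # Secondary filtering: discard candidates below the relevance threshold.
    kept = [(s, c) for s, c in scored if s >= min_score]
    # Highest score first; Python's sort is stable, so ties keep retrieval order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


# Hypothetical query and retrieved summaries, for illustration only.
query = "can metformin cause low blood sugar"
summaries = [
    "exercise improves insulin sensitivity",
    "blood sugar should be monitored daily",
    "metformin rarely causes low blood sugar when used alone",
]

for score, text in rerank(query, summaries, overlap_score, top_k=2, min_score=0.1):
    print(f"{score:.2f}  {text}")
```

Passing the scorer as a function keeps the filtering logic independent of the model, so a learned cross-encoder could be swapped in without changing `rerank`.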
Experimental evaluations were conducted on the DaCorp diabetes question-answering dataset, and the results showed that DiaRAG achieved superior performance compared with state-of-the-art models such as GPT-3.5 and HuatuoGPT, as well as other retrieval-augmented frameworks such as NaiveRAG and SelfRAG. Key evaluation metrics, including ROUGE-1, ROUGE-2, and ROUGE-L, indicated that DiaRAG consistently outperformed baseline methods in terms of answer accuracy and community summary relevance.
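For readers unfamiliar with the ROUGE metrics named above, the following Python sketch shows how ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence) are commonly computed for a single candidate and reference pair. It assumes whitespace tokenization and no stemming, which may differ from the evaluation setup actually used for DiaRAG; the function names and the example sentences are illustrative only.

```python
from collections import Counter
from typing import List, Tuple


def _prf(overlap: int, cand_total: int, ref_total: int) -> Tuple[float, float, float]:
    """Turn an overlap count into (precision, recall, F1)."""
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Tuple[float, float, float]:
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    cand = _ngrams(candidate.split(), n)
    ref = _ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> Tuple[float, float, float]:
    """ROUGE-L: precision, recall, and F1 based on the LCS length."""
    cand, ref = candidate.split(), reference.split()
    return _prf(_lcs_length(cand, ref), len(cand), len(ref))


# Illustrative sentences, not taken from the DaCorp dataset.
candidate = "insulin lowers blood glucose"
reference = "insulin lowers blood sugar levels"

print("ROUGE-1:", rouge_n(candidate, reference, 1))
print("ROUGE-2:", rouge_n(candidate, reference, 2))
print("ROUGE-L:", rouge_l(candidate, reference))
```

For Chinese text, the whitespace split would need to be replaced by character-level or word-segmented tokenization before these functions are applied.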
Ablation studies further demonstrated that each component (the APG module, PL-BART-based text correction, and the fine-tuned re-ranker) contributed significantly to the overall system performance. Notably, iterative prompt optimization via APG and the specialized re-ranking process proved critical for handling the intricate and specialized language inherent in diabetes-related queries. In a detailed case study involving patient inquiries about the suitability of a traditional Chinese medicine for diabetic conditions, DiaRAG provided a comprehensive answer that not only considered the general pharmacological properties of the medicine but also incorporated detailed clinical insights. This nuanced explanation, which directly addressed the complexities of diabetic complications and the specific indications of the medicine, led experts to rate DiaRAG’s response significantly higher than those of competing models such as GPT-3.5 and HuatuoGPT. The experts praised DiaRAG for its precise and contextually appropriate advice, highlighting the system’s potential for delivering personalized and reliable medical guidance.
Overall, DiaRAG represents an important advancement in the design of domain-specific intelligent question-answering systems. Seamlessly integrating structured knowledge extraction, robust text correction, and refined retrieval strategies, it offers an innovative solution for personalized medical knowledge services in diabetes care.