DOI: 10.48550/arXiv.2411.05521
Published on 8 November 2024 at Neural Information Processing Systems

SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark

Sithursan Sivasubramaniam Jonathan Fuerst Kurt Stockinger + 2 authors

Abstract

Electronic health records (EHRs) are stored in various database systems with different database models on heterogeneous storage architectures, such as relational databases, document stores, or graph databases. These different database models have a big impact on query complexity and performance. While this has been a known fact in database research, its implications for the growing number of Text-to-Query systems have surprisingly not been investigated so far. In this paper, we present SM3-Text-to-Query, the first multi-model medical Text-to-Query benchmark based on synthetic patient data from Synthea, following the SNOMED-CT taxonomy -- a widely used knowledge graph ontology covering medical terminology. SM3-Text-to-Query provides data representations for relational databases (PostgreSQL), document stores (MongoDB), and graph databases (Neo4j and GraphDB (RDF)), allowing the evaluation across four popular query languages, namely SQL, MQL, Cypher, and SPARQL. We systematically and manually develop 408 template questions, which we augment to construct a benchmark of 10K diverse natural language question/query pairs for these four query languages (40K pairs overall). On our dataset, we evaluate several common in-context learning (ICL) approaches for a set of representative closed and open-source LLMs. Our evaluation sheds light on the trade-offs between database models and query languages for different ICL strategies and LLMs. Finally, SM3-Text-to-Query is easily extendable to additional query languages or real, standard-based patient databases.
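To make the four-way comparison concrete, the sketch below renders one medical question in each of the benchmark's query languages. All schema names (`patient`, `HAS_CONDITION`, `:hasCondition`, etc.) are invented for illustration and are not taken from the actual SM3-Text-to-Query schemas.

```python
# One natural-language question, expressed against four hypothetical schemas
# in the four query languages the benchmark covers.
question = "Which patients have been diagnosed with hypertension?"

queries = {
    # Relational model (PostgreSQL)
    "SQL": (
        "SELECT p.name FROM patient p "
        "JOIN condition c ON c.patient_id = p.id "
        "WHERE c.description = 'Hypertension'"
    ),
    # Document model (MongoDB) -- MQL expressed as a find() filter
    "MQL": {
        "collection": "patients",
        "filter": {"conditions.description": "Hypertension"},
        "projection": {"name": 1},
    },
    # Property-graph model (Neo4j)
    "Cypher": (
        "MATCH (p:Patient)-[:HAS_CONDITION]->"
        "(:Condition {description: 'Hypertension'}) RETURN p.name"
    ),
    # RDF model (GraphDB)
    "SPARQL": (
        'SELECT ?name WHERE { ?p a :Patient ; :name ?name ; '
        ':hasCondition [ :description "Hypertension" ] }'
    ),
}

for lang, q in queries.items():
    print(f"{lang}: {q}")
```

Note how the same intent maps to a join in SQL, a nested-field filter in MQL, and path patterns in Cypher and SPARQL, which is exactly the query-complexity gap the benchmark measures.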

Related Research Articles

Aligning Large Language Models to a Domain-specific Graph Database

Weining Qian Siyuan Wang Yunshi Lan + 4 others

26 February 2024

Graph Databases (Graph DB) find extensive application across diverse domains such as finance, social networks, and medicine. Yet, the translation of Natural Language (NL) into the Graph Query Language (GQL), referred to as NL2GQL, poses significant challenges owing to its intricate and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like text2SQL. Nonetheless, in the realm of NL2GQL tasks tailored to a particular domain, the absence of domain-specific NL-GQL data pairs adds complexity to aligning LLMs with the graph DB. To tackle this challenge, we present a well-defined pipeline. Initially, we utilize ChatGPT to generate NL-GQL data pairs, leveraging the provided graph DB with self-instruction. Subsequently, we employ the generated data to fine-tune LLMs, ensuring alignment between LLMs and the graph DB. Moreover, we find the importance of relevant schema in efficiently generating accurate GQLs. Thus, we introduce a method to extract relevant schema as the input context. We evaluate our method using two carefully constructed datasets derived from graph DBs in the finance and medicine domains, named FinGQL and MediGQL. Experimental results reveal that our approach significantly outperforms a set of baseline methods, with improvements of 5.90 and 6.36 absolute points on EM, and 6.00 and 7.09 absolute points on EX for FinGQL and MediGQL, respectively.
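The "relevant schema" idea above can be sketched as a small retrieval step: score each schema element against the question and keep only the top matches as input context. The lexical-overlap scoring and the schema notation here are illustrative assumptions, not the paper's actual extraction method.

```python
import re

def relevant_schema(question, schema_elements, k=2):
    """Return the k schema elements with the most word overlap with the question."""
    tokenize = lambda s: set(re.split(r"[^a-z]+", s.lower())) - {""}
    q_tokens = tokenize(question)
    ranked = sorted(
        schema_elements,
        key=lambda el: len(q_tokens & tokenize(el)),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical graph-DB schema fragments for illustration.
schema = [
    "(Company)-[:LISTED_ON]->(Exchange)",
    "(Drug)-[:TREATS]->(Disease)",
    "(Doctor)-[:PRESCRIBES]->(Drug)",
]
print(relevant_schema("Which drug treats this disease?", schema))
```

Restricting the prompt to the matching fragments keeps the context short while still grounding the generated GQL in the database schema.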

Text2Cypher: Bridging Natural Language and Graph Databases

Jon Besga Makbule Gülçin Özsoy Leila Messallem + 1 other

13 December 2024

Knowledge graphs use nodes, relationships, and properties to represent arbitrarily complex data. When stored in a graph database, the Cypher query language enables efficient modeling and querying of knowledge graphs. However, using Cypher requires specialized knowledge, which can present a challenge for non-expert users. Our work Text2Cypher aims to bridge this gap by translating natural language queries into the Cypher query language, extending the utility of knowledge graphs to non-technical users. While large language models (LLMs) can be used for this purpose, they often struggle to capture complex nuances, resulting in incomplete or incorrect outputs. Fine-tuning LLMs on domain-specific datasets has proven to be a more promising approach, but the limited availability of high-quality, publicly available Text2Cypher datasets makes this challenging. In this work, we show how we combined, cleaned, and organized several publicly available datasets into a total of 44,387 instances, enabling effective fine-tuning and evaluation. Models fine-tuned on this dataset showed significant performance gains, with improvements in Google-BLEU and Exact Match scores over baseline models, highlighting the importance of high-quality datasets and fine-tuning in improving Text2Cypher performance.
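One plausible reading of the Exact Match metric mentioned above is whitespace-normalized string equality between predicted and gold Cypher; the normalization below is an assumption, and the paper's exact definition may differ.

```python
def exact_match(pred, gold):
    """Whitespace-normalized string equality, a common Exact Match variant."""
    return " ".join(pred.split()) == " ".join(gold.split())

# Hypothetical prediction/gold pair differing only in line breaks.
gold = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'The Matrix'}) RETURN p.name"
pred = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'The Matrix'})\nRETURN p.name"
print(exact_match(pred, gold))  # True: the queries differ only in whitespace
```

Exact Match is strict (a renamed variable counts as a miss), which is why it is usually reported alongside a softer n-gram metric such as Google-BLEU.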

LLMs in Biomedicine: A study on clinical Named Entity Recognition

Fazlolah Mohaghegh Melika Emami Jiaxin Yang + 4 others

10 April 2024

Large Language Models (LLMs) demonstrate remarkable versatility in various NLP tasks but encounter distinct challenges in the biomedical domain due to the complexities of language and data scarcity. This paper investigates the application of LLMs in the biomedical domain by exploring strategies to enhance their performance on the NER task. Our study reveals the importance of meticulously designed prompts in the biomedical domain. Strategic selection of in-context examples yields a marked improvement, offering a ~15-20% increase in F1 score across all benchmark datasets for biomedical few-shot NER. Additionally, our results indicate that integrating external biomedical knowledge via prompting strategies can enhance the proficiency of general-purpose LLMs to meet the specialized needs of biomedical NER. Leveraging a medical knowledge base, our proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), can boost the zero-shot F1 score of LLMs for biomedical NER. Code is released at https://github.com/masoud-monajati/LLM_Bio_NER
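The "strategic selection of in-context examples" can be sketched as picking the labeled examples most similar to the input sentence and prepending them to the prompt. Token overlap stands in for the paper's similarity measure here, and all example data is invented for illustration.

```python
def select_examples(sentence, pool, k=2):
    """Rank the labeled pool by word overlap with the input sentence; keep top-k."""
    s_tokens = set(sentence.lower().split())
    scored = sorted(
        pool,
        key=lambda ex: len(s_tokens & set(ex["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

# Hypothetical labeled NER examples.
pool = [
    {"text": "Aspirin reduces fever", "entities": [("Aspirin", "DRUG")]},
    {"text": "The patient reported chest pain", "entities": [("chest pain", "SYMPTOM")]},
    {"text": "Metformin treats type 2 diabetes",
     "entities": [("Metformin", "DRUG"), ("type 2 diabetes", "DISEASE")]},
]
query = "Ibuprofen reduces inflammation in the patient"
shots = select_examples(query, pool, k=2)

# Assemble a few-shot NER prompt from the selected examples.
prompt = "\n".join(f"Sentence: {ex['text']}\nEntities: {ex['entities']}" for ex in shots)
prompt += f"\nSentence: {query}\nEntities:"
print(prompt)
```

In practice, embedding-based similarity is usually preferred over raw token overlap, but the prompt-assembly pattern is the same.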

Investigations on using Evidence-Based GraphRag Pipeline using LLM Tailored for Answering USMLE Medical Exam Questions

J. Fiaidhi S. Shankar K. Kushal + 1 other

5 May 2025

The integration of evidence-based reasoning with retrieval-augmented generation (GraphRAG) holds great promise for enhancing large language model (LLM) question-answering (QA) capabilities. This research proposes a GraphRAG framework that improves the interpretability and reliability of LLM-generated answers in the medical domain. Our approach constructs a knowledge graph using Neo4j to represent UMLS medical entities and relationships, and complements it with a vector store of textbook embeddings for dense passage retrieval. The system is designed to combine symbolic reasoning and semantic search to produce more context-aware and evidence-grounded responses. As a proof of concept, we evaluate our system on United States Medical Licensing Examination (USMLE)-style questions, which require clinical reasoning across multiple domains. While overall answer accuracy remains comparable to that of an LLM-only baseline, our system consistently outperforms it in citation fidelity, providing richer, more traceable justifications by explicitly linking answers to graph paths and textbook passages. These findings suggest that even when correctness may vary, graph-informed retrieval improves transparency and auditability, which are critical for high-stakes domains like medicine. Our results motivate further refinement of hybrid GraphRAG systems to enhance both factual accuracy and clinical trustworthiness in QA applications.
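The evidence-grounded prompting step described above can be sketched as tagging each retrieved graph path and passage so the model can cite them explicitly. The tagging scheme and the inputs are assumptions; the real system retrieves paths from a Neo4j UMLS graph and passages from a textbook embedding store.

```python
def build_prompt(question, graph_paths, passages):
    """Tag graph evidence as [G*] and textbook evidence as [T*], then assemble a prompt."""
    tagged = [f"[G{i}] {p}" for i, p in enumerate(graph_paths, 1)]
    tagged += [f"[T{i}] {p}" for i, p in enumerate(passages, 1)]
    evidence = "\n".join(tagged)
    return ("Answer using only the evidence below; cite the tags you rely on.\n\n"
            f"Evidence:\n{evidence}\n\nQuestion: {question}")

# Hypothetical retrieved evidence for a USMLE-style question.
prompt = build_prompt(
    "Which enzyme deficiency causes phenylketonuria?",
    graph_paths=["phenylketonuria -CAUSED_BY-> phenylalanine hydroxylase deficiency"],
    passages=["PKU results from mutations in the PAH gene ..."],
)
print(prompt)
```

Because every piece of evidence carries a stable tag, the model's citations can be checked mechanically against the retrieval results, which is what citation fidelity measures.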

Building an intelligent diabetes Q&A system with knowledge graphs and large language models

Zhenkai Qin Hongfeng Zhang Zhidong Zang + 3 others

20 February 2025

Introduction: This paper introduces an intelligent question-answering system designed to deliver personalized medical information to diabetic patients. By integrating large language models with knowledge graphs, the system aims to provide more accurate and contextually relevant medical guidance, addressing the limitations of traditional healthcare systems in handling complex medical queries.

Methods: The system combines a Neo4j-based knowledge graph with the Baichuan2-13B and Qwen2.5-7B models. To enhance performance, Low-Rank Adaptation (LoRA) and prompt-based learning techniques are applied. These methods improve the system's semantic understanding and ability to generate high-quality responses. The system's performance is evaluated using entity recognition and intent classification tasks.

Results: The system achieves 85.91% precision in entity recognition and 88.55% precision in intent classification. The integration of a structured knowledge graph significantly improves the system's accuracy and clinical relevance, enhancing its ability to provide personalized medical responses for diabetes management.

Discussion: This study demonstrates the effectiveness of integrating large language models with structured knowledge graphs to improve medical question-answering systems. The proposed approach offers a promising framework for advancing diabetes management and other healthcare applications, providing a solid foundation for future personalized healthcare interventions.
