An overview of graph databases and their applications in the biomedical domain
Abstract
Over the past couple of decades, the explosion of densely interconnected data has stimulated the research, development and adoption of graph database technologies. From early graph models to more recent native graph databases, the landscape of implementations has evolved to cover enterprise-ready requirements. Because of the interconnected nature of its data, the biomedical domain has been one of the early adopters of graph databases, enabling more natural representation models and better data integration workflows, exploration and analysis facilities. In this work, we survey the literature to explore the evolution and performance of graph databases and how the most recent solutions are applied in the biomedical domain, compiling a great variety of use cases. With this evidence, we conclude that the available graph database management systems are fit to support data-intensive, integrative applications, targeted at both basic research and exploratory tasks closer to the clinic.
Related Scientific Articles
Dagmar Waltemath, Irina Balaur, Reinhard Schneider + 10 others
23 September 2024
Graph databases are becoming increasingly popular across scientific disciplines, being highly suitable for storing and connecting complex heterogeneous data. In systems biology, they are used as a backend solution for biological data repositories, ontologies, networks, pathways, and knowledge graph databases. In this review, we analyse all publications using or mentioning graph databases retrieved from PubMed and PubMed Central full-text search, focusing on the top 16 available graph databases. Publications are categorized according to their domain and application, focusing on pathway and network biology and relevant ontologies and tools. We detail different approaches and highlight the advantages of outstanding resources, such as UniProtKB, Disease Ontology, and Reactome, which provide graph-based solutions. We discuss ongoing efforts of the systems biology community to standardize and harmonize knowledge graph creation and the maintenance of integrated resources. Outlining prospects, including the use of graph databases as a way of communication between biological data repositories, we conclude that efficient design, querying, and maintenance of graph databases will be key for knowledge generation in systems biology and other research fields with heterogeneous data.
S. Turhan
15 December 2023
Health data plays a pivotal role in modern healthcare, guiding patient care, diagnoses, treatments, and outcomes. This extensive data repository encompasses electronic health records, medical imaging, test reports, and administrative information, empowering healthcare practitioners and researchers to make evidence-based decisions to improve patient well-being. In the complex healthcare landscape, handling health data presents challenges. While relational databases have historically dominated many industries, including healthcare, innovative alternatives like graph databases are gaining favor. Due to its complex and interconnected nature, healthcare data often loses semantic data integrity when modeled in relational databases. In contrast, graph databases have shown remarkable performance with interconnected data. Consequently, there is a belief that modeling health data as a whole on a graph database would produce excellent results. This preliminary study investigates how graph databases can efficiently manage health data by comparing simple data modeling and query performance. The research utilizes a dataset that is publicly available from a hospital in the United States. The dataset covers multiple areas, including hospital admissions, diagnoses, laboratory results, and prescription information for patients diagnosed with diabetes. Initially, an Entity-Relationship Diagram (ERD) models this two-dimensional tabular dataset and is built on a relational database. Subsequently, the ERD is transformed into a graph database schema and built on a NoSQL graph database system. Both databases are normalized during the modeling process, and they share identical data to ensure consistency in data entry. Following this, queries of varying complexity are constructed and run using the query languages of both database management systems. The primary results indicate that Neo4j outperforms PostgreSQL, though slight inconsistencies in data entry were noted. The study highlights the potential of graph databases in enhancing healthcare data management for better patient care and outcomes.
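As a rough illustration of the relational-to-graph transformation this abstract describes, the sketch below turns a few tabular rows into nodes and edges and then answers a question by traversal instead of by join. The table names, columns, relationship labels and values are all invented for illustration and are not taken from the study's dataset; plain Python stands in for both Neo4j and PostgreSQL.

```python
from collections import defaultdict

# Hypothetical relational rows (invented; not the study's hospital dataset).
patients = [{"patient_id": 1, "age": 54}, {"patient_id": 2, "age": 67}]
admissions = [
    {"admission_id": 10, "patient_id": 1},
    {"admission_id": 11, "patient_id": 2},
    {"admission_id": 12, "patient_id": 2},
]
diagnoses = [
    {"admission_id": 10, "code": "E11"},
    {"admission_id": 12, "code": "E11"},
    {"admission_id": 12, "code": "I10"},
]


def build_graph(patients, admissions, diagnoses):
    """Map rows to nodes and foreign keys to labelled, directed edges."""
    nodes = {}                 # (label, key) -> property dict
    edges = defaultdict(list)  # source node -> [(relationship, target node)]
    for p in patients:
        nodes[("Patient", p["patient_id"])] = {"age": p["age"]}
    for a in admissions:
        adm = ("Admission", a["admission_id"])
        nodes[adm] = {}
        edges[("Patient", a["patient_id"])].append(("HAS_ADMISSION", adm))
    for d in diagnoses:
        dx = ("Diagnosis", d["code"])
        nodes.setdefault(dx, {})  # one shared node per diagnosis code
        edges[("Admission", d["admission_id"])].append(("HAS_DIAGNOSIS", dx))
    return nodes, edges


def diagnoses_of(edges, patient_id):
    """Follow Patient -> Admission -> Diagnosis edges (two hops, no joins)."""
    found = set()
    for rel, adm in edges.get(("Patient", patient_id), []):
        if rel != "HAS_ADMISSION":
            continue
        for rel2, dx in edges.get(adm, []):
            if rel2 == "HAS_DIAGNOSIS":
                found.add(dx[1])
    return sorted(found)


nodes, edges = build_graph(patients, admissions, diagnoses)
print(diagnoses_of(edges, 2))
```

In a relational schema the same question would typically need two joins across the three tables; here it is a two-hop walk along stored edges.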
K. Schallert, D. Walke, T. Muth + 4 others
1 January 2023
The increasing amount and complexity of clinical data require an appropriate way of storing and analyzing those data. Traditional approaches use a tabular structure (relational databases) for storing data and thereby complicate storing and retrieving interlinked data from the clinical domain. Graph databases provide a great solution for this by storing data in a graph as nodes (vertices) that are connected by edges (links). The underlying graph structure can be used for the subsequent data analysis (graph learning). Graph learning consists of two parts: graph representation learning and graph analytics. Graph representation learning aims to reduce high-dimensional input graphs to low-dimensional representations. Then, graph analytics uses the obtained representations for analytical tasks like visualization, classification, link prediction and clustering which can be used to solve domain-specific problems. In this survey, we review current state-of-the-art graph database management systems, graph learning algorithms and a variety of graph applications in the clinical domain. Furthermore, we provide a comprehensive use case for a clearer understanding of complex graph learning algorithms.
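To make the node-and-edge model and the link prediction task concrete, here is a minimal sketch in plain Python. It scores unconnected node pairs by their number of shared neighbours, a simple classical heuristic swapped in for the learned low-dimensional representations the survey covers. The node names and edges are toy placeholders, not clinical data.

```python
from itertools import combinations

# Toy undirected edges between placeholder nodes (invented for illustration).
edge_list = [
    ("drug_a", "symptom_x"),
    ("drug_a", "symptom_y"),
    ("drug_b", "symptom_x"),
    ("drug_b", "symptom_y"),
    ("drug_b", "symptom_z"),
    ("drug_c", "symptom_w"),
]


def build_adjacency(edge_list):
    """Store the graph as node -> set of neighbouring nodes."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbor_scores(adj):
    """Score each unconnected pair by how many neighbours it shares."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    return scores


adj = build_adjacency(edge_list)
scores = common_neighbor_scores(adj)
# Highest-scoring pairs are the most plausible missing links under this heuristic.
for pair, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, score)
```

A representation learning approach would replace the shared-neighbour count with a similarity computed on learned node embeddings; the ranking step stays the same.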
Yuanyuan Tian
23 November 2022
Rapidly growing social networks and other graph data have created a high demand for graph technologies in the market. As a result, a plethora of graph databases, systems, and solutions have emerged. At the same time, graphs have long been a well-studied area in the database research community. Despite the numerous surveys on various graph research topics, there is a lack of surveys on graph technologies from an industry perspective. The purpose of this paper is to provide the research community with an industrial perspective on the graph database landscape, so that graph researchers can better understand the industry trends and the challenges that the industry is facing, and work on solutions to help address these problems.
Luke V Rasmussen, Mengjia Kang, J. Starren + 1 other
1 October 2024
OBJECTIVE: Graph databases for electronic health record (EHR) data have become a useful tool for clinical research in recent years, but there is a lack of published methods to transform relational databases to a graph database schema. We developed a graph model for the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that can be reused across research institutions. METHODS: We created and evaluated four models, representing two different strategies, for converting the standardized clinical and vocabulary tables of OMOP into a property graph model within the Neo4j graph database. Taking the Successful Clinical Response in Pneumonia Therapy (SCRIPT) and Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning (CRITICAL) cohorts as test datasets with different sizes, we compared two of the resulting graph models with respect to database performance including database building time, query complexity, and runtime for both cohorts. RESULTS: Utilizing a graph schema that was optimized for storing critical information as topology rather than attributes resulted in a significant improvement in both data creation and querying. The graph database for our larger cohort, CRITICAL, can be built within 1 hour for 134,145 patients, with a total of 749,011,396 nodes and 1,703,560,910 edges. DISCUSSION: To our knowledge, this is the first generalized solution to convert the OMOP CDM to a graph-optimized schema. Despite being developed for studies at a single institution, the modeling method can be applied to other OMOP CDM v5.x databases. Our evaluation with the SCRIPT and CRITICAL cohorts and comparison between the current and previous versions show advantages in code simplicity, database building, and query speed. CONCLUSION: We developed a method for converting OMOP CDM databases into graph databases. Our experiments revealed that the final model outperformed the initial relational-to-graph transformation in both code simplicity and query efficiency, particularly for complex queries.
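The distinction between storing information as topology and storing it as attributes can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the record layout, person identifiers and concept IDs below are hypothetical, and the code does not reproduce the authors' four models, the real OMOP CDM tables, or their measured results.

```python
# Hypothetical occurrence records with the concept held as an attribute
# (identifiers and concept IDs are invented, not real OMOP values).
occurrences = [
    {"person": "p1", "concept_id": 101},
    {"person": "p2", "concept_id": 202},
    {"person": "p3", "concept_id": 101},
]


def persons_by_attribute_scan(occurrences, concept_id):
    """Attribute style: inspect every record and filter on a property."""
    return sorted(
        {o["person"] for o in occurrences if o["concept_id"] == concept_id}
    )


def build_concept_edges(occurrences):
    """Topology style: each concept becomes a node linked to its persons."""
    concept_to_persons = {}
    for o in occurrences:
        concept_to_persons.setdefault(o["concept_id"], set()).add(o["person"])
    return concept_to_persons


def persons_by_traversal(concept_to_persons, concept_id):
    """Start at the concept node and follow its edges; no record scan."""
    return sorted(concept_to_persons.get(concept_id, set()))


concept_edges = build_concept_edges(occurrences)
print(persons_by_attribute_scan(occurrences, 101))
print(persons_by_traversal(concept_edges, 101))
```

Both functions return the same persons; the difference is that the traversal version starts from a single concept node and touches only its own edges, which is the general intuition behind favouring topology for frequently queried information.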