Query cost estimation in graph databases via emphasizing query dependencies by using a neural reasoning network
Abstract
With the increasing complexity of graph queries, query cost estimation has become a key challenge in graph databases. Accurate estimates are critical for database administrators and database management systems when performing query processing or optimization tasks. An efficient and accurate estimation model improves estimation quality and makes the results credible. Although learning‐based methods have been applied to query cost estimation, most of them target relational queries and cannot be used directly for graph queries. Furthermore, most estimation approaches focus on the correlations between predicates or columns, and ignore both the dependencies between the query schema and the query filter conditions and the correlations within the query schema. In this study, we construct a novel deep learning model composed of reasoning and retrieval processes that can accurately capture the potential logical relationships in graph queries, which addresses these problems to some extent. In addition, we propose a query estimation framework that divides the estimation task into four stages: query workload generation, training data collection, feature extraction and encoding, and estimation model construction. Experiments on real‐world datasets show that our estimation model improves estimation quality and outperforms the other deep learning models compared in terms of estimation accuracy.
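The feature extraction and encoding stage named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' actual encoding, so everything here is an assumption made for illustration: the vocabularies, the function names (`one_hot`, `encode_pattern`, `encode_filter`, `encode_query`), and the choice of one-hot vectors plus a min-max scaled literal.

```python
# Hypothetical vocabularies; a real system would derive these from the graph schema.
NODE_LABELS = ["Person", "Movie", "Genre"]
EDGE_TYPES = ["ACTED_IN", "HAS_GENRE"]
PROPERTIES = ["age", "year"]
OPERATORS = ["<", "=", ">"]


def one_hot(item, vocabulary):
    """Return a one-hot list marking the position of item in vocabulary."""
    return [1.0 if item == entry else 0.0 for entry in vocabulary]


def encode_pattern(src_label, edge_type, dst_label):
    """Encode one (source)-[edge]->(target) triple of the query schema."""
    return (one_hot(src_label, NODE_LABELS)
            + one_hot(edge_type, EDGE_TYPES)
            + one_hot(dst_label, NODE_LABELS))


def encode_filter(prop, op, value, min_value, max_value):
    """Encode one filter condition; the literal is min-max scaled to [0, 1]."""
    scaled = (value - min_value) / (max_value - min_value)
    return one_hot(prop, PROPERTIES) + one_hot(op, OPERATORS) + [scaled]


def encode_query(patterns, filters):
    """Concatenate pattern and filter encodings into one flat feature vector."""
    features = []
    for pattern in patterns:
        features.extend(encode_pattern(*pattern))
    for condition in filters:
        features.extend(encode_filter(*condition))
    return features


# Toy query: (Person)-[ACTED_IN]->(Movie) WHERE year > 2000, with year in [1900, 2100].
vector = encode_query(
    patterns=[("Person", "ACTED_IN", "Movie")],
    filters=[("year", ">", 2000, 1900, 2100)],
)
```

In this sketch the schema part and the filter part of the query land in the same vector, so a downstream model could in principle learn dependencies between them; how the paper's reasoning and retrieval processes consume such features is not stated in the abstract.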
Related Scientific Articles
Nikita Severin Ilya Makarov Olga Gerasimova
2023
The problem of query answering over incomplete attributed graph data is a challenging field of database management systems and artificial intelligence. When rules on data structure are expressed in the form of an ontology, the theoretical complexity of finding an exact solution that satisfies the ontology constraints increases. Logic-based methods use theoretical constructions to obtain efficient rewritings of the original queries with respect to the ontology and find an answer to the rewritten query over incomplete data. However, there is an opportunity to use faster machine learning methods to label all the data and query over the “most probable” data model without taking the ontology into account. This research paper investigates the effectiveness and trustworthiness of both approaches for answering ontology-mediated queries on graph databases that integrate an ontology with a covering axiom, which states that every node belongs to either of two classes. The first approach involves finding precise answers through logical reasoning and rewriting the problem into a datalog program, while the second approach employs a trained graph neural network to label data in a binary classification problem and leverages SQL for query answering. We conduct an in-depth analysis of the time performance of these approaches and evaluate the impact of training set selection on their ability to answer queries correctly. By comparing these approaches across various experiments, we provide insights into their strengths and limitations for answering ontology-mediated queries containing a Boolean conjunctive query. In particular, we show the importance of logic-based approaches for ontologies with a covering axiom and the inability of machine learning methods to find answers to ontology-mediated queries in large networks.
Alejandro Dobles Zecheng Zhang J. Leskovec + 9 others
29 July 2024
We present RelBench, a public benchmark for solving predictive tasks over relational databases with graph neural networks. RelBench provides databases and tasks spanning diverse domains and scales, and is intended to be a foundational infrastructure for future research. We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL) (Fey et al., 2024), which combines graph neural network predictive models with (deep) tabular models that extract initial entity-level representations from raw tables. End-to-end learned RDL models fully exploit the predictive signal encoded in primary-foreign key links, marking a significant shift away from the dominant paradigm of manual feature engineering combined with tabular models. To thoroughly evaluate RDL against this prior gold-standard, we conduct an in-depth user study where an experienced data scientist manually engineers features for each task. In this study, RDL learns better models whilst reducing human work needed by more than an order of magnitude. This demonstrates the power of deep learning for solving predictive tasks over relational databases, opening up many new research opportunities enabled by RelBench.
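The idea of exploiting primary-foreign key links can be made concrete with a small sketch that turns table rows into graph nodes and foreign-key references into edges. This is not RelBench's API or data format; the toy tables, the `build_graph` helper, and the adjacency-set representation are all hypothetical and chosen only to show the general construction.

```python
from collections import defaultdict

# Hypothetical toy tables; each row is keyed by its primary key.
customers = {1: {"name": "Ada"}, 2: {"name": "Lin"}}
orders = {
    10: {"customer_id": 1, "total": 30.0},
    11: {"customer_id": 1, "total": 12.5},
    12: {"customer_id": 2, "total": 99.0},
}


def build_graph(parent_name, parent_rows, child_name, child_rows, fk_column):
    """Turn rows into nodes and each foreign-key reference into an undirected edge."""
    adjacency = defaultdict(set)
    for pk in parent_rows:
        adjacency[(parent_name, pk)]  # register the node even if it has no edges
    for pk, row in child_rows.items():
        child = (child_name, pk)
        parent = (parent_name, row[fk_column])
        adjacency[child].add(parent)
        adjacency[parent].add(child)
    return adjacency


graph = build_graph("customers", customers, "orders", orders, "customer_id")
```

A graph neural network would then pass messages along these edges, with initial node features coming from each row's columns; that modelling step is outside the scope of this sketch.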
Milan Cvitkovic
6 February 2020
The majority of data scientists and machine learning practitioners use relational data in their work [State of ML and Data Science 2017, Kaggle, Inc.]. But training machine learning models on data stored in relational databases requires significant data extraction and feature engineering efforts. These efforts are not only costly, but they also destroy potentially important relational structure in the data. We introduce a method that uses Graph Neural Networks to overcome these challenges. Our proposed method outperforms state-of-the-art automatic feature engineering methods on two out of three datasets.
Lei Chen Tianshi ZHENG Hang Yin + 16 others
24 January 2025
Graph databases (GDBs) like Neo4j and TigerGraph excel at handling interconnected data but lack advanced inference capabilities. Neural Graph Databases (NGDBs) address this by integrating Graph Neural Networks (GNNs) for predictive analysis and reasoning over incomplete or noisy data. However, NGDBs rely on predefined queries and lack autonomy and adaptability. This paper introduces Agentic Neural Graph Databases (Agentic NGDBs), which extend NGDBs with three core functionalities: autonomous query construction, neural query execution, and continuous learning. We identify ten key challenges in realizing Agentic NGDBs, including semantic unit representation, abductive reasoning, scalable query execution, and integration with foundation models like large language models (LLMs). By addressing these challenges, Agentic NGDBs can enable intelligent, self-improving systems for modern data-driven applications, paving the way for adaptable and autonomous data management solutions.
Mikhail Galkin Hongyu Ren J. Leskovec + 2 others
26 March 2023
Complex logical query answering (CLQA) is a recently emerged task of graph machine learning that goes beyond simple one-hop link prediction and solves a far more complex task of multi-hop logical reasoning over massive, potentially incomplete graphs in a latent space. The task has received significant traction in the community; numerous works have expanded the field along theoretical and practical axes to tackle different types of complex queries and graph modalities with efficient systems. In this paper, we provide a holistic survey of CLQA with a detailed taxonomy studying the field from multiple angles, including graph types (modality, reasoning domain, background semantics), modeling aspects (encoder, processor, decoder), supported queries (operators, patterns, projected variables), datasets, evaluation metrics, and applications. Refining the CLQA task, we introduce the concept of Neural Graph Databases (NGDBs). Extending the idea of graph databases (graph DBs), an NGDB consists of a Neural Graph Storage and a Neural Graph Engine. Inside the Neural Graph Storage, we design a graph store, a feature store, and further embed information in a latent embedding store using an encoder. Given a query, the Neural Query Engine learns how to perform query planning and execution in order to efficiently retrieve the correct results by interacting with the Neural Graph Storage. Compared with traditional graph DBs, NGDBs allow for flexible and unified modeling of features in diverse modalities using the embedding store. Moreover, when the graph is incomplete, they can provide robust retrieval of answers that a normal graph DB cannot recover. Finally, we point out promising directions, unsolved problems, and applications of NGDBs for future research.