Querying Graph Databases at Scale

Aidan Hogan D. Vrgoč

Abstrak

The tutorial provides an in-depth overview of recent advances in algorithms and data structures for processing graph database queries. The focus will be on scalable algorithms that have been demonstrated to work over real world knowledge graphs. We will also present detailed performance comparisons of classical and recent algorithms. The tutorial will be divided into four sections. The first section will motivate the use of graph databases for querying knowledge graphs, and will introduce the attendees to graph data models and the query language landscape. The second section will discuss how to efficiently evaluate graph patterns, introducing the worst-case optimal join techniques and comparing them to classical join algorithms. The third section will discuss techniques for efficiently evaluating path queries and for constructing compact representations of potentially exponential sets of paths. In the final section we will introduce recent advances in compressed data structures that ease the high memory requirements of worst-case optimal join algorithms and also provide a template for evaluating path queries in a highly optimised manner.

Artikel Ilmiah Terkait

Querying in the Age of Graph Databases and Knowledge Graphs

M. Arenas Claudio Gutiérrez Juan Sequeda

9 Juni 2021

Graphs have become the best way we know of representing knowledge. The computing community has investigated and developed the support for managing graphs by means of digital technology. Graph databases and knowledge graphs surface as the most successful solutions to this program. This tutorial will provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs

HyGraph: a subgraph isomorphism algorithm for efficiently querying big graph databases

A. Yazıcı Merve Asiler R. George

21 April 2022

The big graph database provides strong modeling capabilities and efficient querying for complex applications. Subgraph isomorphism which finds exact matches of a query graph in the database efficiently, is a challenging problem. Current subgraph isomorphism approaches mostly are based on the pruning strategy proposed by Ullmann. These techniques have two significant drawbacks- first, they are unable to efficiently handle complex queries, and second, their implementations need the large indexes that require large memory resources. In this paper, we describe a new subgraph isomorphism approach, the HyGraph algorithm, that is efficient both in querying and with memory requirements for index creation. We compare the HyGraph algorithm with two popular existing approaches, GraphQL and Cypher using complexity measures and experimentally using three big graph data sets—(1) a country-level population database, (2) a simulated bank database, and (3) a publicly available World Cup big graph database. It is shown that the HyGraph solution performs significantly better (or equally) than competing algorithms for the query operations on these big databases, making it an excellent candidate for subgraph isomorphism queries in real scenarios.

Worst-Case-Optimal Similarity Joins on Graph Databases

Adrián Gómez-Brandón Diego Arroyuelo Benjamín Bustos + 3 lainnya

12 Maret 2024

We extend the concept of worst-case optimal equijoins in graph databases to the case where some nodes are required to be within the k-nearest neighbors (kNN) of others under some similarity function. We model the problem by superimposing the database graph with the kNN graph and show that a variant of Leapfrog TrieJoin (LTJ) implemented over a compact data structure called the Ring can be seamlessly extended to integrate similarity clauses with the equijoins in the LTJ query process, retaining worst-case optimality in many relevant cases. Our experiments on a benchmark that combines Wikidata and IMGpedia show that our enhanced LTJ algorithm outperforms by a considerable margin a baseline that first applies classic LTJ and then completes the query by applying the similarity predicates. The difference is more pronounced on queries where the similarity clauses are more densely connected to the query, becoming of an order of magnitude in some cases.

The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra Space

Gonzalo Navarro Adrián Soto Diego Arroyuelo + 4 lainnya

8 Februari 2024

We present an indexing scheme for triple-based graphs that supports join queries in worst-case optimal (wco) time within compact space. This scheme, called a ring, regards each triple as a cyclic string of length 3. Each rotation of the triples is lexicographically sorted and the values of the last attribute are stored as a column, so we obtain the order of the next column by stably re-sorting the triples by its attribute. We show that, by representing the columns with a compact data structure called a wavelet tree, this ordering enables forward and backward navigation between columns without needing pointers. These wavelet trees further support wco join algorithms and cardinality estimations for query planning. While traditional data structures such as B-Trees, tries, and so on, require 6 index orders to support all possible wco joins over triples, we can use one ring to index them all. This ring replaces the graph and uses only sublinear extra space, thus supporting wco joins in almost no space beyond storing the graph itself. Experiments querying a large graph (Wikidata) in memory show that the ring offers nearly the best overall query times while using only a small fraction of the space required by several state-of-the-art approaches. We then turn our attention to some theoretical results for indexing tables of arity d higher than 3 in such a way that supports wco joins. While a single ring of length d no longer suffices to cover all d! orders, we need much fewer rings to index them all: O(2d) rings with a small constant. For example, we need 5 rings instead of 120 orders for d=5. We show that our rings become a particular case of what we dub order graphs, whose nodes are attribute orders and where stably sorting by some attribute leads us from an order to another, thereby inducing an edge labeled by the attribute. The index is then the set of columns associated with the edges, and a set of rings is just one possible graph shape. We show that other shapes, like for example a single ring instead of several ones of length d, can lead us to even smaller indexes, and that other more general shapes are also possible. For example, we handle d=5 attributes within space equivalent to 4 rings.

The Suitability of Graph Databases for Big Data Analysis: A Benchmark

Matus Stovcik Barbora Buhnova M. Mačák

2020

: Digitalization of our society brings various new digital ecosystems (e.g., Smart Cities, Smart Buildings, Smart Mobility), which rely on the collection, storage, and processing of Big Data. One of the recently popular advancements in Big Data storage and processing are the graph databases. A graph database is specialized to handle highly connected data, which can be, for instance, found in the cross-domain setting where various levels of data interconnection take place. Existing works suggest that for data with many relationships, the graph databases perform better than non-graph databases. However, it is not clear where are the borders for speciﬁc query types, for which it is still efﬁcient to use a graph database. In this paper, we design and perform tests that examine these borders. We perform the tests in a cluster of three machines so that we explore the database behavior in Big Data scenarios concerning the query. We speciﬁcally work with Neo4j as a representative of graph databases and PostgreSQL as a representative of non-graph databases.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.