Wind-Bell Index: Towards Ultra-Fast Edge Query for Graph Databases

Haoyu Li Yisen Hong Rui Qiu + 2 penulis

Abstrak

Graphs are good at presenting relational and structural information, making it powerful in the representation of various data. For the efficient storage and processing of graph-like data, graph databases have been rapidly developed and extensively studied. However, graph databases mostly use adjacency lists as their basic data structure (e.g., Neo4j), which could result in poor performance of edge due to the skewed degree distribution of graphs.We design the Wind-Bell Index to address this problem. Wind-Bell Index is a memory-efficient index data structure, which can be attached to existing graph databases to speed up the edge. We have fully implemented our data structure in Neo4j, the most popular graph database today, and conduct theoretical and experimental analysis to evaluate the performance. Theoretical results prove the high query efficiency of our algorithm. And experimental results show that the average edge query speed is increased by hundreds of times compared with the original query interface of Neo4j. We believe that the excellent performance and scalability of Wind-Bell Index make it suitable for the application in a variety of graph databases.

Artikel Ilmiah Terkait

An Empirical Evaluation of Variable-length Record B+Trees on a Modern Graph Database System

James Clarkson Jim Webber Georgios Theodorakis

13 Mei 2024

B+Trees are widely used as persistent index implementations for databases. They are often implemented in a way that allows the index to be in main memory while the indexed data remains on disk. Over the years, multiple optimization techniques have been proposed to improve the efficiency of B+Trees by accelerating the key search within a node or compressing data based on common prefixes. This paper describes our empirical research implementing such optimized B+Trees in Neo4j, a modern graph database management system (DBMS). We were able to confirm that the optimized versions lived up to their performance claims over plain B+Trees when benchmarked in isolation. However, we also found that incorporating them into a real DBMS yields marginal improvements only. This is partly because Neo4j is not index-heavy, typically only using indexes to find starting points for graph traversals. The other part is that integrating optimized indexes into the transactions and page-based storage components of Neo4j incurs a performance penalty (for reasons of crash-tolerance) compared to the standalone implementations. Given the additional implementation and maintenance complexity of optimized B+Trees, our research suggests that regular B+Trees remain the preferred general-purpose implementation.

The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra Space

Gonzalo Navarro Adrián Soto Diego Arroyuelo + 4 lainnya

8 Februari 2024

We present an indexing scheme for triple-based graphs that supports join queries in worst-case optimal (wco) time within compact space. This scheme, called a ring, regards each triple as a cyclic string of length 3. Each rotation of the triples is lexicographically sorted and the values of the last attribute are stored as a column, so we obtain the order of the next column by stably re-sorting the triples by its attribute. We show that, by representing the columns with a compact data structure called a wavelet tree, this ordering enables forward and backward navigation between columns without needing pointers. These wavelet trees further support wco join algorithms and cardinality estimations for query planning. While traditional data structures such as B-Trees, tries, and so on, require 6 index orders to support all possible wco joins over triples, we can use one ring to index them all. This ring replaces the graph and uses only sublinear extra space, thus supporting wco joins in almost no space beyond storing the graph itself. Experiments querying a large graph (Wikidata) in memory show that the ring offers nearly the best overall query times while using only a small fraction of the space required by several state-of-the-art approaches. We then turn our attention to some theoretical results for indexing tables of arity d higher than 3 in such a way that supports wco joins. While a single ring of length d no longer suffices to cover all d! orders, we need much fewer rings to index them all: O(2d) rings with a small constant. For example, we need 5 rings instead of 120 orders for d=5. We show that our rings become a particular case of what we dub order graphs, whose nodes are attribute orders and where stably sorting by some attribute leads us from an order to another, thereby inducing an edge labeled by the attribute. The index is then the set of columns associated with the edges, and a set of rings is just one possible graph shape. We show that other shapes, like for example a single ring instead of several ones of length d, can lead us to even smaller indexes, and that other more general shapes are also possible. For example, we handle d=5 attributes within space equivalent to 4 rings.

TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs

Arun Ramasami Songting Chen Zhifang Zeng + 6 lainnya

20 Januari 2025

In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To support fast vector search, we devise an MPP index framework that interoperates efficiently with the graph engine. The graph query language GSQL is enhanced to support vector type expressions and enable query compositions between vector search results and graph query blocks. These advancements elevate the expressive power and analytical capabilities of graph databases, enabling seamless fusion of unstructured and structured data in ways previously unattainable. Through extensive experiments, we demonstrate TigerVector's hybrid search capability, scalability, and superior performance compared to other graph databases (including Neo4j and Amazon Neptune) and a highly optimized specialized vector database (Milvus). TigerVector was integrated into TigerGraph v4.2, the latest release of TigerGraph, in December 2024.

The Suitability of Graph Databases for Big Data Analysis: A Benchmark

Matus Stovcik Barbora Buhnova M. Mačák

2020

: Digitalization of our society brings various new digital ecosystems (e.g., Smart Cities, Smart Buildings, Smart Mobility), which rely on the collection, storage, and processing of Big Data. One of the recently popular advancements in Big Data storage and processing are the graph databases. A graph database is specialized to handle highly connected data, which can be, for instance, found in the cross-domain setting where various levels of data interconnection take place. Existing works suggest that for data with many relationships, the graph databases perform better than non-graph databases. However, it is not clear where are the borders for speciﬁc query types, for which it is still efﬁcient to use a graph database. In this paper, we design and perform tests that examine these borders. We perform the tests in a cluster of three machines so that we explore the database behavior in Big Data scenarios concerning the query. We speciﬁcally work with Neo4j as a representative of graph databases and PostgreSQL as a representative of non-graph databases.

Understanding Graph Databases: A Comprehensive Tutorial and Survey

Oluwatosin Agbaakin Sydney Anuyah Victor Bolade

15 November 2024

This tutorial serves as a comprehensive guide for understanding graph databases, focusing on the fundamentals of graph theory while showcasing practical applications across various fields. It starts by introducing foundational concepts and delves into the structure of graphs through nodes and edges, covering different types such as undirected, directed, weighted, and unweighted graphs. Key graph properties, terminologies, and essential algorithms for network analysis are outlined, including Dijkstras shortest path algorithm and methods for calculating node centrality and graph connectivity. The tutorial highlights the advantages of graph databases over traditional relational databases, particularly in efficiently managing complex, interconnected data. It examines leading graph database systems such as Neo4j, Amazon Neptune, and ArangoDB, emphasizing their unique features for handling large datasets. Practical instructions on graph operations using NetworkX and Neo4j are provided, covering node and edge creation, attribute assignment, and advanced queries with Cypher. Additionally, the tutorial explores common graph visualization techniques using tools like Plotly and Neo4j Bloom, which enhance the interpretation and usability of graph data. It also delves into community detection algorithms, including the Louvain method, which facilitates clustering in large networks. Finally, the paper concludes with recommendations for researchers interested in exploring the vast potential of graph technologies.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.