Graphix: “One User's JSON is Another User's Graph”

Michael J. Carey Glenn Galvizo

Abstract

The increasing prevalence of large graph data has produced a variety of research and applications tailored toward graph data management. Users aiming to perform graph analytics will typically start by importing existing data into a separate graph-purposed storage engine. The cost of maintaining a separate system (e.g., the data copy and the associated queries) just for graph analytics may be prohibitive for users with Big Data. In this paper, we introduce Graphix and show how it enables property graph views of existing document data in AsterixDB, a Big Data management system boasting a partitioned-parallel query execution engine. We explain (a) the graph view user model of Graphix, (b) $\text{gSQL}^{++}$, a novel query language extension for synergistic document-based navigational pattern matching, and (c) how edge hops are evaluated in a parallel fashion. We then compare queries authored in $\text{gSQL}^{++}$ against versions in other leading query languages. Finally, we evaluate our approach against a leading native graph database, Neo4j, and show that Graphix is appropriate for operational and analytical workloads, especially at scale.
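The idea of a property graph view over existing documents can be illustrated with a minimal Python sketch. This is not Graphix, AsterixDB, or $\text{gSQL}^{++}$: the documents, the field names (`id`, `friends`), and the helpers `graph_view` and `hop` are hypothetical, and the hop is evaluated sequentially rather than in parallel. It only shows that vertices and edges can be derived from document data in place, without importing a copy into a separate store.

```python
from collections import defaultdict

# Hypothetical JSON-like documents; the field names are illustrative only.
users = [
    {"id": 1, "name": "Ana", "friends": [2, 3]},
    {"id": 2, "name": "Ben", "friends": [3]},
    {"id": 3, "name": "Cy", "friends": []},
]


def graph_view(docs, key, edge_field):
    """Derive vertices and directed edges from documents without copying them."""
    # Vertices are references to the original documents, keyed by `key`.
    vertices = {doc[key]: doc for doc in docs}
    edges = defaultdict(list)
    for doc in docs:
        for target in doc[edge_field]:
            # Keep only edges whose target document actually exists.
            if target in vertices:
                edges[doc[key]].append(target)
    return vertices, edges


def hop(frontier, edges):
    """Expand a set of vertex ids by one edge hop."""
    return {dst for src in frontier for dst in edges.get(src, [])}


vertices, edges = graph_view(users, "id", "friends")
one_hop = hop({1}, edges)      # friends of user 1
two_hop = hop(one_hop, edges)  # friends of those friends
```

Because the view holds references to the source documents, a change to a document's non-edge fields is visible through `vertices` immediately; the edge index in this sketch, by contrast, would need to be rebuilt after the `friends` field changes.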

Related Scientific Articles

The Suitability of Graph Databases for Big Data Analysis: A Benchmark

Matus Stovcik Barbora Buhnova M. Mačák

2020

Digitalization of our society brings various new digital ecosystems (e.g., Smart Cities, Smart Buildings, Smart Mobility), which rely on the collection, storage, and processing of Big Data. One of the recently popular advancements in Big Data storage and processing is the graph database. A graph database is specialized to handle highly connected data, which can be found, for instance, in cross-domain settings where various levels of data interconnection take place. Existing works suggest that for data with many relationships, graph databases perform better than non-graph databases. However, it is not clear where the borders lie for specific query types, that is, up to what point it is still efficient to use a graph database. In this paper, we design and perform tests that examine these borders. We perform the tests on a cluster of three machines so that we can explore the database behavior in Big Data scenarios with respect to the query. We specifically work with Neo4j as a representative of graph databases and PostgreSQL as a representative of non-graph databases.

Aion: Efficient Temporal Graph Data Management

James Clarkson Georgios Theodorakis Jim Webber

2024

Modern graph database management systems (DBMSs) can process highly dynamic labeled property graphs (LPGs) with many billions of relationships comfortably, but those systems often ignore the temporal dimension of data, how a graph evolved over time. Temporal analytics allow users to query and compute over the graph throughout its history so that valuable line-of-business data is always accessible and never lost. However, existing approaches tend to be ad-hoc and vary in performance depending on the size of the effective graph workload, such as local pattern matching or global graph algorithms. In this work, we describe Aion, a transactional temporal graph DBMS that generalizes previous approaches for LPGs. Aion extends Neo4j, a modern graph DBMS, incurring minimal performance overhead by decoupling the graph’s history from the latest graph version. To support efficient temporal analytics independently of workload characteristics, Aion adopts a hybrid temporal storage approach: (i) for fast full graph restoration at arbitrary time points, it uses TimeStore that indexes updates by time; (ii) for fine-grained graph history accesses, it uses LineageStore that indexes updates by entity identifiers. To enable incremental graph computations for improved latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments show that Aion achieves comparable or better performance versus existing non-transactional temporal systems and provides up to an order of magnitude speedup over classic Neo4j.

Towards View Management in Graph Databases

Mohanna Shahrad Yunjia Zheng Yu Ting Gu + 1 other

13 May 2024

Views are widely used in relational databases to facilitate query writing, give individualized abstractions to different user groups, and improve query execution time with materialization techniques. This paper explores how views could be defined and used in graph database systems (GDBS) with a similar purpose to what can be found in relational systems. We perform our analysis using Neo4j and its query language Cypher which has many of the features typically found in graph query languages, aiming to pave the way for integrating view management into a wider range of GDBS.

TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs

Arun Ramasami Songting Chen Zhifang Zeng + 6 others

20 January 2025

In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To support fast vector search, we devise an MPP index framework that interoperates efficiently with the graph engine. The graph query language GSQL is enhanced to support vector type expressions and enable query compositions between vector search results and graph query blocks. These advancements elevate the expressive power and analytical capabilities of graph databases, enabling seamless fusion of unstructured and structured data in ways previously unattainable. Through extensive experiments, we demonstrate TigerVector's hybrid search capability, scalability, and superior performance compared to other graph databases (including Neo4j and Amazon Neptune) and a highly optimized specialized vector database (Milvus). TigerVector was integrated into TigerGraph v4.2, the latest release of TigerGraph, in December 2024.

MV4PG: Materialized Views for Property Graphs

Shipeng Qi Kaiwei Li Xingdi Wei + 5 others

28 November 2024

Graph databases are receiving growing attention in highly interconnected data domains, and the demand for efficient querying of big data is increasing. We observed that graph database queries contain duplicate patterns, and that the results of these patterns can be stored as materialized views to speed up queries. We therefore propose materialized views on property graphs, comprising three parts: view creation, view maintenance, and query optimization using views. We also propose, for the first time, an efficient templated view maintenance method that handles variable-length edges and can be applied to multiple graph databases. To verify the effect of materialized views, we build a prototype on TuGraph and experiment on both TuGraph and Neo4j. The experimental results show that the benefit of our query optimization on read statements far outweighs the additional view maintenance cost incurred by write statements. The speedup of the whole workload reaches up to 28.71x, and the speedup of a single query reaches nearly 100x.
