DOI: 10.48550/arXiv.2503.22091
Terbit pada 28 Maret 2025 Pada arXiv.org

A Graph-native Optimization Framework for Complex Graph Queries

Jingren Zhou Wenyuan Yu Longbin Lai + 4 penulis

Abstrak

This technical report extends the SIGMOD 2025 paper"A Modular Graph-Native Query Optimization Framework"by providing a comprehensive exposition of GOpt's advanced technical mechanisms, implementation strategies, and extended evaluations. While the original paper introduced GOpt's unified intermediate representation (GIR) and demonstrated its performance benefits, this report delves into the framework's implementation depth: (1) the full specification of GOpt's optimization rules; (2) a systematic treatment of semantic variations (e.g., homomorphism vs. edge-distinct matching) across query languages and their implications for optimization; (3) the design of GOpt's Physical integration interface, enabling seamless integration with transactional (Neo4j) and distributed (GraphScope) backends via engine-specific operator customization; and (4) a detailed analysis of plan transformations for LDBC benchmark queries.

Artikel Ilmiah Terkait

Graphix: “One User's JSON is Another User's Graph”

Michael J. Carey Glenn Galvizo

13 Mei 2024

The increasing prevalence of large graph data has produced a variety of research and applications tailored toward graph data management. Users aiming to perform graph analytics will typically start by importing existing data into a separate graph-purposed storage engine. The cost of maintaining a separate system (e.g., the data copy, the associated queries, etc …) just for graph analytics may be prohibitive for users with Big Data. In this paper, we introduce Graphix and show how it enables property graph views of existing document data in AsterixDB, a Big Data management system boasting a partitioned-parallel query execution engine. We explain a) the graph view user model of Graphix, b) $\text{gSQL}^{++}$, a novel query language extension for synergistic document-based navigational pattern matching, and c) how edge hops are evaluated in a parallel fashion. We then compare queries authored in $\text{gSQL}^{++}$ against versions in other leading query languages. Finally, we evaluate our approach against a leading native graph database, Neo4j, and show that Graphix is appropriate for operational and analytical workloads, especially at scale.

TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs

Arun Ramasami Songting Chen Zhifang Zeng + 6 lainnya

20 Januari 2025

In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To support fast vector search, we devise an MPP index framework that interoperates efficiently with the graph engine. The graph query language GSQL is enhanced to support vector type expressions and enable query compositions between vector search results and graph query blocks. These advancements elevate the expressive power and analytical capabilities of graph databases, enabling seamless fusion of unstructured and structured data in ways previously unattainable. Through extensive experiments, we demonstrate TigerVector's hybrid search capability, scalability, and superior performance compared to other graph databases (including Neo4j and Amazon Neptune) and a highly optimized specialized vector database (Milvus). TigerVector was integrated into TigerGraph v4.2, the latest release of TigerGraph, in December 2024.

gCBO: A Cost-based Optimizer for Graph Databases

Yue Pang Linglin Yang Lei Zou + 1 lainnya

17 Oktober 2022

Query optimization is an especially challenging problem in graph databases due to its wide plan space and the difficulty in gathering statistics. In this demonstration, we propose a new cost-based query optimizer called gCBO for graph databases and implement it in a specific graph database (i.e., gStore). To tackle the aforementioned challenges, gCBO employs a hybrid plan enumerator based on dynamic programming, cost models that capture the characteristics of different types of joins, and a sampling-based cardinality estimation strategy that gathers the necessary statistics on-the-fly. What is more, to absorb the experience of users, we build an interactive component for gCBO, which allows users to receive the optimized execution plans with detailed information and generate their own plans for execution.

The Suitability of Graph Databases for Big Data Analysis: A Benchmark

Matus Stovcik Barbora Buhnova M. Mačák

2020

: Digitalization of our society brings various new digital ecosystems (e.g., Smart Cities, Smart Buildings, Smart Mobility), which rely on the collection, storage, and processing of Big Data. One of the recently popular advancements in Big Data storage and processing are the graph databases. A graph database is specialized to handle highly connected data, which can be, for instance, found in the cross-domain setting where various levels of data interconnection take place. Existing works suggest that for data with many relationships, the graph databases perform better than non-graph databases. However, it is not clear where are the borders for specific query types, for which it is still efficient to use a graph database. In this paper, we design and perform tests that examine these borders. We perform the tests in a cluster of three machines so that we explore the database behavior in Big Data scenarios concerning the query. We specifically work with Neo4j as a representative of graph databases and PostgreSQL as a representative of non-graph databases.

MV4PG: Materialized Views for Property Graphs

Shipeng Qi Kaiwei Li Xingdi Wei + 5 lainnya

28 November 2024

Graph databases are getting more and more attention in the highly interconnected data domain, and the demand for efficient querying of big data is increasing. We noticed that there are duplicate patterns in graph database queries, and the results of these patterns can be stored as materialized views first, which can speed up the query rate. So we propose materialized views on property graphs, including three parts: view creation, view maintenance, and query optimization using views, and we propose for the first time an efficient templated view maintenance method for containing variable-length edges, which can be applied to multiple graph databases. In order to verify the effect of materialized views, we prototype on TuGraph and experiment on both TuGraph and Neo4j. The experiment results show that our query optimization on read statements is much higher than the additional view maintenance cost brought by write statements. The speedup ratio of the whole workload reaches up to 28.71x, and the speedup ratio of a single query reaches up to nearly 100x.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.