TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs
Abstract
In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To support fast vector search, we devise an MPP index framework that interoperates efficiently with the graph engine. The graph query language GSQL is enhanced to support vector type expressions and enable query compositions between vector search results and graph query blocks. These advancements elevate the expressive power and analytical capabilities of graph databases, enabling seamless fusion of unstructured and structured data in ways previously unattainable. Through extensive experiments, we demonstrate TigerVector's hybrid search capability, scalability, and superior performance compared to other graph databases (including Neo4j and Amazon Neptune) and a highly optimized specialized vector database (Milvus). TigerVector was integrated into TigerGraph v4.2, the latest release of TigerGraph, in December 2024.
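The composition of a vector search result with a graph query block, as described above, can be illustrated with a minimal sketch. This is not GSQL and not TigerVector's implementation: the brute-force scan stands in for the MPP vector index, and every name here (`vector_top_k`, `expand_one_hop`, the toy `embeddings` and `edges` data) is a hypothetical chosen for illustration only.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def vector_top_k(embeddings, query, k):
    """Return the k vertex ids whose embeddings are closest to the query.

    Brute-force scan; an assumption standing in for a real vector index.
    """
    return heapq.nsmallest(
        k, embeddings, key=lambda v: cosine_distance(embeddings[v], query)
    )


def expand_one_hop(edges, seeds):
    """Graph step: collect all out-neighbours of the seed vertices."""
    neighbours = set()
    for vertex in seeds:
        neighbours.update(edges.get(vertex, ()))
    return neighbours


# Hypothetical toy data: document vertices carry an embedding attribute,
# and each document has an edge to its author.
embeddings = {
    "doc1": [1.0, 0.0],
    "doc2": [0.9, 0.1],
    "doc3": [0.0, 1.0],
}
edges = {
    "doc1": ["alice"],
    "doc2": ["bob"],
    "doc3": ["carol"],
}

# Vector search result feeds the graph query block.
seeds = vector_top_k(embeddings, [1.0, 0.0], k=2)
authors = expand_one_hop(edges, seeds)
```

The point of the sketch is only the data flow: the top-k vertex set produced by similarity search becomes the starting vertex set of a graph traversal, which is the kind of composition the abstract attributes to the extended query language.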
Related Scientific Articles
Michael J. Carey, Glenn Galvizo
13 May 2024
The increasing prevalence of large graph data has produced a variety of research and applications tailored toward graph data management. Users aiming to perform graph analytics will typically start by importing existing data into a separate graph-purposed storage engine. The cost of maintaining a separate system (e.g., the data copy, the associated queries, etc.) just for graph analytics may be prohibitive for users with Big Data. In this paper, we introduce Graphix and show how it enables property graph views of existing document data in AsterixDB, a Big Data management system boasting a partitioned-parallel query execution engine. We explain a) the graph view user model of Graphix, b) $\text{gSQL}^{++}$, a novel query language extension for synergistic document-based navigational pattern matching, and c) how edge hops are evaluated in a parallel fashion. We then compare queries authored in $\text{gSQL}^{++}$ against versions in other leading query languages. Finally, we evaluate our approach against a leading native graph database, Neo4j, and show that Graphix is appropriate for operational and analytical workloads, especially at scale.
Haoyu Li, Yisen Hong, Rui Qiu + 2 others
1 April 2023
Graphs are good at presenting relational and structural information, making them powerful for representing various kinds of data. For the efficient storage and processing of graph-like data, graph databases have been rapidly developed and extensively studied. However, graph databases mostly use adjacency lists as their basic data structure (e.g., Neo4j), which could result in poor edge query performance due to the skewed degree distribution of graphs. We design the Wind-Bell Index to address this problem. The Wind-Bell Index is a memory-efficient index data structure that can be attached to existing graph databases to speed up edge queries. We have fully implemented our data structure in Neo4j, the most popular graph database today, and conducted theoretical and experimental analysis to evaluate its performance. Theoretical results prove the high query efficiency of our algorithm, and experimental results show that the average edge query speed is increased by hundreds of times compared with the original query interface of Neo4j. We believe that the excellent performance and scalability of the Wind-Bell Index make it suitable for application in a variety of graph databases.
Jingren Zhou, Wenyuan Yu, Longbin Lai + 4 others
28 March 2025
This technical report extends the SIGMOD 2025 paper "A Modular Graph-Native Query Optimization Framework" by providing a comprehensive exposition of GOpt's advanced technical mechanisms, implementation strategies, and extended evaluations. While the original paper introduced GOpt's unified intermediate representation (GIR) and demonstrated its performance benefits, this report delves into the framework's implementation depth: (1) the full specification of GOpt's optimization rules; (2) a systematic treatment of semantic variations (e.g., homomorphism vs. edge-distinct matching) across query languages and their implications for optimization; (3) the design of GOpt's Physical integration interface, enabling seamless integration with transactional (Neo4j) and distributed (GraphScope) backends via engine-specific operator customization; and (4) a detailed analysis of plan transformations for LDBC benchmark queries.
Songlin Wu, Zhangyang Peng, Weizhi Xu + 7 others
4 January 2024
High-dimensional vector similarity search (HVSS) is gaining prominence as a powerful tool for various data science and AI applications. As vector data scales up, in-memory indexes pose a significant challenge due to the substantial increase in main memory requirements. A potential solution involves leveraging disk-based implementation, which stores and searches vector data on high-performance devices like NVMe SSDs. However, implementing HVSS for data segments proves to be intricate in vector databases where a single machine comprises multiple segments for system scalability. In this context, each segment operates with limited memory and disk space, necessitating a delicate balance between accuracy, efficiency, and space cost. Existing disk-based methods fall short as they do not holistically address all these requirements simultaneously. In this paper, we present Starling, an I/O-efficient disk-resident graph index framework that optimizes data layout and search strategy within the segment. It has two primary components: (1) a data layout incorporating an in-memory navigation graph and a reordered disk-based graph with enhanced locality, reducing the search path length and minimizing disk bandwidth wastage; and (2) a block search strategy designed to minimize costly disk I/O operations during vector query execution. Through extensive experiments, we validate the effectiveness, efficiency, and scalability of Starling. On a data segment with 2GB memory and 10GB disk capacity, Starling can accommodate up to 33 million vectors in 128 dimensions, offering HVSS with over 0.9 average precision and top-10 recall rate, and latency under 1 millisecond. The results showcase Starling's superior performance, exhibiting 43.9x higher throughput with 98% lower query latency compared to state-of-the-art methods while maintaining the same level of accuracy.
Ioannis Ballas, Vassilios Tsakanikas, Evaggelos Pefanis + 1 other
20 November 2020
The Big Data paradigm has placed pressure on well-established relational databases during the last decade. Both academia and industry have proposed several alternative database schemes in order to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. In this study, Neo4j, one of the leading graph databases, is compared with Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. The results reveal that Neo4j is limited by the physical RAM of the processing environment. Yet, until this limit is reached, the processing engine of Neo4j outperforms the Apache Spark engine.