Toward Implementing an Agent-based Distributed Graph Database System
Abstract
Graph database (DB) systems are growing in popularity for big-data analysis and machine learning, particularly in e-commerce recommendation, fraud detection, and social-media analytics. Speed-up and spatial scalability of their DB transactions are pursued with various techniques, such as index-free access to graph components in Neo4j, graph sharding over a cluster system in ArangoDB, and graph DB construction over distributed memory in AnzoGraph. However, these techniques have their respective challenges: difficulty in expanding an index-free graph over distributed memory, slow-down in accessing distributed disks, and a bottleneck incurred by repetitive master-to-worker distributions of query pipelines. As a solution to these problems, we apply multi-agent technologies to distributed graph DB construction: multiple user processes over a cluster system maintain portions of the distributed graph in their cache space; their cache contents are synchronized through a software-snooped write-back and write-update protocol; and a DB user at any cluster node dispatches an agent that handles an independent graph query by navigating over the distributed graph. To follow current trends in graph DB standardization, we adopt the Cypher language, whose queries are translated into agent code. This paper presents a new distributed hash-map implementation and its application to our graph DB system; differentiates it from Hazelcast from the viewpoints of memory coherency and access speed; describes our translator that generates agent code from Cypher queries; and examines the graph DB creation and traversal performance of agents in comparison with Neo4j and ArangoDB.
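The architecture above can be illustrated with a minimal sketch, not the authors' implementation: a hash-partitioned map simulates the distributed graph (each partition standing in for one cluster node's cache), and an "agent" function plays the role of code generated from a Cypher MATCH query, hopping from vertex to vertex across partitions. All class and function names here (DistributedHashMap, agent_match_two_hops) are hypothetical illustrations.

```python
NUM_PARTITIONS = 4

def partition_of(key):
    # Hash-based placement of a vertex onto a partition; in the real
    # system each partition would live on a different cluster node.
    return hash(key) % NUM_PARTITIONS

class DistributedHashMap:
    """Simulated distributed hash map: one dict per cluster node.
    Cache coherence (write-back/write-update) is out of scope here."""
    def __init__(self):
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def put(self, key, value):
        self.partitions[partition_of(key)][key] = value

    def get(self, key):
        return self.partitions[partition_of(key)].get(key)

# Each vertex maps an edge label to its outgoing adjacency list.
graph = DistributedHashMap()
graph.put("alice", {"KNOWS": ["bob", "carol"]})
graph.put("bob",   {"KNOWS": ["dave"]})
graph.put("carol", {"KNOWS": []})
graph.put("dave",  {"KNOWS": []})

def agent_match_two_hops(start, edge_label):
    """Agent-style traversal for a Cypher query shaped like:
       MATCH (a {name: $start})-[:KNOWS]->()-[:KNOWS]->(c) RETURN c
    The agent follows edges, touching whichever partition holds
    each vertex, rather than shipping the data to a master node."""
    results = []
    for mid in graph.get(start).get(edge_label, []):
        for end in graph.get(mid).get(edge_label, []):
            results.append(end)
    return results

print(agent_match_two_hops("alice", "KNOWS"))  # ['dave']
```

The point of the sketch is the locality of control: the query logic travels with the agent across partitions, instead of a master repeatedly distributing query pipelines to workers.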
Related Papers
Robert Pavliš
20 May 2024
As the global volume of data continues to rise at an unprecedented rate, the challenges of storing and analyzing data become increasingly apparent, especially when the data are heavily interconnected. Traditional methods of storing and analyzing data, such as relational databases, often encounter difficulties when dealing with large amounts of data, and this is even more pronounced when the data exhibit intricate interconnections. This paper examines graph databases as an alternative to relational databases in an interconnected Big Data environment. It presents the theoretical basis behind graph databases, shows how they outperform relational databases in such an environment, and explains why they are better suited to this kind of environment than other NoSQL alternatives. The paper also presents the state of the art in graph databases and how they compare to relational databases in various scenarios.
Michael J. Carey Glenn Galvizo
13 May 2024
The increasing prevalence of large graph data has produced a variety of research and applications tailored toward graph data management. Users aiming to perform graph analytics will typically start by importing existing data into a separate graph-purposed storage engine. The cost of maintaining a separate system (e.g., the data copy, the associated queries, etc.) just for graph analytics may be prohibitive for users with Big Data. In this paper, we introduce Graphix and show how it enables property graph views of existing document data in AsterixDB, a Big Data management system boasting a partitioned-parallel query execution engine. We explain a) the graph view user model of Graphix, b) $\text{gSQL}^{++}$, a novel query language extension for synergistic document-based navigational pattern matching, and c) how edge hops are evaluated in a parallel fashion. We then compare queries authored in $\text{gSQL}^{++}$ against versions in other leading query languages. Finally, we evaluate our approach against a leading native graph database, Neo4j, and show that Graphix is appropriate for operational and analytical workloads, especially at scale.
Ioannis Ballas Vassilios Tsakanikas Evaggelos Pefanis + 1 more
20 November 2020
The Big Data paradigm has placed pressure on well-established relational databases during the last decade. Both academia and industry have proposed several alternative database schemes to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. Within this study, a comparison is performed between Neo4j, one of the leading graph databases, and Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. The results reveal that Neo4j is limited by the physical RAM of the processing environment. Yet, until this limit is reached, the processing engine of Neo4j outperforms the Apache Spark engine.
Matus Stovcik Barbora Buhnova M. Mačák
2020
Digitalization of our society brings various new digital ecosystems (e.g., Smart Cities, Smart Buildings, Smart Mobility), which rely on the collection, storage, and processing of Big Data. One of the recently popular advancements in Big Data storage and processing is the graph database. A graph database is specialized to handle highly connected data, which can be found, for instance, in cross-domain settings where various levels of data interconnection take place. Existing works suggest that for data with many relationships, graph databases perform better than non-graph databases. However, it is not clear where the borders lie for the specific query types for which it is still efficient to use a graph database. In this paper, we design and perform tests that examine these borders. We perform the tests in a cluster of three machines so that we explore the database behavior in Big Data scenarios with respect to the query. We specifically work with Neo4j as a representative of graph databases and PostgreSQL as a representative of non-graph databases.
Han Yang Chao Chen Xin Li + 23 more
1 August 2022
Most products at ByteDance, e.g., TikTok, Douyin, and Toutiao, naturally generate massive amounts of graph data. Efficiently storing, querying, and updating massive graph data is challenging for the broad range of products at ByteDance with various performance requirements. We categorize graph workloads at ByteDance into three types: online analytical, transaction, and serving processing, where each workload has its own characteristics. Existing graph databases have different performance bottlenecks in handling these workloads, and none can efficiently handle the scale of graphs at ByteDance. We developed ByteGraph to process these graph workloads with high throughput, low latency, and high scalability. There are several key designs in ByteGraph that make it efficient for processing our workloads, including edge-trees to store adjacency lists for high parallelism and low memory usage, adaptive optimizations on thread pools and indexes, and geographic replication to achieve fault tolerance and availability. ByteGraph has been in production use for several years, and its performance has shown to be robust for processing a wide range of graph workloads at ByteDance.