Community Detection at scale: A comparison study among Apache Spark and Neo4j
Abstract
The proliferation of data-generating devices, including IoT and edge computing devices, has led to the big data paradigm, which has placed considerable pressure on well-established relational databases during the last decade. Researchers have proposed several alternative database models to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. This study compares Neo4j, one of the leading graph databases, with Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. More specifically, the two frameworks are compared on their capacity to execute community detection algorithms.
Related Scientific Articles
Ioannis Ballas Vassilios Tsakanikas Evaggelos Pefanis + 1 other
20 November 2020
The Big Data paradigm has placed pressure on well-established relational databases during the last decade. Both academia and industry have proposed several alternative database schemes to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. This study compares Neo4j, one of the leading graph databases, with Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. The results reveal that Neo4j is limited by the physical RAM of the processing environment. Yet, until this limit is reached, the processing engine of Neo4j outperforms the Apache Spark engine.
Matus Stovcik Barbora Buhnova M. Mačák
2020
Digitalization of our society brings various new digital ecosystems (e.g., Smart Cities, Smart Buildings, Smart Mobility), which rely on the collection, storage, and processing of Big Data. One of the recently popular advancements in Big Data storage and processing is the graph database. A graph database is specialized to handle highly connected data, which can be found, for instance, in cross-domain settings where various levels of data interconnection take place. Existing works suggest that for data with many relationships, graph databases perform better than non-graph databases. However, it is not clear where the borders lie for specific query types, up to which it is still efficient to use a graph database. In this paper, we design and perform tests that examine these borders. We perform the tests on a cluster of three machines so that we can explore database behavior in Big Data scenarios with respect to the query type. We specifically work with Neo4j as a representative of graph databases and PostgreSQL as a representative of non-graph databases.
Oluwatosin Agbaakin Sydney Anuyah Victor Bolade
15 November 2024
This tutorial serves as a comprehensive guide for understanding graph databases, focusing on the fundamentals of graph theory while showcasing practical applications across various fields. It starts by introducing foundational concepts and delves into the structure of graphs through nodes and edges, covering different types such as undirected, directed, weighted, and unweighted graphs. Key graph properties, terminologies, and essential algorithms for network analysis are outlined, including Dijkstra's shortest path algorithm and methods for calculating node centrality and graph connectivity. The tutorial highlights the advantages of graph databases over traditional relational databases, particularly in efficiently managing complex, interconnected data. It examines leading graph database systems such as Neo4j, Amazon Neptune, and ArangoDB, emphasizing their unique features for handling large datasets. Practical instructions on graph operations using NetworkX and Neo4j are provided, covering node and edge creation, attribute assignment, and advanced queries with Cypher. Additionally, the tutorial explores common graph visualization techniques using tools like Plotly and Neo4j Bloom, which enhance the interpretation and usability of graph data. It also delves into community detection algorithms, including the Louvain method, which facilitates clustering in large networks. Finally, the paper concludes with recommendations for researchers interested in exploring the vast potential of graph technologies.
Robert Pavliš
20 May 2024
As the global volume of data continues to rise at an unprecedented rate, the challenges of storing and analyzing data are becoming more and more highlighted. This is especially apparent when the data are heavily interconnected. The traditional methods of storing and analyzing data such as relational databases often encounter difficulties when dealing with large amounts of data and this is even more pronounced when the data exhibits intricate interconnections. This paper examines graph databases as an alternative to relational databases in an interconnected Big Data environment. It will also show the theoretical basis behind graph databases and how they outperform relational databases in such an environment, but also why they are better suited for this kind of environment than other NoSQL alternatives. A state of the art in graph databases and how they compare to relational databases in various scenarios will also be presented in this paper.
Michelle Dea Munehiro Fukuda Lilian Cao + 1 other
15 December 2024
Graph database (DB) systems are gaining popularity in big-data analysis and machine learning, particularly in the areas of e-commerce recommendation, fraud detection, and social-media analytics. Speed-up and spatial scalability of their DB transactions are pursued with various techniques such as index-free access to graph components in Neo4j, graph sharding over a cluster system in ArangoDB, and graph DB construction over distributed memory in AnzoGraph. However, these techniques have their respective challenges: difficulty in expanding an index-free graph over distributed memory, slow-down in accessing distributed disks, and a bottleneck incurred by repetitive master-to-worker distributions of query pipelines. As a solution to these problems, we are applying multi-agent technologies to distributed graph DB construction: multiple user processes over a cluster system maintain a portion of the distributed graph in their cache space; their cache contents are synchronized through a software-snooped write-back and write-update protocol; and a DB user from any cluster node dispatches an agent that handles an independent graph query by navigating over the distributed graph. To follow current trends in graph DB standardization, we adopt the Cypher language, whose queries are translated into agent code. This paper presents a new distributed hash-map implementation and its application to our graph DB system; differentiates it from Hazelcast from the viewpoints of its memory coherency and access speed; describes our translator that generates agent code from Cypher queries; and examines graph DB creation and traversal performance of agents in comparison with Neo4j and ArangoDB.