Performance Comparison Analysis of ArangoDB, MySQL, and Neo4j: An Experimental Study of Querying Connected Data
Abstract
Choosing and developing performant database solutions helps organizations optimize their operational practices and decision-making. As graph data becomes more common, it is crucial that databases handling big data with complex relationships deliver high and consistent performance. However, legacy database technologies such as MySQL are tailored to relational data and need more complex queries to retrieve graph data. Previous research has dealt with performance aspects such as CPU and memory usage; in contrast, measurements of server energy usage and temperature are lacking. Thus, this paper evaluates and compares state-of-the-art graph and relational databases on these performance aspects to allow a more informed selection of technologies. Graph-based big data applications benefit from an informed selection of database technologies for data retrieval and analytics problems. The results show that Neo4j queries connected data faster than MySQL and ArangoDB; energy, CPU, and memory usage are also reported in this paper.
Related Scientific Articles
Matus Stovcik, Barbora Buhnova, M. Mačák
2020
Digitalization of our society brings various new digital ecosystems (e.g., Smart Cities, Smart Buildings, Smart Mobility), which rely on the collection, storage, and processing of Big Data. One of the recently popular advancements in Big Data storage and processing is the graph database. A graph database is specialized to handle highly connected data, which can be found, for instance, in cross-domain settings where various levels of data interconnection take place. Existing works suggest that for data with many relationships, graph databases perform better than non-graph databases. However, it is not clear where the borders lie for specific query types, that is, up to what point it is still efficient to use a graph database. In this paper, we design and perform tests that examine these borders. We perform the tests in a cluster of three machines so that we explore the database behavior in Big Data scenarios concerning the query. We specifically work with Neo4j as a representative of graph databases and PostgreSQL as a representative of non-graph databases.
Rahmatian Jayanty Sholichah, A. Alamsyah, Mahmud Imrona
24 September 2020
Data volumes are currently growing rapidly, and solutions are needed to manage data efficiently; one option is to use a database. The biggest decision in selecting a database is choosing between SQL and NoSQL. MySQL is a database that uses SQL as its query language and consists of tables that store data in columns and rows. NoSQL, a newer database format, later appeared and is suitable for handling large amounts of data in a variety of formats. Neo4j is a widely used NoSQL database; it is a graph database that provides an easy way to visualize data by storing it as nodes connected by edges. In this paper, we compared the performance of MySQL and Neo4j in terms of memory usage and execution time, and we also presented the flexibility of the databases using P. The results show that MySQL has a faster execution time than Neo4j, although both databases have the same time complexity. Neo4j also has higher memory usage than MySQL, but it offers better flexibility than MySQL.
Ioannis Ballas, Vassilios Tsakanikas, Evaggelos Pefanis + 1 other
20 November 2020
The Big Data paradigm has placed pressure on well-established relational databases during the last decade. Both academia and industry have proposed several alternative database schemes in order to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. Within this study, a comparison is performed between Neo4j, one of the leading graph databases, and Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. The results reveal that Neo4j is limited by the physical RAM of the processing environment. Yet, until this limit is reached, the processing engine of Neo4j outperforms the Apache Spark engine.
Robert Pavliš
20 May 2024
As the global volume of data continues to rise at an unprecedented rate, the challenges of storing and analyzing data are becoming increasingly prominent. This is especially apparent when the data are heavily interconnected. Traditional methods of storing and analyzing data, such as relational databases, often encounter difficulties when dealing with large amounts of data, and this is even more pronounced when the data exhibit intricate interconnections. This paper examines graph databases as an alternative to relational databases in an interconnected Big Data environment. It also shows the theoretical basis behind graph databases, how they outperform relational databases in such an environment, and why they are better suited to it than other NoSQL alternatives. The state of the art in graph databases and how they compare to relational databases in various scenarios is also presented in this paper.
Marin Fotache, Nicoleta Teacă, Ciprian Pinzaru + 3 others
19 September 2024
Among the NoSQL technologies, Neo4j is one of the most popular solutions for managing graph databases and an early adopter of transactions (contrary to other NoSQL systems). Neo4j also provides a powerful high-level data processing language, Cypher. Despite its popularity, there are few comprehensive studies benchmarking the query performance of Neo4j relative to SQL or other NoSQL counterparts. In this paper, we present a module for converting the TPC-H benchmark database from PostgreSQL to Neo4j, and we built a set of 110 SQL queries that were translated into Cypher. For both database servers, the queries were executed with a 10-minute timeout on OpenStack setups following nine scenarios that combine three database scale factors (1 GB, 5 GB, and 10 GB) with three data distribution variants (1, 5, and 10 nodes). Results provide support for query performance assessment of these two big data products.