Graph data warehousing
Abstract
Over the last decade, we have witnessed the emergence of networks in a wide spectrum of application domains, ranging from social and information networks to biological and transportation networks. Graphs provide a solid theoretical foundation for modeling complex networks and for revealing valuable insights from both the network structure and the data embedded within its entities. As business and social environments grow increasingly complex and interconnected, graphs have become a widespread abstraction at the core of the information infrastructure supporting those environments. Modern information systems consist of a large number of sophisticated, interacting business entities that naturally form graphs. In particular, integrating graphs into data warehouse systems has received considerable interest from both academia and industry. Indeed, the data warehouse is the enterprise's central information repository and is critical for proper decision support and future planning. Graph warehousing is emerging as the field that extends current information systems with graph management and analytics capabilities. Many approaches have been proposed to address the graph data warehousing challenge. These efforts laid the foundation for multidimensional modeling and analysis of graphs. However, most of the proposed approaches tackle the graph warehousing problem only partially: they are restricted to simple abstractions such as homogeneous graphs, or they ignore important topics such as multidimensional integrity constraints and dimension hierarchies. In this dissertation, we conduct a systematic study of graph data warehousing and address the key challenges of database and multidimensional modeling of graphs. We first propose GRAD, a new graph database model specifically tuned for warehousing and OLAP analytics. GRAD aims to provide analysts with a set of simple, well-defined, and adaptable conceptual components to support rich semantics and perform complex analysis on graphs.
Then, we define the multidimensional concepts for heterogeneous attributed graphs and highlight the new types of measures that can be derived. We project this multidimensional model onto property graphs and explore how to extract the candidate multidimensional concepts and build graph cubes. Next, we extend the multidimensional model by integrating GRAD and show how GRAD-based graph modeling facilitates multidimensional modeling, supports dimension hierarchies, and enables new types of OLAP cubes on graphs. Afterwards, we present TopoGraph, a graph data warehousing framework that extends current graph warehousing models with new types of cubes and with queries that combine graph-oriented and OLAP querying. TopoGraph goes beyond traditional OLAP cubes, which process value-based grouping of tables, by also considering the topological properties of the graph elements. It also goes beyond current graph warehousing models by proposing new types of graph cubes. These cubes embed a rich repertoire of measures that can be represented as numerical values, as entire graphs, or as a combination of both. Finally, we propose an architecture for the graph data warehouse and describe its main building blocks and the remaining gaps. The various components of the graph warehousing framework can be leveraged as a foundation for designing and building industry-grade graph data warehouses. We believe that the research in this thesis brings us a step closer to a better understanding of graph warehousing. Yet the models and framework we propose are only the tip of the iceberg. The marriage of graph and warehousing technologies will bring many exciting research opportunities, which we briefly discuss at the end of the thesis.
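To make the notion of a graph cube from the abstract above more concrete, the following is a minimal sketch of a single roll-up over a toy attributed graph. It is not the GRAD model or the TopoGraph framework, whose definitions are not given here; the data, the function name `graph_cuboid`, and the choice of average degree as a topological measure are all assumptions made for illustration. The sketch groups nodes by one attribute (the dimension) and returns both numeric measures and an aggregate graph of edge counts between groups.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: names and attributes are invented for illustration.
nodes = {
    "ann":  {"city": "Paris",  "role": "author"},
    "bob":  {"city": "Paris",  "role": "reviewer"},
    "cara": {"city": "Berlin", "role": "author"},
    "dan":  {"city": "Berlin", "role": "author"},
}
edges = [("ann", "bob"), ("ann", "cara"), ("cara", "dan"), ("bob", "dan")]


def graph_cuboid(nodes, edges, dimension):
    """Roll an undirected graph up along one node attribute.

    Returns (node_counts, edge_counts, avg_degree):
      node_counts[g]      -> number of nodes whose attribute value is g
      edge_counts[(g, h)] -> number of edges between groups g and h
      avg_degree[g]       -> mean degree of the nodes in group g
    """
    node_counts = Counter(attrs[dimension] for attrs in nodes.values())

    edge_counts = Counter()
    degree = Counter()
    for u, v in edges:
        # Sort the pair so that (g, h) and (h, g) fall into the same cell.
        key = tuple(sorted((nodes[u][dimension], nodes[v][dimension])))
        edge_counts[key] += 1
        degree[u] += 1
        degree[v] += 1

    # Topological measure: average degree of the members of each group.
    degree_sum = defaultdict(int)
    for name, attrs in nodes.items():
        degree_sum[attrs[dimension]] += degree[name]
    avg_degree = {g: degree_sum[g] / node_counts[g] for g in node_counts}
    return node_counts, edge_counts, avg_degree


by_city = graph_cuboid(nodes, edges, "city")
by_role = graph_cuboid(nodes, edges, "role")
```

Calling `graph_cuboid` once per attribute, as in the last two lines, gives one cuboid per dimension. The `edge_counts` result is itself a small weighted graph over the groups, which is one simple way a measure can be "an entire graph" rather than a single number.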
Related Scientific Articles
Yuanyuan Tian
23 November 2022
Rapidly growing social networks and other graph data have created a high demand for graph technologies in the market. As a result, a plethora of graph databases, systems, and solutions have emerged. At the same time, graphs have long been a well-studied area in the database research community. Despite the numerous surveys on various graph research topics, there is no survey of graph technologies from an industry perspective. The purpose of this paper is to provide the research community with an industrial perspective on the graph database landscape, so that graph researchers can better understand the industry trends and the challenges that the industry is facing, and work on solutions that help address these problems.
Robert Pavliš
20 May 2024
As the global volume of data continues to rise at an unprecedented rate, the challenges of storing and analyzing data are becoming increasingly prominent. This is especially apparent when the data are heavily interconnected. Traditional methods of storing and analyzing data, such as relational databases, often encounter difficulties when dealing with large amounts of data, and these difficulties are even more pronounced when the data exhibit intricate interconnections. This paper examines graph databases as an alternative to relational databases in an interconnected Big Data environment. It also presents the theoretical basis behind graph databases and explains how they outperform relational databases in such an environment, as well as why they are better suited to it than other NoSQL alternatives. The state of the art in graph databases, and how they compare to relational databases in various scenarios, is also presented.
Oluwatosin Agbaakin, Sydney Anuyah, Victor Bolade
15 November 2024
This tutorial serves as a comprehensive guide for understanding graph databases, focusing on the fundamentals of graph theory while showcasing practical applications across various fields. It starts by introducing foundational concepts and delves into the structure of graphs through nodes and edges, covering different types such as undirected, directed, weighted, and unweighted graphs. Key graph properties, terminologies, and essential algorithms for network analysis are outlined, including Dijkstra's shortest path algorithm and methods for calculating node centrality and graph connectivity. The tutorial highlights the advantages of graph databases over traditional relational databases, particularly in efficiently managing complex, interconnected data. It examines leading graph database systems such as Neo4j, Amazon Neptune, and ArangoDB, emphasizing their unique features for handling large datasets. Practical instructions on graph operations using NetworkX and Neo4j are provided, covering node and edge creation, attribute assignment, and advanced queries with Cypher. Additionally, the tutorial explores common graph visualization techniques using tools like Plotly and Neo4j Bloom, which enhance the interpretation and usability of graph data. It also delves into community detection algorithms, including the Louvain method, which facilitates clustering in large networks. Finally, the paper concludes with recommendations for researchers interested in exploring the vast potential of graph technologies.
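Two of the algorithms named in this abstract, Dijkstra's shortest path algorithm and degree centrality, can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not code from the tutorial (which uses NetworkX and Neo4j); the toy graph, the adjacency-dictionary representation, and the function names are assumptions. Edge weights are assumed to be non-negative, as Dijkstra's algorithm requires.

```python
import heapq

# Hypothetical undirected weighted graph as an adjacency dictionary:
# graph[u][v] is the weight of the edge between u and v.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {u: 0.0 for u in graph}
    return {u: len(neighbors) / (n - 1) for u, neighbors in graph.items()}


distances = dijkstra(graph, "A")
centrality = degree_centrality(graph)
```

In this toy graph the direct edge from A to B has weight 4, but the path through C costs 1 + 2 = 3, so `distances["B"]` is 3. B and C are each connected to all three other nodes, so their degree centrality is 1.0.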
O. Lassila, Carlos Rojas, Pedro A. Szekely + 4 others
12 June 2022
In this short position paper, we argue that there is a need for a unifying data model that can support popular graph formats such as RDF, RDF* and property graphs, while at the same time being powerful enough to naturally store information from complex knowledge graphs, such as Wikidata, without the need for a complex reification scheme. Our proposal, called the multilayer graph model, presents a simple and flexible data model for graphs that can naturally support all of the above, and more. We also observe that the idea of multilayer graphs has appeared in existing graph systems from different vendors and research groups, illustrating its versatility.
Veronica Santos, Bruno Cuconato
24 December 2024
Graphs are the most suitable structures for modeling objects and interactions in applications where component interconnectivity is a key feature. There has been increased interest in graphs to represent domains such as social networks, website link structures, and biology. Graph stores recently rose to prominence alongside the NoSQL movement. In this work we focus on NoSQL graph databases, describing the peculiarities that set them apart from other data storage and management solutions, and how they differ among themselves. We also analyze in depth two graph database management systems, AllegroGraph and Neo4j, which use the graph models most popular among NoSQL stores in practice: the Resource Description Framework (RDF) and the labeled property graph (LPG), respectively.
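The difference between the two graph models named in this abstract can be illustrated with plain Python data structures. This is a hedged sketch, not the storage format of AllegroGraph or Neo4j: the `ex:` identifiers, the sample facts, and the helper functions are invented for illustration. In the RDF style every fact is a subject-predicate-object triple, so attaching a property to a relationship needs an extra statement node (standard RDF reification); in the LPG style nodes and edges carry their own key-value properties.

```python
# RDF style: every fact is a (subject, predicate, object) triple.
triples = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:knows", "ex:bob"),
    # A property of the relationship itself needs a statement node.
    ("ex:stmt1", "rdf:subject", "ex:alice"),
    ("ex:stmt1", "rdf:predicate", "ex:knows"),
    ("ex:stmt1", "rdf:object", "ex:bob"),
    ("ex:stmt1", "ex:since", "2019"),
}

# LPG style: nodes and edges both hold properties directly.
lpg_nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Alice"}},
    2: {"labels": {"Person"}, "props": {"name": "Bob"}},
}
lpg_edges = [
    {"src": 1, "dst": 2, "type": "KNOWS", "props": {"since": 2019}},
]


def rdf_objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple set."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def lpg_neighbor_names(nodes, edges, src_name, edge_type):
    """Names of nodes reached from the node named src_name via edge_type."""
    return {
        nodes[e["dst"]]["props"]["name"]
        for e in edges
        if e["type"] == edge_type
        and nodes[e["src"]]["props"]["name"] == src_name
    }


rdf_result = rdf_objects(triples, "ex:alice", "ex:knows")
lpg_result = lpg_neighbor_names(lpg_nodes, lpg_edges, "Alice", "KNOWS")
```

Both queries answer "whom does Alice know?". The "since" property takes four extra triples in the RDF representation but only one dictionary entry on the edge in the LPG representation.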