Spatiotemporal Data Mining: A Survey

Zhe Jiang Arun Sharma S. Shekhar

Abstrak

Spatiotemporal data mining aims to discover interesting, useful but non-trivial patterns in big spatial and spatiotemporal data. They are used in various application domains such as public safety, ecology, epidemiology, earth science, etc. This problem is challenging because of the high societal cost of spurious patterns and exorbitant computational cost. Recent surveys of spatiotemporal data mining need update due to rapid growth. In addition, they did not adequately survey parallel techniques for spatiotemporal data mining. This paper provides a more up-to-date survey of spatiotemporal data mining methods. Furthermore, it has a detailed survey of parallel formulations of spatiotemporal data mining.

Artikel Ilmiah Terkait

Spatiotemporal data mining: a survey on challenges and open problems

Flora D. Salim Amr Mohamed A. Erradi + 3 lainnya

31 Maret 2021

Spatiotemporal data mining (STDM) discovers useful patterns from the dynamic interplay between space and time. Several available surveys capture STDM advances and report a wealth of important progress in this field. However, STDM challenges and problems are not thoroughly discussed and presented in articles of their own. We attempt to fill this gap by providing a comprehensive literature survey on state-of-the-art advances in STDM. We describe the challenging issues and their causes and open gaps of multiple STDM directions and aspects. Specifically, we investigate the challenging issues in regards to spatiotemporal relationships, interdisciplinarity, discretisation, and data characteristics. Moreover, we discuss the limitations in the literature and open research problems related to spatiotemporal data representations, modelling and visualisation, and comprehensiveness of approaches. We explain issues related to STDM tasks of classification, clustering, hotspot detection, association and pattern mining, outlier detection, visualisation, visual analytics, and computer vision tasks. We also highlight STDM issues related to multiple applications including crime and public safety, traffic and transportation, earth and environment monitoring, epidemiology, social media, and Internet of Things.

Distributed Mining of Spatial High Utility Itemsets in Very Large Spatiotemporal Databases using Spark In-Memory Computing Architecture

Yutaka Watanobe I. Paik Minh-Son Dao + 5 lainnya

10 Desember 2020

Finding Spatial High Utility Itemsets (SHUIs) in a spatiotemporal database is a challenging problem of great importance in many real-world applications. Most previous works focused on the sequential discovery of SHUIs in a database running on a single machine. Consequently, these works are not suitable for big data (or cloud-based) applications as they suffer from the scalability and fault tolerant problems. This paper proposes several novel pruning techniques to reduce the search space and present a more flexible distributed algorithm to find all desired itemsets from the database using Spark in-memory computing architecture. Our algorithm inherits several advantages of Spark, including low communication cost, fault tolerance, and high scalability. Experimental results demonstrate that the proposed algorithm has good scalability and performance on very large databases. Finally, we present a real-world navigation application in which SHUIs generated from the traffic congestion data have been employed to recommend alternative routes to the users.

Spatial Data Mining Methods Databases and Statistics Point of Views

Suman Rajest S Mr Bhopendra Singh Mr Regin R Mr

28 Februari 2021

This article reviews the approaches used in data mining to perform a geographical study of regional datasets coupled with Geographic Information Systems (GIS). Firstly, we can look at the functions of data mining used by such data and then illustrate their precision compared to their classic data use. We will further explain the research conducted in this sector and point out that two separate methods exist: one is focused on space database learning while the other is based on space statistics. Finally, we will address the key distinctions between these two methods and their similar features.

Transforming Multidimensional Time Series into Interpretable Event Sequences for Advanced Data Mining

Xu Yan Jianjun Wei Yaoting Jiang + 2 lainnya

22 September 2024

This paper introduces a novel spatiotemporal feature representation model designed to address the limitations of traditional methods in multidimensional time series (MTS) analysis. The proposed approach converts MTS into one-dimensional sequences of spatially evolving events, preserving the complex coupling relationships between dimensions. By employing a variable-length tuple mining method, key spatiotemporal features are extracted, enhancing the interpretability and accuracy of time series analysis. Unlike conventional models, this unsupervised method does not rely on large training datasets, making it adaptable across different domains. Experimental results from motion sequence classification validate the model's superior performance in capturing intricate patterns within the data. The proposed framework has significant potential for applications across various fields, including backend services for monitoring and optimizing IT infrastructure, medical diagnosis through continuous patient monitoring and health trend analysis, and internet businesses for tracking user behavior and forecasting sales. This work offers a new theoretical foundation and technical support for advancing time series data mining and its practical applications in human behavior recognition and other domains.

A Distributed Method for Fast Mining Frequent Patterns From Big Data

Pengchi Huang K. W. Lin Young-Lin Chen + 3 lainnya

2021

In recent years, knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. For numerous real-life applications, frequent pattern mining and association rule mining have been extensively studied. In traditional mining algorithms, data are centralized and memory-resident. As a result of the large amount of data, bandwidth limitation, and energy limitations when applying these methods to distributed databases, especially in this era of big data, the performance is not effective enough. Hence, data mining on distributed environments has emerged as an important research area. To improve the performance, we propose a set of algorithms based on FP growth that discover FPs that are capable of providing fast and scalable service in distributed computing environments and a brief data structure to store items and counts to minimize the data for transmission on the network. To ensure completeness and execution capability, DistEclat and BigFIM were considered for the experiment comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and good capabilities under various experiment conditions. The proposed method on average required only 33% of the execution time and 45% of the transmission cost of DistEclat. Compared to BigFIM, The proposed method on average required 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.