FAM: Featuring Android Malware for Deep Learning-Based Familial Analysis
Abstract
To handle relentlessly emerging Android malware, deep learning has been widely adopted in the research community. Prior work proposed deep learning-based approaches that use different features of malware and reported high accuracy in malware detection, i.e., distinguishing malware from benign applications. However, familial analysis of real-world Android malware has not yet been studied extensively. Familial analysis refers to the process of classifying a given malware sample into a family (or a set of families), which can greatly accelerate malware analysis because it reveals the sample's fine-grained behavioral characteristics. In this work, we shed light on deep learning-based familial analysis by studying different features of Android malware and how effectively they represent its (malicious) behaviors. We focus on string features of Android malware, namely the Abstract Syntax Trees (AST) of all functions extracted from each malware sample, which faithfully represent all string features of Android malware. We thoroughly study how different string features, such as how security-sensitive APIs are used in malware, affect the performance of our deep learning-based familial analysis model. A convolutional neural network was trained and tested in various configurations on a dataset of 28,179 real-world malware samples that appeared in the wild from 2018 to 2020, where each sample has one or more labels assigned based on its behaviors. Our evaluation reveals how different features contribute to the performance of familial analysis. Notably, with all features combined, we achieved an accuracy of up to 98% and a micro F1-score of 0.82, a result on par with the state of the art.
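The abstract reports both an accuracy and a micro F1-score for a task in which each sample can carry several labels. The sketch below shows one way these two metrics can be computed for multi-label predictions and why they can differ widely. It assumes accuracy is measured per label decision, which the abstract does not state; the function name `micro_scores` and the toy label matrices are illustrative only and do not come from the paper.

```python
def micro_scores(y_true, y_pred):
    """Return (per-label accuracy, micro-averaged F1) for multi-label data.

    y_true and y_pred are lists of equal-length 0/1 lists:
    one row per sample, one column per family label.
    """
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1      # label present and predicted
            elif p:
                fp += 1      # label predicted but absent
            elif t:
                fn += 1      # label present but missed
            else:
                tn += 1      # label absent and not predicted
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Micro F1 pools the counts over all labels and ignores true negatives.
    denom = 2 * tp + fp + fn
    micro_f1 = 2 * tp / denom if denom else 0.0
    return accuracy, micro_f1


# Toy data (hypothetical): 4 samples, 5 family labels.
Y_TRUE = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 1]]
Y_PRED = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0]]

accuracy, micro_f1 = micro_scores(Y_TRUE, Y_PRED)
print(f"accuracy={accuracy:.2f} micro_f1={micro_f1:.2f}")
```

Because most label decisions are true negatives when each sample belongs to only a few families, per-label accuracy tends to be much higher than micro F1, which is one plausible reading of the gap between the two reported numbers.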
Related Scientific Articles
Cong-Danh Nguyen Nghi Hoang Khoa Cam Nguyen Tan + 1 other
11 January 2023
In recent years, Android malware has grown rapidly, challenging malware analysts. In response, there has been a lot of research on detecting and classifying Android malware based on machine learning. Classifying Android malware into families is an essential goal of this research. This paper applies machine learning and deep learning methods to classify malware by family and by category on several different datasets, in order to evaluate and select suitable methods for each dataset. This work demonstrates that on the Drebin and CICMaldroid2020 datasets, classified by family and category, respectively, machine learning models trained and evaluated after feature extraction and selection achieve high accuracy and a low false positive rate. We also compare our results with several previous studies to highlight our results.
F. Martinelli F. Mercaldo Giacomo Iadarola + 3 others
18 July 2021
Cybercriminals are continually working to develop increasingly aggressive malicious code to steal sensitive and private information from mobile devices. Antimalware tools are not always able to detect all threats, especially when they have no previous knowledge of the malware signature. Moreover, malware code analysis remains a time-consuming process for security analysts. In this regard, we propose a method that detects the family a malware sample belongs to and automatically points out a subset of potentially malicious classes. This work aims (i) to save valuable time for the security analyst by decreasing the amount of code to analyse, and (ii) to improve the interpretability of image-based deep learning models for malware family detection. We represent an application as an image and classify it with a deep learning model that predicts the family it belongs to; then, exploiting activation maps, the approach points out potentially malicious classes to help security analysts recognise malicious behaviour. The proposed method obtains an overall accuracy of 0.944 on a dataset composed of 8430 real-world Android malware samples, and also shows that activation maps can provide explainability about the deep learning model's decision.
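The abstract above does not say how an application is turned into an image. A common convention in image-based malware classification is to treat each byte of the executable as one grayscale pixel and lay the bytes out in fixed-width rows. The sketch below illustrates that convention only; the function name `bytes_to_grayscale`, the default width, and the zero padding are assumptions and are not taken from the paper.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay out raw bytes as rows of grayscale pixels (values 0-255).

    The last row is padded with zeros so that every row has `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Hypothetical input: ten bytes standing in for the start of a DEX file.
image = bytes_to_grayscale(b"dex\n035\x00\x01\x02", width=4)
for row in image:
    print(row)
```

In a real pipeline the resulting 2D array would be resized to the input shape the classifier expects; the activation-map step described in the abstract is not covered by this sketch.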
Hui Zhao Jianfei Tang
25 June 2022
Much current research on malware detection focuses on proposing and improving neural network structures, but with the constant updates of Android, the proposed detection methods are more like a race against time. Through an analysis of these methods, we found that their basic processes are roughly the same and that they rely on professional reverse engineering tools for malware analysis and feature extraction. These tools generally suffer from high time and space costs, difficulty in analyzing a large number of APKs concurrently, and output that is inconvenient for feature extraction. Is it possible to build a general platform that implements the malware detection process, optimizes each step of existing malware detection methods, and efficiently extracts various features from malware datasets with a large number of APKs? To answer this question, we propose an automated platform, AmandaSystem, that tightly integrates the various processes of deep learning-based malware detection methods. At the same time, addressing the over-privilege problem caused by the openness of the Android system has always required accurately constructing the mapping between permissions and API calls, while current methods based on function call graphs suffer from inefficiency and low accuracy. To solve this problem, we propose PerApTool, an efficient and complete tool for mapping Android permissions to API calls, built on a new bottom-up static analysis method based on AmandaSystem.
Finally, we conducted tests on three publicly available malware datasets, CICMalAnal2017, CIC-AAGM2017, and CIC-InvesAndMal2019, to evaluate AmandaSystem against existing methods in terms of the time efficiency of APK parsing, space occupancy, and comprehensiveness of extracted features.
Muhammed Basheer Jasser Bayan Issa Mülhem İbrahim
2022
Smartphones are replacing computers for most internet users around the world, and Android holds the largest share of the smartphone operating system market. This rise in the usage of smartphones in general, and of the Android system in particular, creates a strong need to secure Android effectively, as malware developers are targeting it with sophisticated and obfuscated malware applications. Consequently, many studies have proposed methods to detect and classify Android malicious software (malware). Some of them were effective; some were not, with accuracy below 90%; and some are becoming outdated because they use old datasets containing applications for Android versions that are rarely used today. In this paper, a new method is proposed that uses static analysis to gather as many useful features of Android applications as possible, along with two newly proposed features, and then passes them to a functional API deep learning model we built. This method was implemented on a new, classified Android application dataset of 14079 malware and benign samples in total, with the malware samples classified into four malware classes. Two major experiments were run on this dataset: one for malware detection, with the samples categorized into two classes (malware and benign), and one for malware detection and classification, using all five classes of the dataset. As a result, our model outperforms related work when using just two classes, with an F1-score of 99.5%. High malware detection and classification performance was also obtained with the five classes, with an F1-score of 97%.
W. El-shafai Mohanned Ahmed Iman M. Almomani
5 July 2022
This paper offers a comprehensive analysis model for Android malware. The model presents the essential factors affecting the results of vision-based Android malware analysis. Current Android malware analysis solutions might consider one or some of these factors while building their malware predictive systems. However, this paper comprehensively highlights these factors and their impacts through a deep empirical study. The study comprises 22 CNN (Convolutional Neural Network) algorithms, 21 of which are well known and one of which is newly proposed. Additionally, several types of files are considered before converting them to images, and two benchmark Android malware datasets are utilized. Finally, comprehensive evaluation metrics are measured to assess the produced predictive models from the security and complexity perspectives. The study thus guides researchers and developers in planning and building efficient malware analysis systems that meet their requirements and resources. The results reveal that some factors might significantly impact the performance of the malware analysis solution. For example, from a security perspective, the accuracy, F1-score, precision, and recall are improved by 131.29%, 236.44%, 192%, and 131.29%, respectively, when changing one factor and fixing all other factors under study. Similar results are observed in the case of complexity assessment, including testing time, CPU usage, storage size, and pre-processing speed, proving the importance of the proposed Android malware analysis model.