Nunes, Matthew ORCID: https://orcid.org/0000-0003-1990-5814, Burnap, Peter ORCID: https://orcid.org/0000-0003-0396-633X, Rana, Omer ORCID: https://orcid.org/0000-0003-3597-2646, Reinecke, Philipp ORCID: https://orcid.org/0000-0002-2411-0891 and Lloyd, Kaelon 2019. Getting to the root of the problem: A detailed comparison of kernel and user level data for dynamic malware analysis. Journal of Information Security and Applications 48, 102365. 10.1016/j.jisa.2019.102365
PDF
- Published Version
Available under License Creative Commons Attribution.
Abstract
Dynamic malware analysis is fast gaining popularity over static analysis since it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis it is common practice to capture the system calls that are made to better understand the behaviour of malware. There are several techniques to capture system calls, the most popular of which is a user-level hook. To study the effects of collecting system calls at different privilege levels and viewpoints, we collected data at a process-specific user level using a virtualised sandbox environment and at a system-wide kernel level using a custom-built kernel driver. We then tested the performance of several state-of-the-art machine learning classifiers on the data. Random Forest was the best performing classifier, with an accuracy of 95.2% on the kernel driver data and 94.0% on the user-level data. The combination of user and kernel level data gave the best classification results, with an accuracy of 96.0% for Random Forest. This may seem intuitive but was hitherto not empirically demonstrated. Additionally, we observed that machine learning algorithms trained on data from the user level tended to use the anti-debug/anti-vm features in malware to distinguish it from benignware. By contrast, when trained on data from our kernel driver, machine learning algorithms seemed to use the differences in the general behaviour of the system to make their prediction, which explains why the two data sources complement each other so well. Our results show that capturing data at different privilege levels will affect the classifier's ability to detect malware, with kernel-level data providing more utility than user-level data for malware classification. Despite this, there exist more established user-level tools than kernel-level tools, suggesting more research effort should be directed at the kernel level.
In short, this paper provides the first objective, evidence-based comparison of user and kernel level data for the purposes of malware classification.
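As a rough illustration of what combining user-level and kernel-level data can look like, the sketch below turns two call traces into count vectors and concatenates them into one feature vector that a classifier such as Random Forest could consume. The abstract does not specify the feature representation used in the paper, so the call-count encoding, the function names, the vocabularies and the toy traces are all assumptions made for illustration only.

```python
from collections import Counter

def call_counts(trace, vocabulary):
    """Turn a list of call names into a fixed-length count vector."""
    counts = Counter(trace)
    return [counts[name] for name in vocabulary]

def combined_features(user_trace, kernel_trace, user_vocab, kernel_vocab):
    """Concatenate user-level and kernel-level count vectors for one sample."""
    return call_counts(user_trace, user_vocab) + call_counts(kernel_trace, kernel_vocab)

# Hypothetical vocabularies and traces (illustrative, not taken from the paper).
USER_VOCAB = ["NtCreateFile", "NtWriteFile", "IsDebuggerPresent"]
KERNEL_VOCAB = ["NtCreateFile", "NtOpenProcess", "NtQuerySystemInformation"]

user_trace = ["IsDebuggerPresent", "NtCreateFile", "NtWriteFile", "NtWriteFile"]
kernel_trace = ["NtOpenProcess", "NtCreateFile", "NtCreateFile", "NtQuerySystemInformation"]

features = combined_features(user_trace, kernel_trace, USER_VOCAB, KERNEL_VOCAB)
print(features)  # [1, 2, 1, 2, 1, 1]
```

Keeping the two vocabularies separate means the same call name observed from each viewpoint occupies its own position in the vector, so a classifier can weigh the process-specific and system-wide views independently.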
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
Publisher: | Elsevier |
ISSN: | 2214-2126 |
Funders: | EPSRC |
Date of First Compliant Deposit: | 30 July 2019 |
Date of Acceptance: | 26 July 2019 |
Last Modified: | 05 Jan 2024 06:55 |
URI: | https://orca.cardiff.ac.uk/id/eprint/124535 |
Citation Data
Cited 8 times in Scopus.