Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Comparing the utility of user-level and kernel-level data for dynamic malware analysis

Nunes, Matthew ORCID: https://orcid.org/0000-0003-1990-5814 2019. Comparing the utility of user-level and kernel-level data for dynamic malware analysis. PhD Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version
Available under License Creative Commons GNU LGPL (Software).

PDF (Cardiff University Electronic Publication Form) - Supplemental Material
Restricted to Repository staff only

Abstract

Dynamic malware analysis is fast gaining popularity over static analysis because it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis, it is common practice to capture the system calls that are made, in order to better understand the behaviour of malware. System calls are captured by hooking certain structures in the operating system. Hooking techniques broadly fall into two categories: those that run at user level and those that run at kernel level. User-level hooks are currently more popular, despite there being no evidence that they are better suited to detecting malware. Much of the literature on dynamic malware analysis focuses on the data analysis method rather than the data capturing method. This thesis, by contrast, seeks to ascertain whether the level at which data is captured affects the ability of a detector to identify malware. This is important because, if the data captured by the most commonly used hooking method is sub-optimal, the machine learning classifier can only go so far. To study the effects of collecting system calls at different privilege levels and viewpoints, data was collected for all experiments in this thesis at a process-specific user level using a virtualised sandbox environment and at a system-wide kernel level using a custom-built kernel driver. The experiments showed kernel-level data to be marginally better for detecting malware than user-level data. Further analysis revealed that the malware behaviour the classifiers relied on to differentiate it depended on the data they were given. When trained on user-level data, classifiers used the evasive features of malware to differentiate it from benignware. These are the very features that malware uses to avoid detection. When trained on kernel-level data, the classifiers preferred to use the general behaviour of malware to differentiate it from benignware.
The implications of this were witnessed when the classifiers trained on user-level and kernel-level data were made to classify malware that had been stripped of its evasive properties. Classifiers trained on user-level data could not detect malware that only possessed malicious attributes, while classifiers trained on kernel-level data were unable to detect malware that did not exhibit the amount of general activity they expected in malware. This research highlights the importance of giving careful consideration to the hooking methodology employed to collect data, since it affects not only the classification results but also a classifier’s understanding of malware.

Item Type: Thesis (PhD)
Date Type: Acceptance
Status: Unpublished
Schools: Computer Science & Informatics
Funders: EPSRC
Date of First Compliant Deposit: 19 February 2020
Date of Acceptance: 2019
Last Modified: 04 Jan 2023 02:17
URI: https://orca.cardiff.ac.uk/id/eprint/129815
