Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Malware classification using self organising feature maps and machine activity data

Burnap, Pete, French, Richard, Turner, Frederick and Jones, Kevin 2018. Malware classification using self organising feature maps and machine activity data. Computers and Security 73, pp. 399-410. 10.1016/j.cose.2017.11.016

PDF - Published Version
Available under License Creative Commons Attribution.


In this article we use machine activity metrics to automatically distinguish between malicious and trusted portable executable software samples. The motivation stems from the growth of cyber attacks using techniques that have been employed to surreptitiously deploy Advanced Persistent Threats (APTs). APTs are becoming more sophisticated and are able to obfuscate many of their identifiable features through encryption, custom code bases and in-memory execution. Our hypothesis is that we can achieve a high degree of accuracy in distinguishing malicious from trusted samples using Machine Learning with features derived from the inescapable footprint left behind on a computer system during execution. This footprint includes CPU, RAM and swap use, and network traffic counted in bytes and packets. These features are continuous and allow us to be more flexible with the classification of samples than discrete features such as API calls (which can also be obfuscated), which form the main feature of the extant literature. Using these continuous data, we develop a novel classification method based on Self Organizing Feature Maps. The method reduces overfitting during training by creating unsupervised clusters of similar ‘behaviour’ that are subsequently used as features for classification, rather than using the raw data. We compare our method to a set of machine classification methods that have been applied in previous research and, on an unseen dataset, demonstrate an increase in classification accuracy of between 7.24% and 25.68% over that range of methods.
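
The idea of using map clusters as features, rather than raw activity data, can be illustrated with a minimal sketch. This is not the authors' implementation: the map size, learning schedule, function names (train_som, best_matching_unit, bmu_features) and the synthetic snapshot values are all assumptions made for illustration. Each snapshot is a hypothetical vector of [CPU, RAM, swap, bytes, packets] scaled to the range 0 to 1, and a sample's execution trace is summarised as the fraction of its snapshots that land on each map node.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(samples, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small self-organising map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def bmu_features(weights, snapshots):
    """Summarise a trace as the fraction of snapshots hitting each map node."""
    counts = {node: 0 for node in weights}
    for x in snapshots:
        counts[best_matching_unit(weights, x)] += 1
    return [counts[node] / len(snapshots) for node in sorted(counts)]


# Synthetic, made-up snapshots (NOT data from the article):
# [cpu, ram, swap, bytes, packets], each scaled to [0, 1].
rng = random.Random(1)
quiet = [[0.1 + 0.05 * rng.random() for _ in range(5)] for _ in range(20)]
busy = [[0.8 + 0.05 * rng.random() for _ in range(5)] for _ in range(20)]

som = train_som(quiet + busy)
quiet_features = bmu_features(som, quiet)
busy_features = bmu_features(som, busy)
print(quiet_features)
print(busy_features)
```

In a sketch like this, the resulting per-node fractions would be passed to an ordinary supervised classifier in place of the raw metrics; which classifier to use and how to label the training traces are left open here.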

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Data Innovation Research Institute (DIURI)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Additional Information: This is an open access article under the CC BY license.
Publisher: Elsevier
ISSN: 0167-4048
Funders: Engineering and Physical Sciences Research Council
Date of First Compliant Deposit: 13 December 2017
Date of Acceptance: 24 November 2017
Last Modified: 05 May 2023 09:18

Citation Data

Cited 78 times in Scopus.
