Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Understanding malware behaviour in online social networks and predicting cyber attack

Javed, Amir 2019. Understanding malware behaviour in online social networks and predicting cyber attack. PhD Thesis, Cardiff University.
Item availability restricted.


The popularity of Twitter for information discovery, coupled with the automatic shortening of URLs to save space given the 280-character limit, provides cybercriminals with an opportunity to obfuscate the URL of a malicious Webpage within a tweet. Once the URL is obfuscated, the cybercriminal can lure a user into clicking on it with enticing text and images before carrying out a cyber attack using a malicious Web server. This is known as a drive-by download and has been reported to account for 48% of web-based attacks. In a drive-by download, a user’s computer system is infected while interacting with the malicious endpoint, often without the user being made aware that the attack has taken place. An attacker can gain control of the system by exploiting unpatched system vulnerabilities, and this form of attack currently represents one of the most common methods employed. In order to counter drive-by download attacks on Twitter, this thesis contributes to the existing literature on detecting malware on online social networks by shifting the focus towards the effects of malware on user machines, and away from the malware signature and dynamic behaviour, which can be obfuscated. Initially we developed a drive-by download detection model for Twitter that was successful in classifying URLs into malicious and benign with an F-measure of 0.81 during training, and 0.71 while testing on an unseen dataset. The model was then extended into a predictive model that was able to predict whether the URL was pointing to a malicious Webpage with 0.99 F-measure (using 10-fold cross-validation) and 0.833 F-measure (using an unseen test set) at 1 second into the interaction with a URL. These results provide a novel contribution with which it is possible to kill the connection to the server before an attack has completed, thus proactively blocking and preventing an attack rather than reacting and repairing at a later date.
This thesis also contributes to the broader literature on malware propagation by uncovering both social and content-based factors that aid in the propagation of a tweet containing a link to a malicious Web server. This was achieved by gathering data from seven different and diverse sporting events over a period of three years. The data were then analysed to answer questions including: why are certain tweets retweeted more than others? Is virality partly driven by psychological arousal? And is the act of retweeting affected by the tweet content and the emotions it evokes? Experimental results showed a strong association with content-driven features, such as emotions and the choice of words associated with emotions that were used to compose a tweet or create hashtags. For tweets that contain malicious links, retweet likelihood (virality) and survival (longevity of propagation) were associated with negative emotions, particularly fear. In tweets that were classified as benign, by contrast, it was positive sentiment and high-arousal emotions such as surprise that were associated with the size and survival of Web links.

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Date of First Compliant Deposit: 13 May 2020
Last Modified: 18 May 2020 10:21
