ORCA
Online Research @ Cardiff

Real-time classification of malicious URLs on Twitter using Machine Activity Data

Burnap, Peter, Javed, Amir, Rana, Omer Farooq and Awan, Malik 2015. Real-time classification of malicious URLs on Twitter using Machine Activity Data. Presented at: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Paris, France, 25-27 August 2015. 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). ACM, pp. 970-977. 10.1145/2808797.2809281

PDF - Accepted Post-Print Version

Official URL: https://doi.org/10.1145/2808797.2809281

Abstract

Massive online social networks with hundreds of millions of active users are increasingly being used by cyber criminals to spread malicious software (malware) that exploits vulnerabilities on users' machines for personal gain. Twitter is particularly susceptible to such activity because, with its 140-character limit, people commonly include URLs in their tweets to link to more detailed information, evidence, news reports and so on. URLs are often shortened, so the endpoint is not obvious before a person clicks the link. Cyber criminals can exploit this to propagate malicious URLs on Twitter, for which the endpoint is a malicious server that performs unwanted actions on the person’s machine. This is known as a drive-by download. In this paper we develop a machine classification system to distinguish between malicious and benign URLs within seconds of the URL being clicked (i.e. in ‘real time’). We train the classifier using machine activity logs created while interacting with URLs extracted from Twitter data collected during a large global event, the Super Bowl, and test it using data from another large sporting event, the Cricket World Cup. The results show that machine activity logs produce precision of up to 0.975 on training data from the first event and 0.747 on test data from the second event. Furthermore, we examine the properties of the learned model to explain the relationship between machine activity and malicious software behaviour, and build a learning curve for the classifier to illustrate that very small samples of training data can be used with only a small detriment to performance.
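
The evaluation design described in the abstract (train on one event, test on another, report precision, and plot a learning curve over training-set size) can be illustrated with a minimal sketch. This is not the authors' code or classifier: the data is synthetic, the two features are hypothetical stand-ins for machine activity measurements, and a nearest-centroid rule is swapped in purely to keep the example dependency-free.

```python
import random

def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

def fit_centroids(X, y):
    """Mean feature vector per class (a stand-in model, not the paper's)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, X):
    """Assign each sample to the class with the closest centroid."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda c: dist(centroids[c], x)) for x in X]

def synthetic_event(n, shift, rng):
    """Synthetic samples with two hypothetical activity features.

    Label 1 = malicious, 0 = benign. `shift` moves the first feature to
    mimic a different event with a different activity distribution.
    """
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        base = 0.7 if label else 0.3
        X.append([rng.gauss(base + shift, 0.1), rng.gauss(base, 0.1)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = synthetic_event(400, 0.0, rng)  # stands in for the first event
X_test, y_test = synthetic_event(400, 0.1, rng)    # stands in for the second event

def learning_curve(fractions):
    """Precision on the second event as the training sample grows."""
    points = []
    for frac in fractions:
        n = max(2, int(len(X_train) * frac))
        model = fit_centroids(X_train[:n], y_train[:n])
        points.append((n, precision(y_test, predict(model, X_test))))
    return points

for n, p in learning_curve([0.05, 0.1, 0.25, 0.5, 1.0]):
    print(f"train size {n:4d}  precision on second event {p:.3f}")
```

Holding the test event fixed while varying only the training-set size is what makes the curve interpretable: any change in precision is then attributable to the amount of training data rather than to a change in the test distribution.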

Item Type:	Conference or Workshop Item - published (Paper)
Date Type:	Publication
Status:	Published
Schools:	Schools > Computer Science & Informatics; Research Institutes & Centres > Data Innovation Research Institute (DIURI)
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software
Publisher:	ACM
Funders:	Engineering and Physical Sciences Research Council
Date of First Compliant Deposit:	30 March 2016
Last Modified:	10 Sep 2025 22:15
URI:	https://orca.cardiff.ac.uk/id/eprint/76190

Citation Data

Cited 17 times in Scopus.
