Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Machine learning and Natural Language Processing of social media data for event detection in smart cities

Hodorog, Andrei, Petri, Ioan and Rezgui, Yacine 2022. Machine learning and Natural Language Processing of social media data for event detection in smart cities. Sustainable Cities and Society 85, 104026. 10.1016/j.scs.2022.104026

PDF - Published Version
Available under License Creative Commons Attribution.



Social media data analysis in a smart city context can be an effective instrument to inform decision making. This manuscript leverages Natural Language Processing (NLP) techniques, applied to Twitter messages with supervised learning, to achieve real-time automated event detection in smart cities. A semantic-based taxonomy of risks is devised to discover and analyse associated events from data streams, with a view to: (i) reading and processing published texts in real time; (ii) classifying each text into one representative real-world category; and (iii) assigning a citizen satisfaction value to each event. To select the language processing models that strike the best balance between accuracy and processing speed, we conducted a preliminary evaluation comparing several baseline language models previously employed by researchers for event classification. A heuristic analysis of several smart city and community initiatives was conducted to define real-world scenarios as a basis for determining correlations between two or more co-occurring event types and their associated levels of citizen satisfaction, while also considering environmental factors. Using Multiple Regression Analysis (MRA), we established the relationships between scenario variables, with the independent variables explaining 60%–90% of the variance in the dependent variables. The selected combination of supervised NLP techniques achieves an accuracy of 88.5%. We found that every regression model had at least one variable with a p-value below the 0.05 threshold, and therefore at least one statistically significant independent variable. These findings illustrate how citizens, acting as active social sensors, can yield vital data that authorities can use to make informed decisions and construct smarter cities sustainably.
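The abstract reports two regression statistics: the share of variance explained (60%–90%) and per-variable significance at p < 0.05. Assuming these correspond to the usual coefficient of determination R² and two-sided t-test p-values of ordinary least squares, the sketch below shows how both are obtained for a multiple regression. It is not the authors' code: the data are synthetic, the two independent variables are hypothetical, and the p-values use the exact Student's t formulas for integer degrees of freedom (Abramowitz & Stegun 26.7.3–26.7.4) so that only the Python standard library is needed.

```python
import math


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t for integer df (A&S 26.7.3 odd, 26.7.4 even)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    s, term = 1.0, 1.0
    if df % 2 == 1:
        for k in range(1, (df - 3) // 2 + 1):
            term *= c2 * (2 * k) / (2 * k + 1)
            s += term
        a = theta if df == 1 else theta + math.sin(theta) * math.cos(theta) * s
        a *= 2 / math.pi
    else:
        for k in range(1, (df - 2) // 2 + 1):
            term *= c2 * (2 * k - 1) / (2 * k)
            s += term
        a = math.sin(theta) * s
    return 1.0 - a  # a = P(|T| < |t|)


def multiple_regression(X, y):
    """OLS fit of y = b0 + b1*x1 + ... + bk*xk; returns (coefficients, R^2, p-values)."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]  # normal equations
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    ss_res = sum(e * e for e in resid)
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot  # share of the variance in y explained by the model
    df = n - k  # residual degrees of freedom
    sigma2 = ss_res / df
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    pvals = [t_two_sided_p(b / s, df) for b, s in zip(beta, se)]
    return beta, r2, pvals


# Synthetic toy data (NOT from the paper): a hypothetical satisfaction score
# driven by two hypothetical independent variables plus a fixed noise pattern.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.2, -0.1, 0.1]
y = [2 + 1.5 * a - 0.8 * b + e for a, b, e in zip(x1, x2, noise)]

beta, r2, pvals = multiple_regression(list(zip(x1, x2)), y)
# Indices of independent variables (intercept excluded) significant at 0.05.
significant = [i for i, p in enumerate(pvals[1:], start=1) if p < 0.05]
print("coefficients:", [round(b, 3) for b in beta])
print("R^2:", round(r2, 3))
print("predictors with p < 0.05:", significant)
```

In this reading, an R² between 0.6 and 0.9 would correspond to the abstract's 60%–90%, and a model "has a statistically significant independent variable" when `significant` is non-empty. A production analysis would more likely use an established statistics package than hand-rolled normal equations, which are shown here only to make the computation explicit.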

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering
Additional Information: This is an open access article under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0).
Publisher: Elsevier
ISSN: 2210-6707
Funders: EPSRC
Date of First Compliant Deposit: 15 July 2022
Date of Acceptance: 22 June 2022
Last Modified: 28 May 2023 01:37

Citation Data

Cited 6 times in Scopus.


