Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Gap filling crowdsourced air temperature data in cities using data-driven approaches

He, Miao, Luo, Zhiwen ORCID: https://orcid.org/0000-0002-2082-3958, Xie, Xiaoxiong, Wang, Peng, Wang, Haichao and Zapata-Lancaster, Gabriela ORCID: https://orcid.org/0000-0003-3239-131X 2025. Gap filling crowdsourced air temperature data in cities using data-driven approaches. Building and Environment 271, 112593. 10.1016/j.buildenv.2025.112593

PDF (1-s2.0-S0360132325000757-main.pdf) - Published Version
Available under License Creative Commons Attribution.


Abstract

Crowdsourced temperature data from citizen weather stations (CWS) in urban areas offer valuable insights into intra-urban temperature distribution but often contain a significant number of missing values. Existing gap-filling methods, typically effective for random gaps and low missing rates, are inadequate for the continuous gaps and high missing rates common in CWS recordings. This study introduces a novel data-driven approach to fill these gaps by modelling relationships between CWS data and official weather station (OWS) records during periods of data availability. We evaluate various feature sets and data-driven algorithms, including Multiple Linear Regression (MLR), Random Forest (RF), and Multilayer Perceptron (MLP), using a complete CWS temperature dataset from July 2018 in London. The MLP-based models, which include features such as preceding and subsequent air temperature along with past solar radiation, demonstrate superior performance across various missing data conditions. In the most challenging case, with a missing rate of 70–80% and continuous gaps, the MLP model achieves a Mean Absolute Error of 0.59 °C, a Root Mean Squared Error of 0.73 °C, and a coefficient of determination (R²) of 0.94. The robustness of the MLP algorithm is further validated across multiple complete CWS datasets from different areas in London. This study offers effective strategies for handling common missing data conditions in CWS datasets and serves as a valuable reference for future machine learning applications in urban climatology.
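
The general idea in the abstract (fit a model on hours where the CWS reported, then predict the CWS temperature inside a gap from OWS-side features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the data are synthetic, the function names are hypothetical, the "preceding and subsequent air temperature" features are taken here to be OWS readings one hour before and after, and a least-squares linear model (in the spirit of the MLR baseline) stands in for the MLP.

```python
import numpy as np

def make_features(ows_temp, solar, idx):
    """Feature rows for the hours in idx: intercept, OWS temperature at
    t-1, t and t+1, and solar radiation at t-1 (assumed feature set)."""
    return np.column_stack([
        np.ones(len(idx)),
        ows_temp[idx - 1],
        ows_temp[idx],
        ows_temp[idx + 1],
        solar[idx - 1],
    ])

def fill_gaps(cws_temp, ows_temp, solar):
    """Fit a linear model on observed CWS hours and predict the NaN hours.
    The first and last hours are skipped because they lack a neighbour."""
    idx = np.arange(1, len(cws_temp) - 1)
    observed = idx[~np.isnan(cws_temp[idx])]
    missing = idx[np.isnan(cws_temp[idx])]
    coef, *_ = np.linalg.lstsq(
        make_features(ows_temp, solar, observed), cws_temp[observed], rcond=None
    )
    filled = cws_temp.copy()
    filled[missing] = make_features(ows_temp, solar, missing) @ coef
    return filled

def scores(y_true, y_pred):
    """Mean Absolute Error, Root Mean Squared Error and R^2."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2

# Synthetic hourly data for a 31-day month (not real CWS or OWS records).
rng = np.random.default_rng(0)
hours = np.arange(24 * 31)
solar = np.clip(np.sin(2 * np.pi * (hours % 24 - 6) / 24), 0, None) * 800
ows = 20 + 6 * np.sin(2 * np.pi * (hours % 24 - 9) / 24) + rng.normal(0, 0.3, hours.size)
cws_true = 1.0 + 1.05 * ows + 0.002 * np.roll(solar, 1) + rng.normal(0, 0.3, hours.size)

# Blank out one continuous block of about 71% of the month.
gap = slice(24 * 7, 24 * 29)
cws_gappy = cws_true.copy()
cws_gappy[gap] = np.nan

filled = fill_gaps(cws_gappy, ows, solar)
mae, rmse, r2 = scores(cws_true[gap], filled[gap])
print(f"MAE={mae:.2f} degC  RMSE={rmse:.2f} degC  R2={r2:.3f}")
```

Because the synthetic CWS series is generated from a linear relationship, the scores printed here say nothing about real-world performance; they only show how the three metrics quoted in the abstract are computed against a held-out continuous gap.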

Item Type: Article
Date Type: Publication
Status: Published
Schools: Architecture
Publisher: Elsevier
ISSN: 0360-1323
Funders: Royal Society
Date of First Compliant Deposit: 6 February 2025
Date of Acceptance: 20 January 2025
Last Modified: 06 Feb 2025 11:30
URI: https://orca.cardiff.ac.uk/id/eprint/175433
