Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Revealing semantic mappings across HAR datasets

Alevizaki, Ada, Pham, Nhat and Trigoni, Niki 2024. Revealing semantic mappings across HAR datasets. Presented at: International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), Abu Dhabi, United Arab Emirates, 29 April - 01 May 2024. Proceedings of 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things. IEEE, pp. 93-100. 10.1109/DCOSS-IoT61029.2024.00023

Full text not available from this repository.

Abstract

Data collection is a strenuous and time-consuming process, particularly when aiming to build the large datasets that are essential for training deep models effectively. Curating such datasets in controlled environments requires meticulous design and annotation, which adds complexity to the process, while the prevalence of mobile devices has led to the collection of extensive crowdsourced datasets reflecting diverse user behaviours. However, real-world datasets often suffer from incomplete, noisy, or incorrect annotations that introduce significant variability in user behaviour within each class, hindering model learning and generalisation. Although various methods in these research areas attempt to mitigate this problem, there is a clear lack of explainable solutions to discerning real-world model deployment while providing underlying information about incoming data. To address this challenge, we propose SeMEDA, a Semantic Mismatch Estimation and Dataset Alignment approach, which automates the alignment of datasets by establishing a cross-domain mapping that represents incoming data from a target domain through the scope of a model trained on a controlled dataset. SeMEDA identifies and addresses four key levels of semantic mismatch, enhancing the curation of cleaner datasets with trustworthy labels without the need for laborious data analysis and expert annotation. We showcase our proposed approach on two datasets: the watchHAR dataset, which was collected in controlled laboratory conditions, and the ExtraSensory dataset, which was collected in the wild, boosting accuracy in the target space from 45% to over 90%.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: IEEE
ISBN: 979-8-3503-6945-8
ISSN: 2325-2936
Date of First Compliant Deposit: 19 December 2024
Date of Acceptance: 29 April 2024
Last Modified: 14 Jan 2025 13:45
URI: https://orca.cardiff.ac.uk/id/eprint/174850
