Poudevigne-Durance, Thomas 2024. Generative adversarial networks for the synthesis of unbalanced irregular time series. PhD Thesis, Cardiff University.
Item availability restricted.
PDF - Accepted Post-Print Version (3MB)
PDF (Cardiff University Electronic Publication Form) - Supplemental Material (341kB), restricted to repository staff only
Abstract
Good quality data are key to informed decision making. Yet real-world data collection can be challenging and resource demanding, so good quality data are not always available. This thesis explores synthetic data generation, focusing on Generative Adversarial Networks (GANs), which are a class of machine learning models. Real-life environmental data, such as water quality data, bring added challenges: such datasets often contain missing values, time dependencies and rare events. The aim of the thesis was to investigate the potential of GANs to synthesise such data. Identification of the data synthesis techniques available for such datasets showed that while GANs are complex models with multiple parameters to optimise, they could be well suited for this purpose. To address the challenge of synthesising unbalanced irregular time series, first a novel GAN was built to create synthetic data directly from datasets with missing values. Called MaWGAN, it is based on the Wasserstein distance (like WGAN) and uses a mask to hide missing values from the Critic to enable the GAN to run. Then, to capture the dependency in the data, two routes were developed: a new algorithm (Force-GAN) that embeds a third neural network in the MaWGAN architecture, and an alternative option where missing data in the irregular time series are first imputed with a novel technique (Hankel Imputation) that preserves the noise in the data. These insights are then used to illustrate the value of the novel GANs developed in this thesis to address complex real-world challenges such as taste and odour issues in drinking water. Research contributions to the field include: a GAN that can handle missing values in the dataset (MaWGAN), an expansion that handles time series as well (Force-GAN), and a method to impute data while keeping the original noise and trend of the dataset (Hankel Imputation).
The value of these novel methods to predict rare events in time series is also demonstrated using unbalanced water quality datasets with missing values.
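The abstract says only that MaWGAN uses a mask to hide missing values from the Critic. The sketch below illustrates that general idea and is not the thesis's implementation. It assumes missing values are encoded as NaN, that the mask derived from the real batch is applied to both the real and the generated batch, and that hidden entries are set to zero. The function names and the linear stand-in critic are hypothetical.

```python
import numpy as np

def make_mask(x):
    """Return 1.0 where a value is observed and 0.0 where it is missing (NaN)."""
    return (~np.isnan(x)).astype(float)

def apply_mask(x, mask):
    """Replace NaNs with 0, then zero every position the mask marks as missing."""
    return np.nan_to_num(x, nan=0.0) * mask

def linear_critic(x, w):
    """Hypothetical stand-in critic: one scalar score per row."""
    return x @ w

def masked_wasserstein_gap(real, fake, w):
    """Critic estimate E[f(real)] - E[f(fake)] on masked inputs.

    Assumption: the missingness pattern of the real batch is imposed on the
    generated batch too, so the critic never compares a generated value
    against a missing real one. Both batches must have the same shape.
    """
    mask = make_mask(real)
    real_in = apply_mask(real, mask)
    fake_in = apply_mask(fake, mask)
    return linear_critic(real_in, w).mean() - linear_critic(fake_in, w).mean()

# Toy batch: the second feature of the first real row is missing.
real = np.array([[1.0, np.nan], [3.0, 4.0]])
fake = np.array([[0.0, 100.0], [1.0, 1.0]])
w = np.array([1.0, 1.0])
gap = masked_wasserstein_gap(real, fake, w)
```

In this toy batch the generated value 100.0 sits where the real value is missing, so it is zeroed before scoring and does not affect the gap. A real critic would be a trained neural network, and a WGAN-style model would also need a Lipschitz constraint such as weight clipping or a gradient penalty, which this sketch omits.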
Item Type: | Thesis (PhD) |
---|---|
Date Type: | Completion |
Status: | Unpublished |
Schools: | Mathematics |
Subjects: | Q Science > QA Mathematics |
Funders: | KESS |
Date of First Compliant Deposit: | 5 August 2024 |
Last Modified: | 05 Aug 2024 08:53 |
URI: | https://orca.cardiff.ac.uk/id/eprint/171139 |