Poudevigne-Durance, Thomas, Jones, Owen Dafydd ORCID: https://orcid.org/0000-0002-7300-5510 and Qin, Yipeng ORCID: https://orcid.org/0000-0002-1551-9126 2022. MaWGAN: a generative adversarial network to create synthetic data from datasets with missing data. Electronics 11 (6), 837. 10.3390/electronics11060837
Abstract
The creation of synthetic data is important for a range of applications, for example, to anonymise sensitive datasets or to increase the volume of data in a dataset. When the target dataset has missing data, it is common to simply discard incomplete observations, even though this necessarily means some loss of information. However, when the proportion of missing data is large, discarding incomplete observations may not leave enough data to estimate the joint distribution accurately. Thus, there is a need for data synthesis methods capable of using datasets with missing data, to improve accuracy and, in more extreme cases, to make data synthesis possible. To achieve this, we propose a novel generative adversarial network (GAN) called MaWGAN (for masked Wasserstein GAN), which creates synthetic data directly from datasets with missing values. As with existing GAN approaches, the MaWGAN synthetic data generator generates samples from the full joint distribution. We introduce a novel methodology for comparing the generator output with the original data that does not require us to discard incomplete observations, based on a modification of the Wasserstein distance and easily implemented using masks generated from the pattern of missing data in the original dataset. Numerical experiments are used to demonstrate the superior performance of MaWGAN compared to (a) discarding incomplete observations before using a GAN, and (b) imputing missing values (using the GAIN algorithm) before using a GAN.
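The masking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`make_mask`, `apply_mask`, `masked_critic_gap`), the toy data, and the linear critic are all hypothetical, and the choice to zero out missing entries is an assumption made for illustration. The sketch only shows the general principle of deriving a mask from the missing-data pattern of the real batch and applying the same mask to the generated batch before both are scored by a critic.

```python
import numpy as np

def make_mask(data):
    """Return 1.0 where a value is observed and 0.0 where it is missing (NaN)."""
    return (~np.isnan(data)).astype(float)

def apply_mask(data, mask):
    """Zero out entries flagged as missing; np.where avoids NaN * 0 = NaN."""
    return np.where(mask == 1.0, data, 0.0)

def masked_critic_gap(real, fake, critic):
    """Difference of mean critic scores after masking both batches identically.

    Assumption: the mask comes from the real batch and is reused on the
    generated batch, so the critic sees the same pattern of gaps in both.
    """
    mask = make_mask(real)
    real_scores = critic(apply_mask(real, mask))
    fake_scores = critic(apply_mask(fake, mask))
    return real_scores.mean() - fake_scores.mean()

# Toy data: 4 observations of 3 variables, with two missing entries.
real = np.array([[1.0, 2.0, np.nan],
                 [0.5, np.nan, 1.5],
                 [1.2, 1.8, 0.9],
                 [0.8, 2.2, 1.1]])

rng = np.random.default_rng(0)
fake = rng.normal(size=real.shape)  # stand-in for generator output

weights = np.array([0.3, -0.2, 0.5])  # hypothetical linear critic
critic = lambda batch: batch @ weights

gap = masked_critic_gap(real, fake, critic)
```

In a real GAN the critic would be a trained neural network and the gap would feed a Wasserstein-style loss; here the linear critic simply makes the effect of the mask easy to inspect.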
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Mathematics |
Additional Information: | This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/) |
Publisher: | MDPI |
ISSN: | 2079-9292 |
Date of First Compliant Deposit: | 4 March 2022 |
Date of Acceptance: | 4 March 2022 |
Last Modified: | 22 Mar 2024 12:27 |
URI: | https://orca.cardiff.ac.uk/id/eprint/148018 |