Alali, Abdulazeez and Theodorakopoulos, Georgios ORCID: https://orcid.org/0000-0003-2701-7809 2024. Review of existing methods for generating and detecting fake and partially fake audio. Presented at: 10th ACM International Workshop on Security and Privacy Analytics (IWSPA 2024), Porto, Portugal, 21 June 2024. IWSPA '24: Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics. New York, NY, USA: ACM, pp. 35-36. 10.1145/3643651.3659894
PDF - Accepted Post-Print Version (173kB)
Abstract
Using deep-learning technologies, both text-to-speech (TTS) and voice conversion (VC) methods can generate fake speech effectively, making it challenging to differentiate between real and fake speech. Accordingly, researchers have employed deepfake detection solutions to distinguish them. These solutions can achieve high detection accuracy and exhibit robustness against unseen data, which are data that differ from those used in initial model training. The emergence of partially fake (PF) audio, which combines real and fake speech, presents a new challenge for deepfake detection. This tutorial presents a comprehensive overview of TTS, VC, and PF generation and detection methods and analyses the characteristics of publicly available datasets for each type. Furthermore, it highlights directions for PF detection that can pave the way for valuable research in fake speech detection.
| Field | Value |
|---|---|
| Item Type: | Conference or Workshop Item (Paper) |
| Date Type: | Published Online |
| Status: | Published |
| Schools: | Computer Science & Informatics |
| Publisher: | ACM |
| ISBN: | 979-8-4007-0556-4/24/06 |
| Date of First Compliant Deposit: | 2 June 2024 |
| Date of Acceptance: | 4 April 2024 |
| Last Modified: | 24 Jun 2024 14:30 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/169368 |