Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Group-wise automatic music transcription

White, William 2018. Group-wise automatic music transcription. MPhil Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version
PDF - Supplemental Material (restricted to repository staff only)

Background: Music transcription is the conversion of musical audio into notation such that a musician can recreate the piece. Automatic music transcription (AMT) is the automation of this process. Current AMT algorithms produce a less musically meaningful transcription than human transcribers. However, AMT performs better at predicting the notes present in a short time frame. Group-wise Automatic Music Transcription (GWAMT) uses several renditions of a piece to produce a single transcription.
Aims: The main aim was to investigate GWAMT. Secondary aims included comparing methods for GWAMT at the frame level and considering the impact of GWAMT on the broader field of AMT.
Method(s)/Procedures: GWAMT is split into three stages: transcription, alignment and combination. Transcription is performed by splitting the piece into frames and using a classifier to identify the notes present in each frame. Convolutional Neural Networks (CNNs) are used with a novel training methodology and architecture. Different renditions of the same piece have corresponding notes occurring at different times, so methods for aligning multiple renditions are used to match corresponding frames. Three methods were compared: pairwise alignment, progressive alignment and a new method, iterative alignment. The effect of when the aligned features are combined (early or late) and how they are combined (majority vote, linear opinion pool, logarithmic opinion pool, max, median) is investigated.
Results: The developed method for frame-level transcription achieves state-of-the-art transcription accuracy on the MAPS database, with an F1-score of 76.67%. Experiments on GWAMT show that the F1-score can be improved by between 0.005 and 0.01 using the majority vote and logarithmic opinion pool combination methods.
Conclusions/Implications: These experiments show that group-wise frame-level transcription can improve the transcription when there are different tempos, noise levels, dynamic ranges and reverbs between the clips. They also demonstrate a future application of GWAMT to individual pieces with repeated segments.
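
The combination rules named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes late combination, that the renditions have already been aligned frame by frame, and that each rendition's classifier outputs a per-frame, per-pitch note probability. The function name `combine_posteriors`, the array layout, the 0.5 threshold and the binary form of the logarithmic pool are all assumptions made for illustration.

```python
import numpy as np


def combine_posteriors(probs, method="majority", threshold=0.5, eps=1e-9):
    """Combine aligned note probabilities from several renditions.

    probs: array of shape (renditions, frames, pitches), values in [0, 1].
    Returns a boolean (frames, pitches) array of detected notes.
    Hypothetical helper; layout and threshold are illustrative assumptions.
    """
    probs = np.asarray(probs, dtype=float)

    if method == "majority":
        # Each rendition votes "note on" if it clears the threshold;
        # a note is kept when more than half of the renditions agree.
        votes = (probs >= threshold).sum(axis=0)
        return votes > probs.shape[0] / 2

    if method == "linear":
        # Linear opinion pool: arithmetic mean of the probabilities.
        pooled = probs.mean(axis=0)
    elif method == "log":
        # Logarithmic opinion pool: geometric mean of the "on" and "off"
        # probabilities, renormalised so the two outcomes sum to one.
        p = np.clip(probs, eps, 1.0 - eps)
        on = np.exp(np.log(p).mean(axis=0))
        off = np.exp(np.log(1.0 - p).mean(axis=0))
        pooled = on / (on + off)
    elif method == "max":
        pooled = probs.max(axis=0)
    elif method == "median":
        pooled = np.median(probs, axis=0)
    else:
        raise ValueError(f"unknown combination method: {method}")

    return pooled >= threshold


# Toy input: three renditions, one frame, two pitches.
example = np.array([[[0.9, 0.2]], [[0.6, 0.4]], [[0.3, 0.7]]])
for name in ("majority", "linear", "log", "max", "median"):
    print(name, combine_posteriors(example, name))
```

In this toy input the rules differ only for the second pitch, where a single confident rendition is enough for the max rule but not for the others.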

Item Type: Thesis (MPhil)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Date of First Compliant Deposit: 26 March 2019
Last Modified: 04 Aug 2022 01:54
