Rivas Rojas, Kervy and Alva-Manchego, Fernando 2021. IAPUCP at SemEval-2021 task 1: Stacking fine-tuned transformers is almost all you need for lexical complexity prediction. Presented at: 15th International Workshop on Semantic Evaluation (SemEval 2021), Virtual, 05-06 August 2021. Published in: Palmer, Alexis, Schneider, Nathan, Schluter, Natalie, Emerson, Guy, Herbelot, Aurelie and Zhu, Xiaodan eds. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, pp. 144-149. 10.18653/v1/2021.semeval-1.14
PDF - Published Version. Available under License Creative Commons Attribution.
Abstract
This paper describes our submission to SemEval-2021 Task 1: predicting the complexity score for single words. Our model leverages standard morphosyntactic and frequency-based features that proved helpful for Complex Word Identification (a related task), and combines them with predictions made by Transformer-based pre-trained models that were fine-tuned on the Shared Task data. Our submission system stacks all previous models with a LightGBM model at the top. One novelty of our approach is the use of multi-task learning for fine-tuning a pre-trained model for both Lexical Complexity Prediction and Word Sense Disambiguation. Our analysis shows that all independent models achieve good performance on the task, but that stacking them obtains a Pearson correlation of 0.7704, merely 0.018 points behind the winning submission.
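The abstract reports results as a Pearson correlation between predicted and gold complexity scores. As a minimal sketch of that metric, the Python below computes Pearson correlation with the standard library only and applies it to two hypothetical base models and their combination. The scores, the names `model_a` and `model_b`, and the plain averaging step are illustrative assumptions: the submission stacks its models with LightGBM, not with an average, and none of the numbers come from the shared task data.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy scores, NOT taken from the shared task data.
gold = [0.10, 0.25, 0.40, 0.55, 0.70]
model_a = [0.15, 0.20, 0.45, 0.50, 0.65]
model_b = [0.05, 0.30, 0.35, 0.60, 0.75]

# Assumption: a plain average stands in for the learned meta-model
# (the paper uses LightGBM on top of the base models).
averaged = [fmean(pair) for pair in zip(model_a, model_b)]

for name, preds in [("model_a", model_a), ("model_b", model_b), ("averaged", averaged)]:
    print(f"{name}: {pearson(gold, preds):.4f}")
```

In this toy data the two models' errors happen to cancel, so the combined predictions correlate with the gold scores at least as well as either model alone. A learned meta-model such as LightGBM can weight base models unevenly and use extra features, which a plain average cannot.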
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Additional Information: | File distributed under a Creative Commons Attribution 4.0 International License. |
Publisher: | Association for Computational Linguistics |
Date of First Compliant Deposit: | 14 February 2022 |
Last Modified: | 14 Feb 2022 16:30 |
URI: | https://orca.cardiff.ac.uk/id/eprint/147258 |