Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A benchmark for neural readability assessment of texts in Spanish

Vasquez-Rodriguez, Laura, Cuenca-Jimenez, Pedro-Manuel, Morales-Esquivel, Sergio and Alva Manchego, Fernando 2022. A benchmark for neural readability assessment of texts in Spanish. Presented at: Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), Abu Dhabi, United Arab Emirates (Virtual), 8 December 2022. Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022). Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 188-198.

Full text not available from this repository.

Abstract

We release a new benchmark for Automated Readability Assessment (ARA) of texts in Spanish. We combined existing corpora with suitable texts collected from the Web, thus creating the largest available dataset for ARA of Spanish texts. All data was pre-processed and categorised to allow experimenting with ARA models that make predictions at two (simple and complex) or three (basic, intermediate, and advanced) readability levels, and at two text granularities (paragraphs and sentences). An analysis based on readability indices shows that our proposed dataset groupings are suitable for their designated readability level. We use our benchmark to train neural ARA models based on BERT in zero-shot, few-shot, and cross-lingual settings. Results show that either a monolingual or multilingual pre-trained model can achieve good results when fine-tuned on language-specific data. In addition, all models decrease their performance when predicting three classes instead of two, showing opportunities for the development of better ARA models for Spanish with existing resources.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Association for Computational Linguistics
ISBN: 978-1-959429-25-8
Last Modified: 05 Sep 2023 13:48
URI: https://orca.cardiff.ac.uk/id/eprint/161899
