Grijalba, Jorge Oses, Lopez, Luis Alfonso Urena, Camacho-Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239 and Camara, Eugenio Martinez 2024. Towards quality benchmarking in question answering over tabular data in Spanish. Procesamiento del Lenguaje Natural 73, pp. 283-296. 10.26342/2024-73-21
Item availability restricted.
PDF (Published Version): restricted to repository staff only until 25 December 2024 due to copyright restrictions.
Abstract
The rapid and incessant progress in the language understanding and language generation capacity of large language models (LLMs) is accompanied by the discovery of new capabilities. The research community has to provide evaluation benchmarks to assess these emerging capabilities by studying, analysing and comparing different LLMs under fair and realistic settings. Question answering on tabular data is an important task that lacks reliable evaluation benchmarks for assessing LLMs in distinct scenarios, particularly for Spanish. Hence, in this paper we present Spa-DataBench, an evaluation benchmark composed of ten datasets on different topics concerning Spanish society. Each dataset is linked to a set of questions written in Spanish and their corresponding answers. These questions are used to assess LLMs and analyse their capacity for answering questions that involve a single column or multiple columns of different data types, and for generating source code to resolve the questions. We evaluate six LLMs on Spa-DataBench, and we compare their performance using both Spanish and English prompts. The results on Spa-DataBench show that LLMs are able to reason on tabular data, but their performance in Spanish is worse, which means that there is still room for improvement of LLMs in the Spanish language.
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Publisher: | Sociedad Española para el Procesamiento del Lenguaje Natural |
ISSN: | 1135-5948 |
Date of First Compliant Deposit: | 6 November 2024 |
Date of Acceptance: | 1 May 2024 |
Last Modified: | 10 Dec 2024 11:45 |
URI: | https://orca.cardiff.ac.uk/id/eprint/173677 |