Cohn, Anthony G., Hernández-Orallo, José, Mboli, Julius Sechang, Moros-Daval, Yael, Xiang, Zhiliang

PDF (Published Version, 1MB) - Available under License Creative Commons Attribution.
Abstract
The current and future capabilities of Artificial Intelligence (AI) are typically assessed with an ever-increasing number of benchmarks, competitions, tests and evaluation standards, which are meant to work as AI evaluation instruments (EI). These EIs are not only increasing in number, but also in complexity and diversity, making it hard to understand this evaluation landscape in a meaningful way. In this paper, we present an approach for categorising EIs using a set of 18 facets, accompanied by a rubric that allows anyone to apply the framework to any existing or new EI. We apply the rubric to 23 EIs in different domains through a team of raters, and analyse how consistent the rubric is and how well it works to distinguish between EIs and map the evaluation landscape in AI.
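The record itself contains no code; as a purely illustrative sketch of the kind of consistency check the abstract mentions (how consistent the rubric is across a team of raters), the snippet below computes mean pairwise percent agreement over hypothetical facet ratings. All names (raters, EIs, facets) and the choice of percent agreement as the metric are assumptions for illustration, not taken from the paper, which may use a different reliability measure.

```python
# Illustrative sketch (not from the paper): mean pairwise agreement between
# raters applying a facet rubric to evaluation instruments (EIs).
# Rater names, EI names, facets and values below are hypothetical.
from itertools import combinations

# ratings[rater][ei][facet] -> categorical level assigned by that rater
ratings = {
    "rater_a": {"EI_1": {"scope": "broad", "automation": "full"}},
    "rater_b": {"EI_1": {"scope": "broad", "automation": "partial"}},
    "rater_c": {"EI_1": {"scope": "broad", "automation": "full"}},
}

def pairwise_agreement(ratings):
    """Fraction of shared (EI, facet) judgements on which two raters agree,
    averaged over all rater pairs."""
    scores = []
    for r1, r2 in combinations(ratings, 2):
        matches, total = 0, 0
        for ei, facets in ratings[r1].items():
            if ei not in ratings[r2]:
                continue
            for facet, value in facets.items():
                if facet in ratings[r2][ei]:
                    total += 1
                    matches += value == ratings[r2][ei][facet]
        if total:
            scores.append(matches / total)
    return sum(scores) / len(scores) if scores else 0.0

print(f"Mean pairwise agreement: {pairwise_agreement(ratings):.2f}")
```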
| Item Type | Conference or Workshop Item (Paper) |
|---|---|
| Date Type | Publication |
| Status | Published |
| Schools | Computer Science & Informatics |
| Subjects | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Publisher | CEUR Workshop Proceedings |
| ISSN | 1613-0073 |
| Related URLs | |
| Date of First Compliant Deposit | 8 August 2022 |
| Date of Acceptance | 3 June 2022 |
| Last Modified | 24 Aug 2022 11:00 |
| URI | https://orca.cardiff.ac.uk/id/eprint/151802 |