Pedziwiatr, Marek A., Kümmerer, Matthias, Wallis, Thomas S.A., Bethge, Matthias and Teufel, Christoph (ORCID: https://orcid.org/0000-0003-3915-9716) 2021. Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations. Cognition 206, 104465. https://doi.org/10.1016/j.cognition.2020.104465
PDF (835kB): Accepted Post-Print Version, available under a Creative Commons Attribution Non-Commercial No Derivatives license.
Abstract
Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic information across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the fixation-prediction performance of MMs with that of saliency models, showing that DeepGaze II – a deep neural network trained to predict fixations based on high-level features rather than meaning – outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
| Item Type: | Article |
|---|---|
| Date Type: | Publication |
| Status: | Published |
| Schools: | Psychology; Cardiff University Brain Research Imaging Centre (CUBRIC) |
| Publisher: | Elsevier |
| ISSN: | 0010-0277 |
| Date of First Compliant Deposit: | 22 September 2020 |
| Date of Acceptance: | 31 August 2020 |
| Last Modified: | 08 Nov 2023 04:30 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/135011 |
Citation Data: Cited 15 times in Scopus.