Chen, Wenli, Sun, Yaqi and Rosin, Paul L. Item availability restricted.
PDF (Accepted Post-Print Version, 10MB). Restricted to Repository staff only until 5 May 2026 due to copyright restrictions.
Abstract
Generating high-quality, semantically consistent images from text descriptions remains a challenging task in computer vision. Current methods often struggle with effectively integrating textual information into the image generation process, resulting in images that lack realism or contain significant artifacts. To address these issues, we propose SDeep, a novel framework utilizing a generative adversarial network (GAN) architecture with a channel attention mechanism. SDeep deepens the text-to-image fusion process through stacked deepening blocks (SD blocks) and enhances image detail through multilayer channel attention (MLCA). Extensive experiments on the CUB and COCO datasets demonstrate that SDeep outperforms state-of-the-art methods in terms of image quality and semantic alignment with text descriptions. Our approach not only generates more realistic images but also better preserves the semantic consistency between text and generated images, marking a significant advancement in text-to-image synthesis. Code can be found at https://github.com/zxcnmmmmm/SDeep.
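To make the architectural idea concrete, the sketch below illustrates one plausible reading of text-conditioned fusion followed by channel attention. It is a minimal illustration only, assuming a squeeze-and-excitation-style channel gate and affine text modulation; the actual SD block and MLCA designs, layer counts, and hyperparameters are defined in the paper and the linked repository, and all names and shapes here are assumptions.

```python
# Illustrative sketch only: a channel attention gate and a text-conditioned
# fusion step. This is NOT the official SDeep implementation; it assumes a
# squeeze-and-excitation-style gate and affine text modulation as stand-ins
# for the MLCA and SD block described in the abstract.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gate: re-weights feature channels."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # squeeze: global spatial average
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                           # excite: scale each channel


class TextFusionBlock(nn.Module):
    """Hypothetical fusion block: modulates image features with a sentence
    embedding, then applies channel attention."""

    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, channels)
        self.to_shift = nn.Linear(text_dim, channels)
        self.attn = ChannelAttention(channels)

    def forward(self, feat: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        scale = self.to_scale(text).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(text).unsqueeze(-1).unsqueeze(-1)
        return self.attn(feat * (1 + scale) + shift)


if __name__ == "__main__":
    block = TextFusionBlock(channels=64, text_dim=256)
    feat = torch.randn(2, 64, 32, 32)   # image feature map
    text = torch.randn(2, 256)          # sentence embedding
    print(block(feat, text).shape)      # torch.Size([2, 64, 32, 32])
```

In SDeep, the abstract indicates such fusion blocks are stacked (the "stacked deepening" in SD blocks) and the channel attention is applied at multiple layers (MLCA); the sketch shows only a single stage.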
| Field | Value |
|---|---|
| Item Type | Article |
| Date Type | Published Online |
| Status | In Press |
| Schools | Schools > Computer Science & Informatics |
| Publisher | Springer |
| ISSN | 0178-2789 |
| Date of First Compliant Deposit | 4 June 2025 |
| Date of Acceptance | 28 March 2025 |
| Last Modified | 5 June 2025 09:15 |
| URI | https://orca.cardiff.ac.uk/id/eprint/178780 |