Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Sketch-guided scene-level image editing with diffusion models

Zuo, Ran, Hu, Haoxiang, Deng, Xiaoming, Li, Yaokun, Lai, Yu-Kun ORCID: https://orcid.org/0000-0002-2094-5680, Ma, Cuixia, Liu, Yong-Jin and Wang, Hongan 2025. Sketch-guided scene-level image editing with diffusion models. Presented at: 13th International Conference on Computational Visual Media (CVM 2025), Hong Kong SAR, China, 19–21 April 2025. Published in: Didyk, Piotr and Hou, Junhui eds. Computational Visual Media: CVM 2025. Lecture Notes in Computer Science, vol. 15664. Singapore: Springer, doi: 10.1007/978-981-96-5812-1_14
Item availability restricted.

PDF (SketchImageEditingCVM.pdf) - Accepted Post-Print Version
Restricted to Repository staff only until 20 July 2025 due to copyright restrictions.

Download (15MB)

Abstract

Sketch-based image editing allows intuitive and flexible modification of image details, effectively improving editing efficiency and diversity. In scene-level image editing, where sketches are used to control multiple objects within the editing region, existing GAN-based or diffusion-based approaches struggle with complex editing intentions, such as editing scene content with diverse object attributes including spatial layout, semantics, structure, and number of objects. The challenge lies in effectively exploiting the attributes of multiple objects in the sketch and mapping these attributes to the image editing region. In this work, we propose a Sketch-guided Diffusion Model called SDM, which integrates a global-to-local conditioning strategy to make full use of each object instance's attributes in the sketch. Specifically, this strategy incorporates a multi-instance guided cross-attention module and modifies attention maps with sketch masks, helping the model jointly capture object semantics, structure, and quantity. Additionally, we refine the generation of the shared boundary region of overlapping objects to address ambiguous contours and semantics around the boundary. We further introduce a multi-instance semantic loss to compensate for the diffusion model's limited ability to comprehend the semantics of sketches. Extensive experiments show that the proposed method produces high-quality edits and outperforms state-of-the-art methods on the sketch-guided scene-level image editing task.
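
The abstract describes attention maps that are modulated by per-instance sketch masks so that each sketched object only conditions its own sub-region of the edit. The snippet below is a minimal, illustrative sketch of that idea, not the authors' SDM implementation: the function name, tensor shapes, and single-head formulation are assumptions introduced for illustration only.

```python
# Illustrative sketch (not the authors' code) of cross-attention whose
# attention maps are masked by per-instance sketch masks. All shapes and
# the function name `masked_cross_attention` are hypothetical.
import torch
import torch.nn.functional as F


def masked_cross_attention(image_tokens, instance_tokens, instance_masks):
    """
    image_tokens:    (B, N, C)  latent image features (N = H*W spatial tokens)
    instance_tokens: (B, M, C)  one feature vector per sketched object instance
    instance_masks:  (B, M, N)  binary masks mapping each instance to its
                                editing sub-region (1 inside, 0 outside)
    Returns updated image tokens of shape (B, N, C).
    """
    B, N, C = image_tokens.shape
    # Simplified: use the features directly as queries/keys/values;
    # a real model would apply learned projection layers.
    q, k, v = image_tokens, instance_tokens, instance_tokens

    # Raw attention logits between every spatial token and every instance.
    logits = torch.einsum("bnc,bmc->bnm", q, k) / (C ** 0.5)

    # Suppress attention from spatial positions outside an instance's mask,
    # so each object only conditions its own region of the edit.
    mask = instance_masks.transpose(1, 2)  # (B, N, M)
    logits = logits.masked_fill(mask == 0, float("-inf"))

    attn = F.softmax(logits, dim=-1)
    # Positions covered by no instance yield NaNs after softmax; zero them.
    attn = torch.nan_to_num(attn, nan=0.0)

    # Aggregate instance features into the image latent (residual update).
    return image_tokens + torch.einsum("bnm,bmc->bnc", attn, v)


if __name__ == "__main__":
    B, N, M, C = 1, 16, 3, 8
    out = masked_cross_attention(
        torch.randn(B, N, C), torch.randn(B, M, C), torch.randint(0, 2, (B, M, N))
    )
    print(out.shape)  # torch.Size([1, 16, 8])
```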

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: Springer
ISBN: 978-981-96-5811-4
Date of First Compliant Deposit: 11 June 2025
Date of Acceptance: 18 December 2024
Last Modified: 20 Jun 2025 11:30
URI: https://orca.cardiff.ac.uk/id/eprint/179026
