Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. This task aims to extend human imagination, enabling the discovery of visual concepts that exist in the unexplored spaces between familiar domains.
While text-to-image diffusion models excel at rendering photorealistic scenes that faithfully match user prompts, they still struggle to generate genuinely novel content. Existing approaches to enhance generative creativity either rely on interpolation of image features, which restricts exploration to predefined categories, or require time-intensive procedures such as embedding optimization or model fine-tuning.
We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation while preserving the validity of the generated object. Our approach utilizes a vision-language model (VLM) that analyzes intermediate outputs of the generation process and adaptively steers it away from conventional visual concepts, encouraging the emergence of novel and surprising outputs.
We evaluate creativity through both novelty and validity, using statistical metrics in the CLIP embedding space. Through extensive experiments, we show consistent gains in creative novelty with negligible computational overhead. Moreover, unlike existing methods that primarily generate single objects, our approach extends to complex scenarios, such as generating coherent sets of creative objects and preserving creativity within elaborate compositional prompts. Our method integrates seamlessly into existing diffusion pipelines, offering a practical route to producing creative outputs that venture beyond the constraints of textual descriptions.
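As a rough illustration of what such CLIP-space metrics could look like (the exact statistics and category lists used in our evaluation are not reproduced here; the model choice, member list, and cosine-similarity formulation below are assumptions), one can score an image's novelty as its distance from familiar members of a category and its validity as its similarity to the broad category itself:

```python
# Illustrative sketch only: novelty/validity scores in CLIP embedding space.
# The category names, prompts, and formulas are assumptions, not the paper's exact metrics.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_scores(image, broad_category="pet",
                familiar_members=("cat", "dog", "rabbit", "hamster")):
    # Embed the generated image and the text descriptions of the category.
    img = model.get_image_features(**processor(images=image, return_tensors="pt"))
    texts = [f"a photo of a {broad_category}"] + \
            [f"a photo of a {m}" for m in familiar_members]
    txt = model.get_text_features(**processor(text=texts, return_tensors="pt", padding=True))

    # Cosine similarities between the image and each text embedding.
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(0)

    validity = sims[0].item()              # closeness to the broad category
    novelty = 1.0 - sims[1:].max().item()  # distance from the nearest familiar member
    return novelty, validity
```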
🎨 Creative Generation Process: To generate a creative image (e.g., "new type of pet"), we sample Gaussian noise and perform an augmented denoising process that maintains an adaptive list of negative prompts throughout the generation.
👁️ VLM-Guided Analysis: At each denoising step, we query a pre-trained Vision-Language Model (VLM) to analyze the intermediate output and name the visual concepts present in it. These detected concepts are exactly what the generation should avoid in order to remain novel.
🔄 Adaptive Negative Prompting: As visual concepts are detected, we add them to the negative prompt list, steering the denoising process away from them. For example, if the VLM identifies "cat" in the intermediate output, we append "cat" to the accumulating list, shifting the denoising trajectory away from cats as well as from any previously detected pets.
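The loop below is a minimal illustrative sketch of this idea, not our exact implementation: it assumes a Stable Diffusion 1.5 backbone with a DDIM scheduler from diffusers, and `query_vlm` is a hypothetical helper that names the dominant familiar concept in an image (or returns None). The function name, query interval, and 512×512 resolution are illustrative choices.

```python
# Minimal sketch of VLM-guided adaptive negative prompting (assumptions noted above).
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)


def query_vlm(image):
    """Hypothetical helper: ask a VLM which familiar concept the image resembles."""
    raise NotImplementedError


@torch.no_grad()
def creative_generate(prompt, steps=50, guidance=7.5, vlm_every=5, seed=0):
    scheduler, unet, vae = pipe.scheduler, pipe.unet, pipe.vae
    scheduler.set_timesteps(steps, device=device)
    generator = torch.Generator(device).manual_seed(seed)
    latents = torch.randn(
        (1, unet.config.in_channels, 64, 64),
        generator=generator, device=device, dtype=unet.dtype,
    ) * scheduler.init_noise_sigma

    negative_concepts = []  # adaptive negative prompt list, grows during generation
    cond = pipe.encode_prompt(prompt, device, 1, False)[0]

    for i, t in enumerate(scheduler.timesteps):
        # Re-encode the (possibly grown) negative prompt list.
        uncond = pipe.encode_prompt(", ".join(negative_concepts), device, 1, False)[0]

        # Classifier-free guidance using the adaptive negative prompt.
        latent_in = scheduler.scale_model_input(torch.cat([latents] * 2), t)
        noise = unet(latent_in, t, encoder_hidden_states=torch.cat([uncond, cond])).sample
        noise_uncond, noise_text = noise.chunk(2)
        noise = noise_uncond + guidance * (noise_text - noise_uncond)
        step_out = scheduler.step(noise, t, latents)
        latents = step_out.prev_sample

        # Periodically decode the predicted clean image and ask the VLM which
        # conventional concept it shows; add that concept to the negatives.
        if i % vlm_every == 0:
            x0 = vae.decode(step_out.pred_original_sample / vae.config.scaling_factor).sample
            concept = query_vlm(pipe.image_processor.postprocess(x0, output_type="pil")[0])
            if concept and concept not in negative_concepts:
                negative_concepts.append(concept)

    final = vae.decode(latents / vae.config.scaling_factor).sample
    return pipe.image_processor.postprocess(final, output_type="pil")[0]
```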
Our method generates novel objects within semantic categories and can be used in practical applications by placing these objects in diverse contexts and scenes. Recent controllable generation models such as FLUX.1 Kontext [dev] let users take our creatively generated objects and seamlessly integrate them into various environments while preserving their unique characteristics; a minimal sketch follows the example prompts below.
"A photo of a new type of pet"
"A plush toy of this pet"
"A kid hugging this pet"
"This pet surfing on a board"
Our method extends naturally from generating individual creative objects to producing coherent sets of related items that share a unified creative vision.
Tea Set
Cutlery Set
Chess Set
Luggage Set
Our VLM-guided approach seamlessly integrates with elaborate prompt descriptions, enabling creative exploration even within complex compositional requirements. The adaptive negative prompting mechanism operates orthogonally to these additional constraints; a brief usage example follows the prompts below.
"A photo of an imaginary pet surfing on a board near an island"
"A photo of a new type of plant blooming in an arctic field with penguins"
"A photo of a new type of fruit sliced on a plate on a windowsill"
"A photo of a woman wearing a creative jacket in a french cafe"
@article{golan2025creative,
author = {Golan, Shelly and Nitzan, Yotam and Wu, Zongze and Patashnik, Or},
title = {VLM-Guided Adaptive Negative Prompting for Creative Generation},
journal = {arXiv preprint},
year = {2025},
}