Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. This task aims to extend human imagination, enabling the discovery of visual concepts that exist in the unexplored spaces between familiar domains.
While text-to-image diffusion models excel at rendering photorealistic scenes that faithfully match user prompts, they still struggle to generate genuinely novel content. Existing approaches to enhance generative creativity either rely on interpolation of image features, which restricts exploration to predefined categories, or require time-intensive procedures such as embedding optimization or model fine-tuning.
We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation while preserving the validity of the generated object. Our approach utilizes a vision-language model (VLM) that analyzes intermediate outputs of the generation process and adaptively steers it away from conventional visual concepts, encouraging the emergence of novel and surprising outputs.
We evaluate creativity through both novelty and validity, using statistical metrics in the CLIP embedding space. Through extensive experiments, we show consistent gains in creative novelty with negligible computational overhead. Moreover, unlike existing methods that primarily generate single objects, our approach extends to complex scenarios, such as generating coherent sets of creative objects and preserving creativity within elaborate compositional prompts. Our method integrates seamlessly into existing diffusion pipelines, offering a practical route to producing creative outputs that venture beyond the constraints of textual descriptions.
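As a rough illustration of what such CLIP-space metrics could look like (the exact statistics and category lists used in our evaluation are not reproduced here; the model choice, member list, and cosine-similarity formulation below are assumptions), one can score an image's novelty as its distance from familiar members of a category and its validity as its similarity to the broad category itself:

```python
# Illustrative sketch only: novelty/validity scores in CLIP embedding space.
# The category names, prompts, and formulas are assumptions, not the paper's exact metrics.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_scores(image, broad_category="pet",
                familiar_members=("cat", "dog", "rabbit", "hamster")):
    # Embed the generated image and the text descriptions of the category.
    img = model.get_image_features(**processor(images=image, return_tensors="pt"))
    texts = [f"a photo of a {broad_category}"] + \
            [f"a photo of a {m}" for m in familiar_members]
    txt = model.get_text_features(**processor(text=texts, return_tensors="pt", padding=True))

    # Cosine similarities between the image and each text embedding.
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(0)

    validity = sims[0].item()              # closeness to the broad category
    novelty = 1.0 - sims[1:].max().item()  # distance from the nearest familiar member
    return novelty, validity
```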
🎨 Creative Generation Process: To generate a creative image (e.g., "new type of pet"), we sample Gaussian noise and perform an augmented denoising process that maintains an adaptive list of negative prompts throughout the generation.
👁️ VLM-Guided Analysis: At each denoising step, we query a pre-trained Vision-Language Model (VLM) to analyze the intermediate output and name the visual concepts present in it. These detected concepts are exactly what the generation should avoid in order to remain novel.
🔄 Adaptive Negative Prompting: As visual concepts are detected, we add them to the negative prompt list, steering the denoising process away from them. For example, if the VLM identifies "cat" in the intermediate output, we append "cat" to the accumulating list, shifting the denoising trajectory away from cats as well as from any previously detected pets.
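The loop below is a minimal illustrative sketch of this idea, not our exact implementation: it assumes a Stable Diffusion 1.5 backbone with a DDIM scheduler from diffusers, and `query_vlm` is a hypothetical helper that names the dominant familiar concept in an image (or returns None). The function name, query interval, and 512×512 resolution are illustrative choices.

```python
# Minimal sketch of VLM-guided adaptive negative prompting (assumptions noted above).
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)


def query_vlm(image):
    """Hypothetical helper: ask a VLM which familiar concept the image resembles."""
    raise NotImplementedError


@torch.no_grad()
def creative_generate(prompt, steps=50, guidance=7.5, vlm_every=5, seed=0):
    scheduler, unet, vae = pipe.scheduler, pipe.unet, pipe.vae
    scheduler.set_timesteps(steps, device=device)
    generator = torch.Generator(device).manual_seed(seed)
    latents = torch.randn(
        (1, unet.config.in_channels, 64, 64),
        generator=generator, device=device, dtype=unet.dtype,
    ) * scheduler.init_noise_sigma

    negative_concepts = []  # adaptive negative prompt list, grows during generation
    cond = pipe.encode_prompt(prompt, device, 1, False)[0]

    for i, t in enumerate(scheduler.timesteps):
        # Re-encode the (possibly grown) negative prompt list.
        uncond = pipe.encode_prompt(", ".join(negative_concepts), device, 1, False)[0]

        # Classifier-free guidance using the adaptive negative prompt.
        latent_in = scheduler.scale_model_input(torch.cat([latents] * 2), t)
        noise = unet(latent_in, t, encoder_hidden_states=torch.cat([uncond, cond])).sample
        noise_uncond, noise_text = noise.chunk(2)
        noise = noise_uncond + guidance * (noise_text - noise_uncond)
        step_out = scheduler.step(noise, t, latents)
        latents = step_out.prev_sample

        # Periodically decode the predicted clean image and ask the VLM which
        # conventional concept it shows; add that concept to the negatives.
        if i % vlm_every == 0:
            x0 = vae.decode(step_out.pred_original_sample / vae.config.scaling_factor).sample
            concept = query_vlm(pipe.image_processor.postprocess(x0, output_type="pil")[0])
            if concept and concept not in negative_concepts:
                negative_concepts.append(concept)

    final = vae.decode(latents / vae.config.scaling_factor).sample
    return pipe.image_processor.postprocess(final, output_type="pil")[0]
```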
Our method generates novel objects within semantic categories and can be used in practical applications by placing these objects in diverse contexts and scenes. Recent controllable generation models such as FLUX.1 Kontext [dev] let users take our creatively generated objects and seamlessly integrate them into various environments while preserving their unique characteristics; a minimal sketch follows the example prompts below.
"A photo of a new type of pet"
"A plush toy of this pet"
"A kid hugging this pet"
"This pet surfing on a board"
Our method extends naturally from generating individual creative objects to producing coherent sets of related items that share a unified creative vision.
Tea Set
Cutlery Set
Chess Set
Luggage Set
Our VLM-guided approach seamlessly integrates with elaborate prompt descriptions, enabling creative exploration even within complex compositional requirements. The adaptive negative prompting mechanism operates orthogonally to these additional constraints; a brief usage example follows the prompts below.
"A photo of an imaginary pet surfing on a board near an island"
"A photo of a new type of plant blooming in an arctic field with penguins"
"A photo of a new type of fruit sliced on a plate on a windowsill"
"A photo of a woman wearing a creative jacket in a french cafe"
@article{golan2025creative,
author = {Golan, Shelly and Nitzan, Yotam and Wu, Zongze and Patashnik, Or},
title = {VLM-Guided Adaptive Negative Prompting for Creative Generation},
journal = {arXiv preprint},
year = {2025},
}