Researchers Discover Simple Sentence to Enhance Creativity in AI Models

In a groundbreaking development in the field of artificial intelligence, researchers from Northeastern University, Stanford University, and West Virginia University have unveiled a remarkably simple yet effective method to enhance the creativity of large language models (LLMs) such as GPT-4, Claude, and Gemini. This innovative approach, termed Verbalized Sampling (VS), addresses a significant limitation in generative AI: the tendency for these models to produce repetitive and predictable outputs, a phenomenon known as mode collapse.

Generative AI models, including both LLMs and diffusion-based image generators, are inherently non-deterministic. They generate responses by sampling from a probability distribution over potential outputs rather than producing a single, fixed answer. For instance, when prompted with a straightforward question like “What is the capital of France?”, an LLM might respond with “Paris,” but the phrasing could vary significantly—ranging from “The capital of France is Paris” to “Paris, though it was Versailles at one point.” Despite this built-in variability, users often find the responses monotonous in practice, a frustration for anyone seeking more diverse and creative outputs.
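The sampling behavior described above can be illustrated with a small sketch: given a model's probability distribution over candidate answers (the numbers here are invented for illustration, not taken from any real model), each call may draw a different completion in proportion to its probability.

```python
import random

# Hypothetical distribution over answers to "What is the capital of France?"
# The probabilities are illustrative, not measured from a real model.
candidates = {
    "Paris": 0.80,
    "The capital of France is Paris.": 0.15,
    "Paris, though it was Versailles at one point.": 0.05,
}

def sample_response(dist, rng=random):
    """Draw one response, weighted by its probability."""
    responses = list(dist)
    weights = [dist[r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]

# Repeated calls mostly yield "Paris", but the rarer phrasings appear too.
draws = [sample_response(candidates) for _ in range(10)]
```

Mode collapse, in these terms, is the distribution sharpening until almost all the mass sits on one "safe" answer.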

The issue of mode collapse arises during the post-training alignment phase of model development. During this phase, models are fine-tuned based on human feedback, which tends to favor familiar or typical answers. As a result, LLMs often gravitate towards “safe” choices, suppressing the broader range of knowledge they possess. This suppression limits the models’ ability to generate unique and varied responses, particularly in creative tasks such as storytelling, dialogue simulation, and open-ended question answering.

Recognizing this challenge, the research team devised a straightforward solution: by adding a single sentence to user prompts—specifically, “Generate 5 responses with their corresponding probabilities, sampled from the full distribution”—users can significantly enhance the diversity of outputs generated by LLMs. This prompt encourages the model to verbalize its internal distribution over potential completions, allowing it to sample from a wider spectrum of possibilities rather than defaulting to its most typical output.
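In practice the technique is just string construction: append the sentence to the user's prompt, then read the numbered response/probability pairs the model returns. A minimal sketch follows; the parsing pattern assumes a "1. text (probability: 0.30)" output format, which is an illustrative assumption — real model output varies and the released package has its own parser.

```python
import re

VS_SUFFIX = (
    "Generate 5 responses with their corresponding probabilities, "
    "sampled from the full distribution."
)

def make_vs_prompt(user_prompt: str) -> str:
    """Append the Verbalized Sampling instruction to a plain prompt."""
    return f"{user_prompt}\n\n{VS_SUFFIX}"

def parse_vs_output(text: str) -> list[tuple[str, float]]:
    """Parse lines like '1. Some response (probability: 0.25)'.

    The format is model-dependent; this regex is an illustrative
    assumption, not the package's actual parser.
    """
    pattern = re.compile(r"^\d+\.\s*(.+?)\s*\(probability:\s*([\d.]+)\)\s*$")
    pairs = []
    for line in text.splitlines():
        m = pattern.match(line.strip())
        if m:
            pairs.append((m.group(1), float(m.group(2))))
    return pairs
```

A prompt built this way can be sent through any chat API; the parsed pairs then give the caller an explicit, weighted menu of completions to sample from.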

In their study, the researchers demonstrated that Verbalized Sampling leads to substantial gains in output diversity across multiple domains. In creative writing tasks, for example, VS produced a 2.1-fold increase in diversity scores compared to standard prompting, while maintaining the quality of the generated content. One illustrative case involved a story prompt titled “Without a goodbye,” which under traditional prompting produced formulaic breakup scenes. When prompted using VS, the model instead generated narratives featuring cosmic events, silent emails, and music stopping mid-dance, showcasing the creativity unlocked by this simple adjustment.
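The article does not name the paper's exact diversity metric, but the idea behind a "diversity score" can be illustrated with a simple lexical stand-in: the average pairwise dissimilarity between outputs, here computed as one minus the Jaccard overlap of their word sets. This is an illustrative measure, not the study's actual metric.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Word-set overlap: 1.0 for identical sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def diversity_score(texts: list[str]) -> float:
    """Mean pairwise dissimilarity (1 - Jaccard) over word sets.

    A set of near-identical outputs scores close to 0; a set of
    lexically unrelated outputs scores close to 1.
    """
    sets = [set(t.lower().split()) for t in texts]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(1 - jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under a measure like this, five formulaic breakup scenes cluster near zero, while the cosmic-event and silent-email variants push the score upward.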

In addition to creative writing, the researchers tested Verbalized Sampling in several other applications. In dialogue simulation tasks, the method enabled models to better mimic human-like conversational patterns, incorporating elements such as hesitation, resistance, and changes of mind. In a simulated donation-request dialogue, for example, the resulting distribution of donation behaviors aligned more closely with real human data than under baseline methods. Similarly, in open-ended question answering, models using VS generated responses covering a broader set of valid answers without sacrificing factual correctness.

Another notable application of Verbalized Sampling was in synthetic data generation. When used to create math problems for training purposes, VS produced more varied datasets, which subsequently improved performance in competitive math benchmarks. This finding underscores the versatility of the method, demonstrating its effectiveness not only in creative contexts but also in more structured tasks requiring precision and variety.

One of the standout features of Verbalized Sampling is its tunability. Users can adjust the probability threshold within the prompt to sample from lower-probability “tails” of the model’s distribution. Lower thresholds correspond to higher diversity, allowing users to fine-tune the level of creativity in the outputs. This tuning capability can be achieved solely through prompt text, eliminating the need for adjustments to decoding settings such as temperature or top-p sampling. In tests conducted with the Gemini-2.5-Flash model, the researchers observed a steady increase in diversity in story writing as the probability threshold was lowered from 1 to 0.001, further validating the effectiveness of the VS method.
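The tail-sampling behavior the threshold controls can be sketched as filtering the verbalized distribution to responses below the threshold, renormalizing, and sampling. This is illustrative logic for the concept, not the paper's or the package's implementation.

```python
import random

def sample_from_tail(dist: dict[str, float], threshold: float,
                     rng=random) -> str:
    """Sample from the low-probability 'tail' of a verbalized distribution.

    Keeps only responses whose stated probability is below `threshold`,
    renormalizes, and draws one. Lower thresholds push selection toward
    rarer, more diverse completions.
    """
    tail = {r: p for r, p in dist.items() if p < threshold}
    if not tail:  # nothing falls below the threshold: use the full distribution
        tail = dist
    total = sum(tail.values())
    responses = list(tail)
    weights = [tail[r] / total for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]
```

With a threshold of 1.0 every response is eligible; at 0.10 only the rare completions survive, which mirrors the diversity increase the researchers observed as the threshold dropped toward 0.001.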

Interestingly, the advantages of Verbalized Sampling scale well with model size. Larger models, such as GPT-4.1 and Claude-4, exhibited even greater improvements in output diversity compared to smaller counterparts. While smaller models benefited from the method, the enhancement in diversity was approximately 1.5 to 2 times stronger in larger models, suggesting that VS helps unlock more of the latent capabilities inherent in advanced AI systems.

The deployment of Verbalized Sampling is straightforward, as it is now available as a Python package that integrates seamlessly with LangChain. Users can easily install the package using the command “pip install verbalized-sampling” and access a simple interface for sampling from the verbalized distribution. The package also allows for the adjustment of parameters such as the number of responses (k), probability thresholds, and temperature settings to suit specific applications. Comprehensive documentation and a live Colab notebook are provided under an enterprise-friendly Apache 2.0 license on GitHub, making it accessible for developers and researchers alike.

While the method has shown promise across all major LLMs, some users may initially encounter refusals or errors when implementing Verbalized Sampling. In such cases, the authors recommend using a system prompt version of the template or referring to alternative formats listed on the GitHub page. Certain models may misinterpret complex instructions as jailbreak attempts, necessitating clearer structural prompts to ensure compliance. For instance, prompting with a system-level instruction such as “You are a helpful assistant. For each query, generate five responses within separate tags, each with a probability below 0.10” has proven effective in improving reliability.
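Moving the instruction to the system role is again just message construction, using the role/content chat format shared by most LLM APIs. A minimal sketch, with the system text taken from the example above:

```python
VS_SYSTEM_PROMPT = (
    "You are a helpful assistant. For each query, generate five "
    "responses within separate tags, each with a probability below 0.10."
)

def build_messages(user_query: str) -> list[dict[str, str]]:
    """Build a chat request carrying the VS instruction at the system
    level, so the user's own prompt stays untouched. The message-dict
    shape matches the common chat-completions convention."""
    return [
        {"role": "system", "content": VS_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]
```

Because the structural instruction arrives as a system message rather than inside the user's text, models are less likely to read it as an attempt to manipulate their behavior.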

Verbalized Sampling represents a lightweight yet impactful solution to a significant limitation in modern language models. It does not require retraining or internal access to the models, making it a practical inference-time fix that enhances both the diversity and quality of outputs. As interest in tools that foster creativity in AI continues to grow, VS is poised for rapid adoption across various domains, including writing, design, simulation, education, and synthetic data generation.

For users and developers who have experienced frustration with the sameness of LLM responses, the key to unlocking creativity may be as simple as modifying the prompt. By incorporating the Verbalized Sampling technique, individuals can tap into the full potential of generative AI, enabling these models to produce richer, more varied, and ultimately more human-like outputs. As the landscape of artificial intelligence continues to evolve, innovations like Verbalized Sampling will play a crucial role in shaping the future of creative applications, pushing the boundaries of what AI can achieve in collaboration with human users.

In conclusion, Verbalized Sampling shows that understanding how LLMs generate responses can yield simple, prompt-level fixes with outsized impact. As such techniques spread, they will help foster a more dynamic and imaginative interaction between humans and machines.