Temperature

A parameter that controls how random or deterministic a language model's output is — lower values produce more predictable responses, higher values produce more creative ones.

Temperature is a single number — typically between 0 and 2 — that controls how random an LLM's output is. At temperature 0, the model always picks the highest-probability next token: deterministic, consistent, repeatable. Increase it and the model samples more broadly across possible tokens, producing more varied and less predictable responses.

The rule of thumb is simple. Low temperature for tasks where there's a right answer: extraction, classification, structured data, code generation, contract summaries. Higher temperature for tasks where variety is the point: brainstorming, creative copy, generating a list of diverse options. Above 1.0 is rarely useful in production — output starts getting incoherent in ways that are hard to predict and harder to debug.

This is one of the easiest parameters to get wrong. Teams refine a solid prompt, test it manually at the API default, ship it, then file bugs when outputs are inconsistent across identical inputs. The fix is almost always the same: set temperature explicitly. Don't leave it at whatever the provider defaults to. That default is a compromise designed for no one in particular.

If your task is deterministic, set temperature to 0 and stop guessing. If your task benefits from variety, experiment in a range of 0.7–1.0 and eval the outputs rather than eyeballing a few examples. Temperature isn't a magic creativity dial — it's a sampling parameter, and like every parameter, it should be set deliberately.

Related Terms