The Term “Non-Deterministic” and LLMs
I have recently found myself using the term “non-deterministic” to describe LLM behavior. However, something feels off about that term, and I’m nearly convinced that not only is it (sometimes) incorrect, it is imprecise: it leaves unexplained a critical characteristic of LLM behavior that distinguishes LLMs from deterministic functions.
First, defining “deterministic algorithm” (Wikipedia):
In computer science, a deterministic algorithm is an algorithm that, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.
LLMs can be deterministic (i.e. temperature=0, do_sample=False). For example, running the following code passes all 100 assertions:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The best thing about artificial intelligence is "
inputs = tokenizer(prompt, return_tensors="pt")
attention_mask = inputs["attention_mask"]

texts = []
for _ in range(100):
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=attention_mask,
        pad_token_id=tokenizer.eos_token_id,
    )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    texts.append(text)

for text in texts:
    assert text == "The best thing about artificial intelligence is that it can be used to solve problems that would otherwise be impossible to solve.\n\nFor"
```
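Note that the call to `generate` doesn’t pass `temperature` or `do_sample` explicitly; as far as I can tell, with this model’s default generation settings decoding is greedy, which is what makes all 100 runs identical. If a model shipped a generation config with sampling enabled, you’d want to pass `do_sample=False` explicitly to get the same guarantee.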
What I think people mean by saying “LLMs are non-deterministic” is something like the following, from the Steering Semantic Data Processing With DocWrangler paper by Shreya Shankar et al.:
users need to understand their data to write effective pipelines, yet they need to construct pipelines to extract the data necessary for that understanding
Thinking on that a bit more, what I think people mean by saying “LLMs are non-deterministic” is: which inputs to give an LLM to get a desired output is ambiguous. The fact that prompt engineering exists at all is a great example of this. I don’t know enough mathematics to know if there’s a term for this. “Input ambiguous”? “Non-deterministic on both ends”? The best Sonnet 4 came up with was “non-invertible” (the other option was “non-transparent”). GPT-5 Thinking gave a more sophisticated answer: “Prompting LLMs is an ill-posed inverse problem.”
- Inverse problem: you start from a desired output and try to find an input (a prompt) that yields it.
- Ill-posed (Hadamard): the inverse fails one or more of:
  - existence (your target may be unreachable),
  - uniqueness (many prompts produce similar outputs → non-injective),
  - stability (tiny prompt tweaks swing the output a lot; see the sketch after this list).
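To make the stability point concrete, here is a minimal sketch reusing the `model` and `tokenizer` loaded in the example above. It greedily decodes two prompts that differ by a single trailing character; nothing guarantees the completions stay close, and whether they actually diverge for this particular model is an empirical question:

```python
# Stability sketch (assumes `model` and `tokenizer` from the example above
# are already loaded). Two prompts differing by one character are decoded
# greedily; compare the printed completions to see how far apart they land.
for p in ["The best thing about artificial intelligence is ",
          "The best thing about artificial intelligence is: "]:
    enc = tokenizer(p, return_tensors="pt")
    out = model.generate(
        enc["input_ids"],
        attention_mask=enc["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,
    )
    print(repr(tokenizer.decode(out[0], skip_special_tokens=True)))
```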
Separately, decoding can be stochastic (temperature/top-p), which is where “non-deterministic” actually applies. With temperature=0 and deterministic kernels, the model is deterministic—but the inverse remains ill-posed.
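For contrast, here is a sketch of stochastic decoding with the same model. The `do_sample=True`, `temperature`, and `top_p` values are illustrative choices, not settings from the original example, and the exact number of distinct completions will vary run to run:

```python
# Stochastic decoding sketch (again assuming `model`, `tokenizer`, and
# `inputs` from the first example). Sampling replaces greedy argmax, so
# repeated runs can produce different text.
sampled = set()
for _ in range(10):
    out = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        do_sample=True,   # sample from the token distribution
        temperature=0.8,  # illustrative value
        top_p=0.9,        # nucleus sampling, also illustrative
        pad_token_id=tokenizer.eos_token_id,
    )
    sampled.add(tokenizer.decode(out[0], skip_special_tokens=True))

print(f"{len(sampled)} distinct completions across 10 runs")
```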
A well-posed problem (Wikipedia):
In mathematics, a well-posed problem is one for which the following properties hold:
- The problem has a solution
- The solution is unique
- The solution’s behavior changes continuously with the initial conditions.
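One way to write this down (my notation, not from the quoted article): model the LLM as a map $F : X \to Y$ from prompts to completions. Prompting is then the inverse problem

$$\text{given } y \in Y, \ \text{find } x \in X \ \text{such that } F(x) \approx y,$$

and well-posedness asks that such an $x$ exists, is unique, and varies continuously with $y$.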
Problems we try to solve with LLMs often fail all three properties, but again, I don’t know enough mathematics to say whether this framing truly applies to LLMs.
Most of my interactions with LLMs are through Claude Projects for coding assistance, and I make sure I understand the code (and that it works) before using it, so input ambiguity is acceptable there. As I learn to use LLMs to build pipelines, the input-ambiguity problem sharpens and quickly makes my pipelines brittle. Over the next couple of weeks, I plan to learn more about DocWrangler and DSPy to better understand how to temper that brittleness.