Top-p in LLM | Meursault's Blog

Top-p is a hyperparameter that controls the diversity of word choice in generated text.

1. Top-p

The Top-p value is usually set between 0 and 1.

Unlike top-k, Top-p is not a fixed number of words that may appear in the prediction step; it is a probability threshold.

When Top-p is set to some value, such as 0.9, then:

First, the model calculates the probability distribution over the next token.
Then the tokens are sorted in descending order of probability.
Next, the cumulative probability is computed by adding each token's probability to a running sum until the sum reaches the Top-p value.
Finally, the next token is sampled only from the tokens that were added to the sum.
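
The four steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not how any particular library implements it: the function names `top_p_filter` and `sample_top_p` and the toy four-word distribution are made up for this example, and rescaling the kept probabilities so they sum to 1 before sampling is an assumption about how the final step is usually done.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative probability reaches top_p.

    probs is a dict mapping token -> probability (assumed to sum to 1).
    Returns a new dict with the kept tokens, rescaled to sum to 1.
    """
    # Step 2: sort tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # Step 3: add probabilities to a running sum until it reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Assumption: rescale the kept probabilities so they sum to 1 again.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_top_p(probs, top_p):
    """Step 4: pick the next token only from the kept candidates."""
    candidates = top_p_filter(probs, top_p)
    tokens = list(candidates)
    weights = list(candidates.values())
    return random.choices(tokens, weights=weights, k=1)[0]


# Step 1 (stand-in): a toy next-token distribution instead of a real model output.
probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}

print(top_p_filter(probs, 0.9))  # "fish" is dropped from the candidates
print(sample_top_p(probs, 0.9))  # one of "cat", "dog", "bird"
```

With this toy distribution and Top-p set to 0.9, the running sum passes 0.9 only after the third word is added, so the last word can never be chosen.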

With Top-p set to 0.9, only the most probable tokens whose probabilities together add up to 0.9 are kept as candidates, however many tokens that turns out to be.

So with a large Top-p value, more tokens remain as candidates and the model generates sentences with more diverse words. With a small Top-p value, it tends to pick only the most statistically probable words, which leads to simpler and more uniform sentences.
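
To make the effect of the Top-p value concrete, the small sketch below counts how many candidate tokens survive for a few different thresholds. The helper name `nucleus_size` and the probabilities are invented for illustration, and the list is assumed to be already sorted from most to least probable.

```python
def nucleus_size(sorted_probs, top_p):
    """Count how many tokens are needed before the running sum reaches top_p.

    sorted_probs must already be in descending order.
    """
    cumulative = 0.0
    for count, p in enumerate(sorted_probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    # The threshold was never reached, so every token stays a candidate.
    return len(sorted_probs)


# A toy next-token distribution, sorted from most to least probable.
probs = [0.5, 0.3, 0.15, 0.05]

for top_p in (0.3, 0.9, 1.0):
    print(top_p, nucleus_size(probs, top_p))
```

For this distribution the candidate set grows from 1 token at 0.3 to 3 tokens at 0.9 and all 4 tokens at 1.0, which is why a larger value gives more varied output.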