Top-p is a hyperparameter that controls the diversity of word choice in generated text.
1. Top-p
The Top-p value is usually set between 0 and 1.
Unlike top-k, Top-p is not a fixed number of words that may appear in the prediction step; it is a probability threshold.
When Top-p is set to some value, such as 0.9, then:
- First, the model calculates the probability distribution for the next token.
- Then it sorts the tokens by probability in descending order.
- It computes the cumulative probability, adding each token’s probability to the sum until it reaches the Top-p value.
- Finally, it selects the next token only from the tokens that were added.
With Top-p set to 0.9, the model considers only the smallest set of most probable words whose combined probability reaches 0.9; the remaining low-probability words are excluded as candidates.
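The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular library implements it: the function names `top_p_filter` and `sample_top_p` and the toy probabilities are made up for this example, and a real model would work on a full vocabulary of logits rather than a small dictionary.

```python
import random

def top_p_filter(probs, top_p):
    """Return the (token, probability) pairs kept by Top-p filtering.

    probs: dict mapping token -> probability (assumed to sum to 1).
    """
    # Sort tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        # Stop as soon as the running total reaches the threshold.
        if cumulative >= top_p:
            break
    return kept

def sample_top_p(probs, top_p, rng=random):
    """Sample one token from the candidates kept by top_p_filter."""
    kept = top_p_filter(probs, top_p)
    tokens = [t for t, _ in kept]
    weights = [p for _, p in kept]
    # random.choices treats the weights as relative, so no explicit
    # renormalization is needed here.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Toy distribution (hypothetical numbers, for illustration only).
toy_probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}
candidates = [t for t, _ in top_p_filter(toy_probs, 0.9)]
print(candidates)                    # ['cat', 'dog', 'bird']
print(sample_top_p(toy_probs, 0.9))  # one of the three candidates
```

In this toy case "fish" is never sampled with Top-p at 0.9, because the first three tokens already cover a cumulative probability of 0.95.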
So if we set a large Top-p value, the model chooses from more candidates and generates sentences with more diverse words. If we set a small Top-p value, it tends to generate the most statistically probable words, which leads to simpler and more uniform sentences.
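To make the effect of the threshold concrete, the following sketch counts how many candidate words survive for several Top-p values. The helper `nucleus_size` and the probabilities are hypothetical, chosen only to show the trend.

```python
def nucleus_size(sorted_probs, top_p):
    """Count how many of the most probable tokens are kept for a given top_p.

    sorted_probs: probabilities already sorted in descending order.
    """
    cumulative = 0.0
    for count, p in enumerate(sorted_probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    # Threshold never reached (e.g. rounding): keep every token.
    return len(sorted_probs)

# Made-up next-token probabilities, sorted from high to low.
probs = [0.5, 0.3, 0.15, 0.05]
sizes = {p: nucleus_size(probs, p) for p in (0.3, 0.7, 0.99)}
print(sizes)  # {0.3: 1, 0.7: 2, 0.99: 4}
```

With the smallest threshold only the single most probable word remains, so the choice is effectively deterministic; with the largest, every word stays in play.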