grimjim posted an update 21 days ago
Implemented a proof-of-concept sampler in pure PyTorch and transformers.

Max P is a dynamic token filter that applies Winsorization to cap the probabilities of top tokens. Specifically, a base probability in the range [0, 1] caps each individual token's probability; the sampler then redistributes the excess mass proportionally.

https://github.com/jim-plus/maxp-sampler-poc
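
Roughly, the cap-and-redistribute step might look like the minimal sketch below. This is an illustration rather than the repo's exact code; the `max_p_filter` name, the default cap value, and the single-pass redistribution are assumptions.

```python
import torch

def max_p_filter(probs: torch.Tensor, max_p: float = 0.3) -> torch.Tensor:
    """Cap token probabilities at max_p (Winsorization) and redistribute the
    excess mass proportionally over tokens still under the cap.
    Single-pass sketch; a full implementation might iterate until no token
    exceeds the cap after redistribution."""
    capped = probs.clamp(max=max_p)                        # cap the top tokens
    excess = (probs - capped).sum(dim=-1, keepdim=True)    # mass removed by capping
    headroom = (capped < max_p).to(probs.dtype)            # tokens that can absorb mass
    weights = capped * headroom
    weight_sum = weights.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    out = capped + excess * weights / weight_sum           # proportional redistribution
    return out / out.sum(dim=-1, keepdim=True)             # renormalize for safety
```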

Combined with Temperature and Min P, this could represent a more intuitive way of reducing repetition in text generation.
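
For example, the three filters could be chained per decoding step, something like the sketch below (hypothetical pipeline using the `max_p_filter` sketch above; the threshold values are illustrative):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8,
                      min_p: float = 0.05, max_p: float = 0.3) -> torch.Tensor:
    """Hypothetical per-step pipeline: temperature -> Min P -> Max P -> sample."""
    probs = torch.softmax(logits / temperature, dim=-1)
    # Min P: drop tokens below min_p * (probability of the most likely token)
    floor = min_p * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(probs >= floor, probs, torch.zeros_like(probs))
    probs = probs / probs.sum(dim=-1, keepdim=True)
    # Max P: cap and redistribute, as sketched above
    probs = max_p_filter(probs, max_p=max_p)
    return torch.multinomial(probs, num_samples=1)
```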

I think your idea has effects similar to:

  • Top K, because probability is capped and redistributed. When the top tokens all share the same high probability, it behaves like top-K token selection.
  • High temperature, because the distribution over low- vs. high-probability tokens becomes more similar (flatter).

It should increase diversity, but I'm not sure whether it can decrease repetition.


The hope is that nudging probabilities breaks up longer spans of literal repetition.
