grimjim posted an update 21 days ago
Implemented a proof-of-concept sampler in pure PyTorch and transformers.

Max P is a dynamic token filter that applies Winsorization to cap the probabilities of top tokens. Specifically, a base probability in the range [0, 1] caps each individual token's probability; the sampler then redistributes the excess mass proportionally.

https://github.com/jim-plus/maxp-sampler-poc
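
Roughly, the cap-and-redistribute step might look like the minimal sketch below. This is an illustration rather than the repo's exact code; the `max_p_filter` name, the default cap value, and the single-pass redistribution are assumptions.

```python
import torch

def max_p_filter(probs: torch.Tensor, max_p: float = 0.3) -> torch.Tensor:
    """Cap token probabilities at max_p (Winsorization) and redistribute the
    excess mass proportionally over tokens still under the cap.
    Single-pass sketch; a full implementation might iterate until no token
    exceeds the cap after redistribution."""
    capped = probs.clamp(max=max_p)                        # cap the top tokens
    excess = (probs - capped).sum(dim=-1, keepdim=True)    # mass removed by capping
    headroom = (capped < max_p).to(probs.dtype)            # tokens that can absorb mass
    weights = capped * headroom
    weight_sum = weights.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    out = capped + excess * weights / weight_sum           # proportional redistribution
    return out / out.sum(dim=-1, keepdim=True)             # renormalize for safety
```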

Combined with Temperature and Min P, this could represent a more intuitive way of reducing repetition in text generation.
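
For example, the three filters could be chained per decoding step, something like the sketch below (hypothetical pipeline using the `max_p_filter` sketch above; the threshold values are illustrative):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8,
                      min_p: float = 0.05, max_p: float = 0.3) -> torch.Tensor:
    """Hypothetical per-step pipeline: temperature -> Min P -> Max P -> sample."""
    probs = torch.softmax(logits / temperature, dim=-1)
    # Min P: drop tokens below min_p * (probability of the most likely token)
    floor = min_p * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(probs >= floor, probs, torch.zeros_like(probs))
    probs = probs / probs.sum(dim=-1, keepdim=True)
    # Max P: cap and redistribute, as sketched above
    probs = max_p_filter(probs, max_p=max_p)
    return torch.multinomial(probs, num_samples=1)
```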

I think your idea has effects similar to:

  • Top K, because probability is capped and redistributed. When the top tokens all share the same high probability, it behaves like top-K token selection.
  • High temperature, because the distribution over low- vs. high-probability tokens becomes more similar (flatter).

It should increase diversity, but I'm not sure whether it can decrease repetition.


The hope is that nudging probabilities breaks up longer spans of literal repetition.
