How to use answerdotai/ModernBERT-base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
```
Micro-batch size of 96
Hello,
Thanks for the nice work; it is a really impressive model.
I've implemented a training script using accelerate and H100 cards (94GB version). Everything is working well, even the batch-size warmup.
However, I am saturating my GPUs with a micro-batch size of 48 sequences. In the ModernBERT paper, I see that you set the micro-batch size to 96. What am I missing? I am using FA2 like you. Even though I am not packing unpadded sequences as you do in the original paper, there is no way I can fit 96*1024 tokens in a micro-batch...
Julien
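For context, the batch-size warmup mentioned above can be sketched roughly like this (a hypothetical linear schedule; the paper's actual schedule and numbers may differ):

```python
def micro_batch_size(step, start=8, target=96, warmup_steps=1000):
    """Hypothetical linear batch-size warmup from `start` to `target`.

    All numbers here are illustrative, not the paper's actual values.
    """
    if step >= warmup_steps:
        return target
    return start + (target - start) * step // warmup_steps

print(micro_batch_size(0), micro_batch_size(500), micro_batch_size(1000))  # 8 52 96
```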
Update!
I was able to increase my micro-batch size to 88 by using gradient checkpointing.
I'll get back to you once I am able to squeeze the last 8 onto the GPU.
Julien
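For anyone following along, enabling gradient checkpointing in Transformers is a one-liner (a sketch; the `use_reentrant=False` kwarg assumes a recent Transformers/PyTorch version):

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
# Recompute activations during the backward pass instead of storing them,
# trading extra compute for a smaller activation-memory footprint.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
```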
Using ZeRO-2, I am able to fit 96 sequences in one GPU. I could combine it with gradient checkpointing to fit even more sequences.
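For reference, ZeRO stage 2 shards optimizer states and gradients across data-parallel ranks, which frees per-GPU memory for activations. A minimal DeepSpeed config along these lines might look like this (illustrative values only, to be passed to accelerate as a DeepSpeed config file):

```json
{
  "train_micro_batch_size_per_gpu": 96,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```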
Closing the discussion.