Pretrained models from the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling"
Zayd Muhammad Kawakibi Zuhri
zaydzuhri
AI & ML interests
I really like watching loss go down
Recent Activity
Updated a model about 3 hours ago: zaydzuhri/top-1B-4096-ratio090-window16-batch8x2-steps200000-20260128-172716
Published a model about 3 hours ago: zaydzuhri/top-1B-4096-ratio090-window16-batch8x2-steps200000-20260128-172716
Updated a model about 3 hours ago: zaydzuhri/top-340M-optimal-4096-model
Organizations
None yet