AGILLM-4 dblock single-file
This repo packages the live AGILLM-4 dblock trainer as one runnable Python file:
agillm4_dblock_single_file.py
It was regenerated on 2026-05-31T16:07:54Z by mechanically inlining the live VastAI training sources:
fused_ce.py, anchor_memory.py, dblocks_train.py, nB300_agillm4.py
The live trainer is invoked as nB300_agillm4.py train. This single-file build keeps the same CLI surface, registers in-memory shims for the former helper modules, and disables the helper modules' smoke tests, which would otherwise run because the packed file executes as __main__.
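A minimal sketch of what an in-memory module shim and a disabled smoke-test guard can look like, assuming the packer follows the common sys.modules pattern. The names register_shim, RUN_HELPER_SMOKE_TESTS and _helper_smoke_test are hypothetical and not taken from the packed file; the real build may register its shims differently.

```python
import sys
import types


def register_shim(name: str, namespace: dict) -> types.ModuleType:
    """Expose `namespace` as an importable module called `name` (hypothetical helper)."""
    module = types.ModuleType(name)
    module.__dict__.update(namespace)
    # `import name` checks sys.modules first, so no file on disk is needed.
    sys.modules[name] = module
    return module


def _helper_smoke_test() -> None:
    # Stands in for a former helper module's self-test (hypothetical).
    print("helper smoke test")


# In a standalone helper this was `if __name__ == "__main__":`. In the packed
# file, __name__ is "__main__" for the trainer itself, so the guard needs an
# extra switch (hypothetical) to keep the helper's self-test from firing.
RUN_HELPER_SMOKE_TESTS = False
if __name__ == "__main__" and RUN_HELPER_SMOKE_TESTS:
    _helper_smoke_test()
```

With this pattern, calling register_shim("fused_ce", {...}) with the inlined helper functions before any `from fused_ce import ...` line lets the existing trainer imports resolve unchanged.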
See single_file_manifest.json for source hashes from the generated build.
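A minimal sketch for re-checking source files against the manifest. It assumes the manifest stores hex SHA-256 digests under a "sources" key mapping filename to digest; both the schema and the hash algorithm are assumptions, so inspect single_file_manifest.json and adjust load_expected before relying on it. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def load_expected(manifest_path, key: str = "sources") -> dict:
    # ASSUMPTION: {"sources": {"fused_ce.py": "<sha256 hex>", ...}}
    with open(manifest_path, encoding="utf-8") as f:
        return json.load(f)[key]


def check_hashes(expected: dict, root) -> dict:
    """Map each filename to True if it exists under root and its digest matches."""
    root = Path(root)
    report = {}
    for name, digest in expected.items():
        path = root / name
        report[name] = path.exists() and sha256_file(path) == digest
    return report
```

A mismatch only says the local copy differs from what was packed; it does not say which version is correct.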
Example training shape:
python agillm4_dblock_single_file.py train --preset agillm4_floor --dblock ...
This is experimental training code, not a polished inference package.
Inference Smoke Test
Validated on the live VastAI training box against /workspace/agillm4_4090_ckpts/pretrain_step01176781.pt using CPU-only AR inference:
CUDA_VISIBLE_DEVICES= python agillm4_dblock_single_file.py infer \
--mode ar \
--ckpt /workspace/agillm4_4090_ckpts/pretrain_step01176781.pt \
--prompt "User: Say hello in one short sentence. Assistant:" \
--max_new 8 --greedy --plain-output --attn_backend manual
When loading a checkpoint for inference, the trainer zero-fills any missing SAT/NAT bias keys. This lets older full checkpoints run, and the newly introduced bias tensors start at zero instead of random values.
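A minimal sketch of zero-filling missing bias keys before loading a state dict. The key predicate is a hypothetical naming convention (a key ending in "bias" that contains "sat" or "nat"); the trainer's real key names may differ. To stay framework-agnostic the function takes a zeros_like callable, which with PyTorch would be torch.zeros_like.

```python
from typing import Any, Callable, Dict, List, Tuple


def default_is_sat_nat_bias(key: str) -> bool:
    # HYPOTHETICAL naming convention; check the real checkpoint keys.
    k = key.lower()
    return k.endswith("bias") and ("sat" in k or "nat" in k)


def fill_missing_bias(
    model_state: Dict[str, Any],
    ckpt_state: Dict[str, Any],
    zeros_like: Callable[[Any], Any],
    is_bias_key: Callable[[str], bool] = default_is_sat_nat_bias,
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of ckpt_state with missing SAT/NAT bias keys set to zeros.

    Only keys that match is_bias_key are filled; any other missing key stays
    missing so a strict load still reports it.
    """
    merged = dict(ckpt_state)
    filled = []
    for key, reference in model_state.items():
        if key not in merged and is_bias_key(key):
            merged[key] = zeros_like(reference)
            filled.append(key)
    return merged, filled
```

Zero is the natural fill value for an additive bias: adding zero leaves the layer's output unchanged, so the old checkpoint behaves as it did before the bias existed, while a random init would perturb it. A PyTorch usage would be roughly merged, filled = fill_missing_bias(model.state_dict(), ckpt_state, torch.zeros_like) followed by model.load_state_dict(merged), where ckpt_state is whatever dict the checkpoint stores its weights under.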
NAT Decode Notes
The packed trainer includes the same NAT inference anti-collapse changes as the live trainer. NAT now applies repetition/frequency/presence penalties and sampler controls while committing masked positions, rather than filling every blank with an unconstrained argmax.
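A minimal sketch of penalized commitment of masked positions, on plain Python lists. Assumptions: MASK, penalize, pick and commit_masked are hypothetical names; positions are committed left to right (the trainer may instead commit by confidence or in parallel); and the penalty formulas follow common conventions (CTRL-style repetition penalty, additive frequency and presence penalties), not necessarily the trainer's exact math.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence

MASK = -1  # hypothetical sentinel for a not-yet-committed position


def penalize(logits: Sequence[float], counts: Dict[int, int],
             repetition: float = 1.0, frequency: float = 0.0,
             presence: float = 0.0) -> List[float]:
    """Apply repetition/frequency/presence penalties to one position's logits."""
    out = list(logits)
    for tok, count in counts.items():
        if count <= 0:
            continue
        if repetition != 1.0:
            # CTRL-style: shrink positive logits, push negative ones further down.
            out[tok] = out[tok] / repetition if out[tok] > 0 else out[tok] * repetition
        # Frequency grows with the token's count; presence is a flat charge.
        out[tok] -= frequency * count + presence
    return out


def pick(logits: Sequence[float], temperature: float = 0.0, top_k: int = 0,
         rng: Optional[random.Random] = None) -> int:
    """Greedy when temperature <= 0, otherwise temperature + top-k sampling."""
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    idx = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)
    if top_k > 0:
        idx = idx[:top_k]
    top = logits[idx[0]]  # subtract the max for numerical stability
    weights = [math.exp((logits[i] - top) / temperature) for i in idx]
    return (rng or random.Random()).choices(idx, weights=weights, k=1)[0]


def commit_masked(seq: Sequence[int], logits_per_pos: Sequence[Sequence[float]],
                  repetition: float = 1.0, frequency: float = 0.0,
                  presence: float = 0.0, temperature: float = 0.0,
                  top_k: int = 0, rng: Optional[random.Random] = None) -> List[int]:
    """Fill MASK positions left to right, updating token counts after each commit."""
    out = list(seq)
    counts = Counter(t for t in out if t != MASK)
    for pos, tok in enumerate(out):
        if tok != MASK:
            continue
        scores = penalize(logits_per_pos[pos], counts, repetition, frequency, presence)
        choice = pick(scores, temperature, top_k, rng)
        out[pos] = choice
        counts[choice] += 1
    return out
```

With identical logits [5.0, 4.0, 0.0] at three masked positions, the unconstrained argmax gives [0, 0, 0], the all-token collapse described above; frequency=2.0 gives [0, 1, 0] and presence=10.0 gives [0, 1, 2]. Because counts update after every commit, each new token sees the penalties from tokens already placed.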
Smoke result (CPU-only): about 67 tok/s with no all-token collapse. Output quality is still rough at this early stage of training; this is a decoding stability improvement, not a solved NAT head.