Dhara Foundational Models • Collection • Diffusion language models combining deep, narrow networks, Canon layers (depthwise causal convolutions), and WSD (Warmup-Stable-Decay) training. • 2 items • Updated 8 days ago
symbolic • Collection • Small models aimed at symbolic reasoning with few-shot prompts. • 1 item • Updated 3 days ago