Cihangir Bozdogan
Organizations
MTP
That curiosity is the best fuel there is. Looking forward to seeing CompactAI.
I wanted to take the time to thank each and every one of you for using my dataset and getting it to go as far as it did. Believe it or not, some neanderthal was and maybe still is trending on huggingface.
Not only did my dataset reach number one, my fine-tuned qwen3.5 model did as well. Top 10. Honestly, ain't much left to do here.
Y'all have given me the desire, no... the craving for more. I am absolutely obsessed with AI now. I want to tweak it... I want to take it apart, just to see what makes everything tick. I want to put it together like Frankenstein and his monster.
The only thing that's stopping this guy is compute. I don't mind spending every penny I have on this. I desperately want to drive AI forward, even just a little bit.
I never knew the clanker hater from a year ago would be saying this.
Thank you all from the bottom of my heart.
Looking forward to showing you what I'm cooking up next. @CompactAI is your only hint!
Try it: FINAL-Bench/model-galaxy
This Space is a fork of the brilliant Eliahu/Model-Atlas, the official demo of "Charting and Navigating Hugging Face's Model Atlas" (Horwitz et al., arXiv 2503.10633). Their pre-computed HF model graph is the foundation of every node and edge you see, and we are deeply grateful for its open release.
The original atlas is a static snapshot of early 2025. Model Galaxy turns it into a living, multimodal map. We injected the 2026 trending originals that did not exist when the atlas was frozen: DeepSeek-V4, Hy3-preview, GLM-5.1, Kimi-K2, gpt-oss, Nemotron-3 Super / Nano / Omni, Hermes-4.3, Qwen3-Coder-Next, Llama-3.3, Granite-4.1, plus the latest multimodal releases (FLUX.2, ERNIE-Image, HunyuanImage / Video, LTX-2.3, Wan2.2, Kokoro-82M, VoxCPM2, Voxtral-TTS, whisper-v3-turbo, Gemma-4, Qwen3-Omni, Phi-4-mm), each with proper base_model lineage edges.
We also added the complete VIDRAFT Darwin family ontology: 120 nodes covering Darwin Core, AETHER, every brand variant (Rogue, AWAXIS, TenOS, Warecube), NOESIS-Darwin multimodal extensions, and 40+ community quantizations, the most complete Darwin lineage view anywhere.
The name "Galaxy" is now literal: our three injected clusters are re-laid out as logarithmic spiral galaxies, with bigger models near the bright cores and quantizations scattering to the outer arms, just like real star mass distribution. A top-right toggle switches between Galaxy mode (deep-space gradient with 220 animated stars) and Atlas mode (clean white panels for reports). A 15-second progress bar narrates the render, and per-modality / per-company colors make every cluster legible at a glance.
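For readers curious how a logarithmic spiral layout like this can work, here is a minimal sketch. It is not the Space's actual code: the function name, the number of arms, and the constants `a`, `b`, and `step` are all assumptions chosen for illustration. The only idea taken from the post is that larger models sit near the core and smaller ones drift outward along the arms.

```python
import math

def spiral_layout(sizes, arms=2, a=1.0, b=0.25, step=0.35):
    """Place nodes on logarithmic spiral arms, r = a * exp(b * theta).

    sizes: positive numbers (e.g. parameter counts). Larger entries are
    placed first, so they get a small theta and land near the core.
    Returns (x, y) tuples in the same order as `sizes`.
    All default constants are illustrative assumptions.
    """
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i])
    coords = [None] * len(sizes)
    for rank, i in enumerate(order):
        arm = rank % arms                    # round-robin across the arms
        theta = (rank // arms) * step        # angle travelled along the arm
        r = a * math.exp(b * theta)          # log-spiral radius
        angle = theta + arm * (2 * math.pi / arms)
        coords[i] = (r * math.cos(angle), r * math.sin(angle))
    return coords

# Hypothetical sizes in billions of parameters.
coords = spiral_layout([7, 70, 1])
```

Sorting by size before walking outward is one simple way to get the "bright core, scattered arms" look; a real layout would likely also add jitter so nodes on the same arm do not overlap.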
Final scale: 22,480 nodes in the default Modalities atlas, 137,324 in the Large NLP atlas, and a 277-node compact Darwin + Trending view for instant exploration. Feedback and PRs welcome.
Tuned 27B Heretic Uncensored quants from IQ2M to Q8.
IQ2M is 83% of BF16, with Q6 just under 98% of BF16 precision.
Q8: 98.47% of BF16 precision.
NEO/Code DI-Imatrix Quants.
Exceeds all 5 metrics for "censored" quants too.
All metrics posted.
The tuned model, from which the quants were built, also exceeds Qwen 3.6 27B core metrics.
DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF
...The day I forgot to attach wandb.ai
Just dropped Crowfeather-50m, the first checkpoint in a series, and yeah, no graphs.
Crowfeather/Crowfeather-50m
54.5M params. Pretrain only. 17,500 steps banked on FineWeb-edu before Thunder credits ran dry. About 2.3B tokens, no SFT yet.
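A quick back-of-the-envelope check on those numbers, as a sketch. The inputs come from the post (including the 100k-step target it mentions for the continuation); the projection assumes the tokens-per-step rate stays constant, which the post does not state.

```python
params = 54.5e6          # from the post
steps_done = 17_500      # from the post
tokens_seen = 2.3e9      # "about 2.3B", so the results are approximate
target_steps = 100_000   # continuation target mentioned in the post

tokens_per_step = tokens_seen / steps_done      # roughly 131k tokens per step
tokens_per_param = tokens_seen / params         # roughly 42 tokens per parameter
progress = steps_done / target_steps            # 17.5% of the planned run
# Assumption: the same tokens-per-step rate holds for the whole run.
projected_tokens = tokens_per_step * target_steps

print(f"tokens/step      ~ {tokens_per_step:,.0f}")
print(f"tokens/param     ~ {tokens_per_param:.1f}")
print(f"progress         = {progress:.1%}")
print(f"projected tokens ~ {projected_tokens / 1e9:.1f}B")
```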
Architecture: Gemma-4 alternating sliding/global attention (1024 window, last layer always global) plus DeepSeek-V4 Muon optimizer plus WSD scheduler plus Gemma-2 logit soft-cap plus PaLM z-loss. Recipe in the model card.
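The ingredients in that recipe are easy to sketch in plain Python. This is an illustrative sketch, not the Crowfeather training code: the function names, the cap value of 30.0, the z-loss coefficient of 1e-4, the 10% decay fraction, and the every-other-layer pattern are all assumptions. The real settings are in the model card.

```python
import math

def softcap(logits, cap=30.0):
    """Logit soft-cap: squash values smoothly into [-cap, cap]."""
    return [cap * math.tanh(x / cap) for x in logits]

def cross_entropy_with_z_loss(logits, target, z_coef=1e-4):
    """Cross-entropy plus z_coef * log(Z)^2, with Z = sum(exp(logits)).

    The extra term nudges the softmax normalizer toward 1.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    ce = log_z - logits[target]
    return ce + z_coef * log_z ** 2

def wsd_lr(step, peak_lr, warmup, total, decay_frac=0.1, min_lr=0.0):
    """Warmup-stable-decay: linear warmup, flat plateau, linear decay."""
    decay_start = total - int(total * decay_frac)
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    if step < decay_start:
        return peak_lr
    frac = (step - decay_start) / max(1, total - decay_start)
    return peak_lr + (min_lr - peak_lr) * min(1.0, frac)

def layer_is_global(layer, n_layers, global_every=2):
    """Alternate sliding and global layers; the last layer is always global."""
    return layer == n_layers - 1 or (layer + 1) % global_every == 0

def can_attend(q_pos, k_pos, window=1024, is_global=False):
    """Causal mask; sliding layers only see the most recent `window` positions."""
    if k_pos > q_pos:
        return False
    return is_global or (q_pos - k_pos) < window
```

One practical appeal of a WSD schedule for a run like this is that the plateau has no fixed end: training can stop and resume at the peak rate, and the decay phase only needs to be scheduled once the final step count is known.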
What it can do: writes grammatical English. Knows that France has Rhine-adjacent monasteries (it picked Rouen instead of Paris but the vocabulary is in there). Tells stories about Mr. Fabien.
What it can't do yet: facts, code, math. Base LM, no SFT, no instruction tuning.
The series:
Every additional training run becomes another model card here
Every model card gets a matching post on this profile
Continuation goes to Colab next, picking up from step 17500 out of 100k
Limited to one post a day on Hugging Face, so updates will trickle out at that pace. Follow @Crownelius and @Crowfeather.
Graphs will be available on my NEXT model lol
-Shane
What stands out: all 16 libraries converged on the same disaggregated architecture but diverged sharply on staleness management, and the hybrid trend (depth bounding plus optional IS correction) feels right. Per-sample model_version tagging is the pragmatic foundation; once you have it, every other staleness strategy becomes a policy choice rather than an architectural rewrite. The MoE training-inference mismatch is the sleeper insight: "Keep Routing" and "Keep Sampling Mask" stop being optimizations and become correctness requirements. Excellent survey.
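To make the "policy choice" point concrete, here is a minimal sketch of how version tagging lets depth bounding and importance-sampling (IS) correction sit on top of the same data. It does not reproduce any surveyed library's API: the `Sample` fields other than `model_version`, the function names, and the default thresholds are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Sample:
    tokens: list              # generated token ids
    behavior_logprob: float   # sequence log-prob under the weights that sampled it
    model_version: int        # trainer step of the weights that sampled it

def select_batch(samples, current_version, max_staleness=2):
    """Depth bounding: drop samples generated too many versions ago."""
    return [s for s in samples
            if current_version - s.model_version <= max_staleness]

def is_weight(sample, current_logprob, current_version, clip=2.0):
    """Optional truncated importance weight; on-policy samples get 1.0."""
    if sample.model_version == current_version:
        return 1.0
    ratio = math.exp(current_logprob - sample.behavior_logprob)
    return min(ratio, clip)

# Usage with made-up values: versions 10 and 8 survive a bound of 2.
samples = [Sample([1, 2], -3.0, 10), Sample([3], -2.0, 8), Sample([4], -1.0, 5)]
kept = select_batch(samples, current_version=10)
```

Because every sample carries its own version, switching from "drop anything stale" to "keep it but reweight" only changes which of these two functions the trainer calls, not how rollouts are stored.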
Love the design choice of a single look_and_answer tool and letting Gemma 4 decide when vision is actually needed, which is much cleaner than always-on vision encoders. And going native llama.cpp over Docker is the right call if you want to actually swap the mmproj. A 5B Q4_K_M doing tool-routed multimodal reasoning on 8GB unified memory, with Parakeet + Kokoro on-device, is a strong signal for where edge agents are heading. Nice work.