We're thrilled to release Darwin-9B-NEG, a 9B-parameter reasoning model that embeds an architecturally-internalised sense of self-confidence directly into the transformer through our proprietary Native Entropy Gating (NEG) technology.
With only 9 billion parameters and 1× inference cost, pure NEG gains +12.63 %p over the same model without NEG. Going all-in with ensemble refinement pushes it to 84.34%, surpassing the published Qwen3.5-9B leaderboard score (81.7%) by +2.64 %p.
🔬 What makes NEG different from Multi-Turn Iteration (MTI)?
Classical MTI needs 3-8× extra inference passes. NEG instead lives INSIDE the single decoding loop. Two tiny modules ride with the transformer: NEG-Head predicts per-token entropy from the last hidden state, and NEG-Gate conditionally restricts the top-k choice when confidence is low. The gate activates on only 4.36% of tokens, making it essentially free at inference time.
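For intuition, here is a minimal PyTorch sketch of how the two modules could sit inside the decoding loop. The linear entropy head, the entropy threshold, and the top-k value are illustrative assumptions, not the released model's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NEGHead(nn.Module):
    """Predicts per-token entropy from the final hidden state (sketch)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # [batch, hidden] -> [batch]; softplus keeps predicted entropy >= 0
        return F.softplus(self.proj(last_hidden)).squeeze(-1)

def neg_gate(logits: torch.Tensor, pred_entropy: torch.Tensor,
             threshold: float = 2.0, top_k: int = 5) -> torch.Tensor:
    """Restrict sampling to the top-k logits for rows whose predicted
    entropy exceeds `threshold` (low confidence); pass the rest through.
    `threshold` and `top_k` are placeholder values, not the shipped ones."""
    low_conf = pred_entropy > threshold                       # [batch] bool
    kth = torch.topk(logits, top_k, dim=-1).values[..., -1:]  # k-th largest logit
    restricted = logits.masked_fill(logits < kth, float("-inf"))
    return torch.where(low_conf.unsqueeze(-1), restricted, logits)
```

In a decoding step the gate would be applied to the logits just before sampling; since it fires on only ~4% of tokens, the overhead is negligible.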
✨ Key differentiators
• Architecturally internalised: the model file *is* the feature
• 1× inference cost (vs. 3-8× for MTI)
• Drop-in with vLLM / SGLang / TGI / transformers; no extra engine
• +12.63 %p reasoning gain at zero latency overhead
• Single-file deployment, Apache 2.0 licensed
Darwin-TTS: 3% of an LLM's Brain Makes TTS Speak with Emotion, Zero Training
We blended 3% of the Qwen3-1.7B LLM's FFN weights into Qwen3-TTS-1.7B's talker module. The result: emotionally enhanced speech synthesis with zero training, zero data, and zero GPU hours.
Qwen3-1.7B (the LLM) and Qwen3-TTS-1.7B's talker share a 100% identical architecture: same hidden_size (2048), same layer count (28), same head count (16). This enabled pure 1:1 weight blending across 84 FFN tensors (three per layer across 28 layers) with a single lerp operation. At a 3% blend, emotion appears. At 5%, it intensifies. At 10%, the model breaks, producing 655-second outputs for a 3-second sentence, because the LLM's "keep generating" pattern overwhelms the TTS stop signal.
To our knowledge, this is the first training-free cross-modal weight transfer between an LLM and a TTS model. Prior work requires either adapter training (SmolTolk, 2025), fine-tuning (CSLM, 2025), or massive end-to-end compute (GPT-4o). Darwin-TTS achieves cross-modal capability transfer in under 2 minutes on CPU.
The key insight: TTS models with LLM backbones already "think" in language. We're just restoring 3% of the original LLM's language understanding patterns, particularly those related to emotional semantics and prosody planning. The code is three lines: load the model, load the LLM FFN, call p.lerp_(llm_weight, 0.03).
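A minimal sketch of that recipe, assuming both checkpoints load through transformers and share Qwen-style parameter names; the repo ids and the "mlp" substring filter are our assumptions, not the released packaging:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical repo ids; the actual checkpoints may be packaged differently.
tts = AutoModelForCausalLM.from_pretrained("Qwen3-TTS-1.7B-talker")
llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")

llm_state = llm.state_dict()
with torch.no_grad():
    for name, p in tts.named_parameters():
        # Touch only the FFN tensors; "mlp" matches Qwen's
        # gate_proj/up_proj/down_proj naming (an assumption here).
        if "mlp" in name and name in llm_state:
            p.lerp_(llm_state[name], 0.03)  # 97% TTS + 3% LLM, in place

tts.save_pretrained("Darwin-TTS-1.7B")
```

At weight 0.03, lerp_ computes 0.97·tts + 0.03·llm in place; per the failure mode described above, raising it to 0.10 is where the stop signal breaks down.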
We are the creators of the Darwin Evolutionary Merge Framework. Darwin LLM V7 achieved 86.9% on GPQA Diamond (HF Benchmark #3) through CMA-ES-optimized FFN crossbreeding. Darwin-TTS extends this principle from LLM-to-LLM merging to cross-modal LLM-to-TTS transfer. Apache 2.0 licensed.
🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond, World #5, Zero Training

We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond, ranking #5 globally on the HuggingFace leaderboard, without a single gradient update.
How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios; no human tuning required.
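As a sketch of what that search loop looks like, using the open-source cma package with a toy objective standing in for the real benchmark-driven fitness (the layer count and every specific here are assumptions, not the framework's actual code):

```python
import cma
import numpy as np

NUM_LAYERS = 48  # hypothetical layer count for the 27B parents

# Stand-in objective: distance to a synthetic target vector. The real
# fitness would merge the parents' FFN tensors at `ratios` and return
# negative benchmark accuracy (CMA-ES minimizes).
TARGET = np.linspace(0.2, 0.9, NUM_LAYERS)

def fitness(ratios: np.ndarray) -> float:
    return float(np.sum((np.clip(ratios, 0.0, 1.0) - TARGET) ** 2))

# Start every layer at a 50/50 blend, with initial step size 0.2.
es = cma.CMAEvolutionStrategy(NUM_LAYERS * [0.5], 0.2)
while not es.stop():
    candidates = es.ask()  # sample candidate per-layer ratio vectors
    es.tell(candidates, [fitness(c) for c in candidates])

best_per_layer_ratios = es.result.xbest
```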
The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming a 744B one, with zero training, zero data, one GPU, and ~2 hours.
We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (a second-generation offspring) surpassed both parents on CLIcK, winning 7 of 11 categories. The evolutionary optimizer independently took 93% of the FFN weights from the Korean-specialized mother while preserving 93% of the attention weights from the reasoning-specialized father, autonomously validating our core principle: FFN carries knowledge, attention carries reasoning.
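A hedged sketch of that module-wise split in code; the parameter-name substrings follow Qwen conventions and the hard-coded ratios simply echo the values reported above, so treat this as illustration rather than the framework's actual merge routine:

```python
import torch

def merge_modular(father_sd: dict, mother_sd: dict,
                  attn_from_father: float = 0.93,
                  ffn_from_mother: float = 0.93) -> dict:
    """Keep most attention weights from the reasoning parent and most
    FFN weights from the knowledge parent (illustrative sketch)."""
    merged = {}
    for name, w_f in father_sd.items():
        w_m = mother_sd[name]
        if "self_attn" in name:   # attention carries reasoning
            merged[name] = attn_from_father * w_f + (1 - attn_from_father) * w_m
        elif "mlp" in name:       # FFN carries knowledge
            merged[name] = ffn_from_mother * w_m + (1 - ffn_from_mother) * w_f
        else:                     # embeddings, norms, etc.: father's copy
            merged[name] = w_f
    return merged
```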
🚀 Public release: 10 days → 300+ community derivatives, 120K+ downloads.