AI & ML interests

None defined yet.

Recent Activity

Update README.md

#2 opened 10 days ago by
hypothetical
hypothetical 
posted an update 11 days ago
view post
Post
2573
We thought it would be easier, but finally we have integrated CuDNN Paged Attention to our models!


Read article here: https://app.thestage.ai/blog/Integrating-cuDNN-Paged-Attention-to-TheStage-AI-Inference?id=8

Llama-8B with CuDNN paged attention, including B200 support: TheStageAI/Elastic-Llama-3.1-8B-Instruct
Mistral-Small-24B with CuDNN paged attention, including B200 support: TheStageAI/Elastic-Mistral-Small-3.1-24B-Instruct-2503
hypothetical 
posted an update 18 days ago
view post
Post
2018
We have updated our transcription model: TheStageAI/thewhisper-large-v3-turbo

– 6.00 WER on the English Open ASR Leaderboard
– 4.74 WER on the Multilingual Open ASR Leaderboard
– Beats NVIDIA Parakeet (6.34 WER) and Whisper-large-v3-turbo (7.8 WER)
– Strong improvements in Arabic, Hindi, Chinese
– Maintains quality with background and environmental noise
– Optimized inference engines for NVIDIA and Apple
– Hugging Face Transformers interface for easy use
– Best-in-class speed on NVIDIA GPUs and power efficiency on Apple devices
– NVIDIA Jetson Thor support
  • 2 replies
·
hypothetical 
posted an update about 2 months ago
view post
Post
266
Hello guys! Maybe someone want to test our framework for automated model's compression. Here is what can be produced with it. Move the slider - compress/accelerate model, select point which like and compile. I can give an access, we are now improving and collecting comments from users

TheStageAI/ANNA-LLM
  • 3 replies
·