Collections

Collections including paper arxiv:2408.12637

Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 19
System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 44
Fine-tuning Language Models for Factuality

Paper • 2311.08401 • Published Nov 14, 2023 • 30
Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 77

The Evolution of Multimodal Model Architectures

Paper • 2405.17927 • Published May 28, 2024 • 1
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103
Efficient Architectures for High Resolution Vision-Language Models

Paper • 2501.02584 • Published Jan 5
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133

A Single Transformer for Scalable Vision-Language Modeling

Paper • 2407.06438 • Published Jul 8, 2024 • 1
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 259
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 157

Learn: Vision Language Models

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 140
Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4, 2024 • 72
Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5, 2024 • 92

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

Paper • 2507.01957 • Published Jul 2 • 21
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

Paper • 2508.03694 • Published Aug 5 • 50
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133

Exploring the Potential of Encoder-free Architectures in 3D LMMs

Paper • 2502.09620 • Published Feb 13 • 26
The Evolution of Multimodal Model Architectures

Paper • 2405.17927 • Published May 28, 2024 • 1
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103
Efficient Architectures for High Resolution Vision-Language Models

Paper • 2501.02584 • Published Jan 5

Vision-Language Models

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133

MIT Talk 31/10 Papers

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 74
BRAVE: Broadening the visual encoding of vision-language models

Paper • 2404.07204 • Published Apr 10, 2024 • 19
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 47
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 121

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133
