Collections including paper arxiv:2308.12950
-
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Paper • 2310.15494 • Published • 2 -
A Long Way to Go: Investigating Length Correlations in RLHF
Paper • 2310.03716 • Published • 10 -
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 77 -
Giraffe: Adventures in Expanding Context Lengths in LLMs
Paper • 2308.10882 • Published • 1
-
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 98 -
Wukong: Towards a Scaling Law for Large-Scale Recommendation
Paper • 2403.02545 • Published • 17 -
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 31 -
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
Paper • 2308.10462 • Published • 2
-
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 31 -
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Paper • 2306.08568 • Published • 32 -
SantaCoder: don't reach for the stars!
Paper • 2301.03988 • Published • 7 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 68
-
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793 • Published • 18 -
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Paper • 2311.12198 • Published • 22 -
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Paper • 2311.18775 • Published • 6 -
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 29
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published • 1 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 24 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published • 1 -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 2
-
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43 -
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 29 -
Simple and Controllable Music Generation
Paper • 2306.05284 • Published • 162 -
High Fidelity Neural Audio Compression
Paper • 2210.13438 • Published • 4
-
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Paper • 2402.10790 • Published • 42 -
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Paper • 2402.10524 • Published • 23 -
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
Paper • 2410.16256 • Published • 60 -
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 29
-
Attention Is All You Need
Paper • 1706.03762 • Published • 105 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 24 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 21