Collections including paper arxiv:2308.12950
-
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Paper • 2310.15494 • Published • 2 -
A Long Way to Go: Investigating Length Correlations in RLHF
Paper • 2310.03716 • Published • 10 -
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 77 -
Giraffe: Adventures in Expanding Context Lengths in LLMs
Paper • 2308.10882 • Published • 1
-
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 98 -
Wukong: Towards a Scaling Law for Large-Scale Recommendation
Paper • 2403.02545 • Published • 17 -
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 31 -
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
Paper • 2308.10462 • Published • 2
-
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 31 -
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Paper • 2306.08568 • Published • 32 -
SantaCoder: don't reach for the stars!
Paper • 2301.03988 • Published • 7 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 68
-
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793 • Published • 18 -
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Paper • 2311.12198 • Published • 22 -
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Paper • 2311.18775 • Published • 6 -
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 29
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published • 1 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 24 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published • 1 -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 2
-
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43 -
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 29 -
Simple and Controllable Music Generation
Paper • 2306.05284 • Published • 162 -
High Fidelity Neural Audio Compression
Paper • 2210.13438 • Published • 4
-
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Paper • 2402.10790 • Published • 42 -
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Paper • 2402.10524 • Published • 23 -
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
Paper • 2410.16256 • Published • 60 -
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 29
-
Attention Is All You Need
Paper • 1706.03762 • Published • 105 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 24 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 21