Collections
Discover the best community collections!
Collections including paper arxiv:2310.01377
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 89
- Natural Language Reinforcement Learning
  Paper • 2411.14251 • Published • 31
- Group Robust Preference Optimization in Reward-free RLHF
  Paper • 2405.20304 • Published • 1

- HyperCLOVA X Technical Report
  Paper • 2404.01954 • Published • 25
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
  Paper • 2305.14387 • Published • 1
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 137

- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 89
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 123

- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
  Paper • 2305.11738 • Published • 8
- Shepherd: A Critic for Language Model Generation
  Paper • 2308.04592 • Published • 32
- CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
  Paper • 2402.14809 • Published • 3
- DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
  Paper • 2401.07382 • Published • 2

- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 149
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 88
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
  Paper • 2305.07759 • Published • 36
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
  Paper • 2406.20094 • Published • 104

- Condition-Aware Neural Network for Controlled Image Generation
  Paper • 2404.01143 • Published • 13
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
  Paper • 2404.00987 • Published • 23
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 46
- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
  Paper • 2404.02893 • Published • 22

- Fine-Tuning Language Models from Human Preferences
  Paper • 1909.08593 • Published • 3
- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12
- Leverage the Average: an Analysis of KL Regularization in RL
  Paper • 2003.14089 • Published • 2
- Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
  Paper • 2404.01258 • Published • 12

- Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
  Paper • 2309.13356 • Published • 37
- Unveiling Safety Vulnerabilities of Large Language Models
  Paper • 2311.04124 • Published • 10
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69
- Evaluating Frontier Models for Dangerous Capabilities
  Paper • 2403.13793 • Published • 7