rlhf
- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 28
- Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
  Paper • 2406.02900 • Published • 13
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 24
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 10
Yuquan Xie (xieyuquan)
AI & ML interests: LLM, multi-modal
arch
- TroL: Traversal of Layers for Large Language and Vision Models
  Paper • 2406.12246 • Published • 36
- A Closer Look into Mixture-of-Experts in Large Language Models
  Paper • 2406.18219 • Published • 17
- ThinK: Thinner Key Cache by Query-Driven Pruning
  Paper • 2407.21018 • Published • 32
- Meltemi: The first open Large Language Model for Greek
  Paper • 2407.20743 • Published • 68
learning
- Law of Vision Representation in MLLMs
  Paper • 2408.16357 • Published • 95
- CogVLM2: Visual Language Models for Image and Video Understanding
  Paper • 2408.16500 • Published • 57
- Learning to Move Like Professional Counter-Strike Players
  Paper • 2408.13934 • Published • 23
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 133
compression
dpo
- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 41
- BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
  Paper • 2406.12168 • Published • 7
- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 17
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
  Paper • 2406.18629 • Published • 42
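All four entries above are variants of Direct Preference Optimization. As a point of reference not restated in any of the papers listed, the standard DPO loss over a dataset of prompts with preferred/dispreferred completion pairs (x, y_w, y_l) is:

```latex
% Standard DPO objective: \pi_\theta is the trained policy,
% \pi_{\mathrm{ref}} the frozen reference model, \beta a scaling hyperparameter.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

The β-scaled log-ratio inside the sigmoid acts as an implicit reward for each completion, which is the quantity that the "DPO Implicit Rewards" paper in this collection builds on.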