GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
Paper • 2604.18556
MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization