Papers - a pangzs Collection

Papers

Updated Aug 21, 2025

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Paper • 2505.04921 • Published May 8, 2025 • 185

On Path to Multimodal Generalist: General-Level and General-Bench
Paper • 2505.04620 • Published May 7, 2025 • 82

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Paper • 2505.05467 • Published May 8, 2025 • 13

Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Paper • 2508.05547 • Published Aug 7, 2025 • 11

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Paper • 2508.02095 • Published Aug 4, 2025 • 9

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published Aug 6, 2025 • 129

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published Aug 13, 2025 • 5

MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation
Paper • 2508.11032 • Published Aug 14, 2025 • 2

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper • 2508.09736 • Published Aug 13, 2025 • 57

Ovis2.5 Technical Report
Paper • 2508.11737 • Published Aug 15, 2025 • 111