Zhaokai Wang

wzk1015

https://www.wzk.plus

AI & ML interests

Computer Vision, Music Generation, Multimodal Large Language Models

Recent Activity

upvoted a paper 1 day ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published 4 days ago • 33

liked a model about 2 months ago

Zhenxin-Lei/MetaCaptioner

Updated Oct 23 • 7 • 1

upvoted a paper about 2 months ago

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 89

upvoted 3 papers 2 months ago

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

Paper • 2510.08565 • Published Oct 9 • 19

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9 • 109

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Paper • 2510.05091 • Published Oct 6 • 19

updated a dataset 2 months ago

OpenGVLab/GenExam

Updated Oct 6 • 229 • 3

authored a paper 3 months ago

GenExam: A Multidisciplinary Text-to-Image Exam

Paper • 2509.14232 • Published Sep 17 • 21

upvoted a paper 3 months ago

SAIL-VL2 Technical Report

Paper • 2509.14033 • Published Sep 17 • 44

liked a model 3 months ago

facebook/nllb-200-distilled-600M

Translation • Updated Feb 14, 2024 • 276k • 805

upvoted a paper 3 months ago

GenExam: A Multidisciplinary Text-to-Image Exam

Paper • 2509.14232 • Published Sep 17 • 21

liked a dataset 3 months ago

OpenGVLab/GenExam

Updated Oct 6 • 229 • 3

published a dataset 3 months ago

OpenGVLab/GenExam

Updated Oct 6 • 229 • 3

liked a dataset 3 months ago

PhoenixZ/RISEBench

Updated May 30 • 99 • 3

upvoted a paper 3 months ago

Does DINOv3 Set a New Medical Vision Standard?

Paper • 2509.06467 • Published Sep 8 • 37

liked 2 models 3 months ago

OpenGVLab/InternVL3_5-241B-A28B-HF

Image-Text-to-Text • 241B • Updated Sep 8 • 88 • 11

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview

Image-Text-to-Text • 0.4B • Updated Aug 29 • 47.1k • 82

authored a paper 3 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208

upvoted a paper 4 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208

liked a model 4 months ago

OpenGVLab/InternVL3_5-241B-A28B

Image-Text-to-Text • 241B • Updated Aug 29 • 625 • 132
