Qwen3-VL-JAX

Qwen3-VL implementation in JAX

Author: Surya Dantuluri

From-scratch JAX implementation of Qwen3-VL. I reimplemented mRoPE, DeepStack, the KV cache, and the vision towers instead of wrapping Hugging Face.
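
To make one of those pieces concrete, here is a minimal sketch of a fixed-size KV cache in plain JAX. It is illustrative only and is not taken from this codebase: the names `KVCache`, `init_cache`, `update_cache`, and `attend` are hypothetical, and the `(batch, seq, heads, head_dim)` layout is an assumption. It leaves out mRoPE, grouped-query attention, and everything else specific to Qwen3-VL.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class KVCache(NamedTuple):
    # Hypothetical layout: (batch, max_len, heads, head_dim).
    k: jax.Array
    v: jax.Array
    length: jax.Array  # scalar int32, number of filled positions


def init_cache(batch, max_len, heads, head_dim, dtype=jnp.float32):
    shape = (batch, max_len, heads, head_dim)
    return KVCache(
        k=jnp.zeros(shape, dtype),
        v=jnp.zeros(shape, dtype),
        length=jnp.zeros((), jnp.int32),
    )


def update_cache(cache, k_new, v_new):
    # Write the new keys/values at the current offset on the sequence axis.
    # The buffer shape never changes, so this stays jit-friendly.
    zero = jnp.zeros((), jnp.int32)
    start = (zero, cache.length, zero, zero)
    k = jax.lax.dynamic_update_slice(cache.k, k_new.astype(cache.k.dtype), start)
    v = jax.lax.dynamic_update_slice(cache.v, v_new.astype(cache.v.dtype), start)
    return KVCache(k, v, cache.length + k_new.shape[1])


def attend(q, cache):
    # q: (batch, q_len, heads, head_dim), holding the most recent q_len tokens.
    # Call this after update_cache so those tokens are already in the cache.
    q_len, max_len = q.shape[1], cache.k.shape[1]
    scores = jnp.einsum("bqhd,bkhd->bhqk", q, cache.k) * q.shape[-1] ** -0.5

    # Causal mask: query i sits at absolute position length - q_len + i and
    # may only see keys at or before that position. This also hides the
    # unfilled tail of the buffer.
    q_pos = cache.length - q_len + jnp.arange(q_len)
    k_pos = jnp.arange(max_len)
    mask = k_pos[None, :] <= q_pos[:, None]
    scores = jnp.where(mask[None, None], scores, jnp.finfo(scores.dtype).min)

    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("bhqk,bkhd->bqhd", weights, cache.v)
```

In this sketch, prefill is a single `update_cache` call with the whole prompt, and decoding calls `update_cache` then `attend` with one token at a time. Both paths should produce the same outputs for the same tokens.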

Built as the foundation for vlm-gym and the geoguessing RL pipeline. The point was a lean codebase I could run on TPUs and modify without fighting a giant library stack.