From-scratch JAX implementation of Qwen3-VL. Reimplemented mRoPE, DeepStack, the KV cache, and the ViTs to get a deep understanding of the architecture. Runs Qwen3-VL 2B locally with the full chain-of-thought reasoning visible.
Built as the foundation for vlm-gym and the geo-guessing RL pipeline. The motivation was a lean, HuggingFace-free implementation that can run directly on TPUs.
