vlm-gym

To solve AGI, we must first solve Geoguessr. vlm-gym is a simple RL gym written from scratch in JAX for training vision-language models on geolocation. It started with Qwen3VL-4B as the model, with Geospot added as an RL environment.

VLMs can learn geolocation through progressive geodesic tightening: within a few hundred steps, the model learns to narrow its predictions from continent-level to city-level. The goal was a lean, HuggingFace-free gym that makes it simple to drop in new environments and train on TPU or GPU.
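
To make "progressive geodesic tightening" concrete, here is a minimal sketch of one way such a reward could look. It is not taken from the vlm-gym source: the function names (`haversine_km`, `tau_schedule`, `geo_reward`), the exponential reward shape, and the 2500 km to 25 km schedule are all assumptions chosen for illustration. The idea is that the reward decays with great-circle distance, and the distance scale shrinks over training so that coarse guesses stop being rewarded.

```python
import jax.numpy as jnp

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(jnp.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = jnp.sin(dlat / 2) ** 2 + jnp.cos(lat1) * jnp.cos(lat2) * jnp.sin(dlon / 2) ** 2
    # Clip guards against tiny floating-point overshoot before sqrt/arcsin.
    return 2 * EARTH_RADIUS_KM * jnp.arcsin(jnp.sqrt(jnp.clip(a, 0.0, 1.0)))


def tau_schedule(step, total_steps, tau_start=2500.0, tau_end=25.0):
    """Hypothetical schedule: distance scale decays geometrically
    from roughly continent scale to roughly city scale."""
    frac = jnp.clip(step / total_steps, 0.0, 1.0)
    return tau_start * (tau_end / tau_start) ** frac


def geo_reward(pred, target, step, total_steps):
    """Assumed reward: exp(-distance / tau), in (0, 1].
    pred and target are arrays whose last axis is (lat, lon) in degrees."""
    d = haversine_km(pred[..., 0], pred[..., 1], target[..., 0], target[..., 1])
    return jnp.exp(-d / tau_schedule(step, total_steps))


# Usage: the same prediction error earns less reward later in training.
paris = jnp.array([48.8566, 2.3522])
london = jnp.array([51.5074, -0.1278])
r_early = geo_reward(london, paris, step=0, total_steps=300)
r_late = geo_reward(london, paris, step=300, total_steps=300)
```

Under these assumptions, guessing London for a Paris image is still well rewarded at step 0 but earns almost nothing by the final step, which pushes the policy toward city-level precision. The functions operate on arrays, so they broadcast over a batch of predictions without changes.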