Rebuilt Geospot as an online RL policy: a geolocation game in which a tiny 330k-parameter model learns in real time to predict where a photo was taken, without GPS data. The model keeps improving as it plays.
This was the precursor to the full vlm-gym pipeline and explored whether a minimal policy network could learn geolocation through pure reinforcement learning.
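To make the "learns as it plays" idea concrete, here is a minimal sketch of an online policy-gradient loop for a geolocation guess. It is not the actual Geospot code: the exponential reward, the `GaussianPolicy` class, the `play` helper, and all hyperparameters are assumptions made for illustration, and the toy policy ignores the photo entirely (the real model would condition its guess on image features).

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def reward(distance_km, scale_km=2000.0):
    """Assumed reward shape: 1.0 for a perfect guess, decaying with distance."""
    return math.exp(-distance_km / scale_km)


class GaussianPolicy:
    """Toy stand-in for the policy: a learnable mean guess plus fixed exploration noise."""

    def __init__(self, sigma_deg=10.0, lr=0.1, seed=0):
        self.mean = [0.0, 0.0]      # current best guess as [lat, lon] in degrees
        self.sigma = sigma_deg      # fixed exploration noise (assumption)
        self.lr = lr
        self.baseline = 0.0         # running average reward, used to reduce variance
        self.rng = random.Random(seed)

    def act(self):
        """Sample a guess around the current mean."""
        return tuple(self.rng.gauss(m, self.sigma) for m in self.mean)

    def update(self, action, r):
        """One REINFORCE-style step, applied immediately after each round."""
        advantage = r - self.baseline
        for i in range(2):
            # Score function of a Gaussian mean is (a - mean) / sigma^2;
            # sigma is fixed, so 1 / sigma^2 is folded into the learning rate.
            self.mean[i] += self.lr * advantage * (action[i] - self.mean[i])
        self.baseline += 0.1 * (r - self.baseline)


def play(policy, target, rounds=200):
    """Play `rounds` guesses against one fixed target, learning after every guess."""
    rewards = []
    for _ in range(rounds):
        lat, lon = policy.act()
        # The environment clips the guess to valid coordinates before scoring it.
        guess_lat = max(-90.0, min(90.0, lat))
        guess_lon = max(-180.0, min(180.0, lon))
        r = reward(haversine_km(guess_lat, guess_lon, target[0], target[1]))
        policy.update((lat, lon), r)
        rewards.append(r)
    return rewards
```

Calling `play(GaussianPolicy(), (48.85, 2.35))` returns the per-round rewards for a single fixed target. The point of the sketch is the update placement: the policy is adjusted after every guess rather than in offline batches, which is what "online" means here.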