Rebuilt Geospot as an online RL geolocation game. A 330k-parameter policy predicts, without access to GPS data, where a photo was taken, and it is updated online from play.
This was the precursor to the full vlm-gym pipeline, built to test whether a tiny policy could learn useful geolocation behavior from reward alone.
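
To make "learning from reward alone" concrete, here is a minimal sketch of one online update. It is not the Geospot implementation. The reward shape (exponential decay in great-circle distance), the softmax-over-candidate-cells policy, and the names `haversine_km`, `reward`, and `reinforce_step` are all assumptions made for illustration. The real policy would condition on image features, whereas this toy version is a context-free bandit with one logit per candidate location.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def reward(guess, truth, scale_km=2000.0):
    """Assumed reward: 1.0 for a perfect guess, decaying exponentially with distance."""
    return math.exp(-haversine_km(*guess, *truth) / scale_km)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, cells, truth, lr=0.5, baseline=0.0, rng=random):
    """Sample a guess, score it, and apply one REINFORCE update to the logits in place.

    For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi(i).
    """
    probs = softmax(logits)
    action = rng.choices(range(len(cells)), weights=probs)[0]
    r = reward(cells[action], truth)
    advantage = r - baseline
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * advantage * grad
    return action, r


if __name__ == "__main__":
    # Hypothetical candidate cells: Paris, Tokyo, Sydney.
    cells = [(48.85, 2.35), (35.68, 139.69), (-33.87, 151.21)]
    logits = [0.0] * len(cells)
    rng = random.Random(0)
    for _ in range(300):
        reinforce_step(logits, cells, truth=cells[0], rng=rng)
    print(softmax(logits))
```

The only learning signal here is the scalar reward for each guess. No ground-truth label is ever fed to the policy as a supervised target, which is the property the experiment set out to test.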