vlm-gym inference TUI

Small vision model in JAX for geoguessing

Author: Surya Dantuluri
Published

A small vision model, written in JAX, that does geoguessing locally, paired with a custom TUI that visualizes attention heads as the model predicts tokens. It was built to understand what the model actually looks at when making geolocation decisions.
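As a rough illustration of what "visualizing attention heads as the model predicts tokens" can involve, here is a minimal JAX sketch that computes per-head attention weights and slices out the attention from the token currently being predicted to the image patches. It is not the project's code: the function names, the tensor shapes, and the assumption that image-patch tokens come first in the sequence are all hypothetical.

```python
import jax
import jax.numpy as jnp


def attention_weights(q, k):
    """Per-head softmax attention weights.

    q: (heads, q_len, head_dim), k: (heads, k_len, head_dim)
    Returns (heads, q_len, k_len); each row sums to 1.
    """
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum("hqd,hkd->hqk", q, k) * scale
    return jax.nn.softmax(logits, axis=-1)


def patch_attention_maps(weights, grid_hw, num_image_tokens):
    """Attention from the last token to the image patches, one 2D map per head.

    Assumes (hypothetically) that the first `num_image_tokens` positions
    in the sequence are image patches laid out row by row.
    """
    h, w = grid_hw
    last = weights[:, -1, :num_image_tokens]  # (heads, num_image_tokens)
    return last.reshape(weights.shape[0], h, w)


# Toy inputs: 16 image patches (a 4x4 grid) followed by 4 text tokens.
key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
heads, seq_len, head_dim = 4, 20, 8
q = jax.random.normal(key_q, (heads, seq_len, head_dim))
k = jax.random.normal(key_k, (heads, seq_len, head_dim))

weights = attention_weights(q, k)
maps = patch_attention_maps(weights, (4, 4), 16)
print(maps.shape)  # (4, 4, 4): one 4x4 map per head
```

Each map shows how much one head attends to each image patch for the token being generated, which is the kind of signal a per-head view would display.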

The terminal interface renders attention maps in real time alongside predictions, which is useful for debugging RL reward signals and understanding where the model focuses.
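One simple way to draw an attention map in a terminal is to map each cell to a Unicode shading character. The sketch below assumes a single 2D map as input; the character ramp, the min-max normalization, and the `render_attention` name are illustrative choices, not the TUI's actual rendering code.

```python
import numpy as np

# Characters ordered from lowest to highest attention.
RAMP = " ░▒▓█"


def render_attention(attn_map):
    """Render a 2D attention map as lines of shading characters.

    Values are min-max normalized per map, so the output shows relative
    focus within one map rather than absolute attention mass.
    """
    a = np.asarray(attn_map, dtype=np.float32)
    lo, hi = float(a.min()), float(a.max())
    span = hi - lo
    # A constant map has no contrast; render it as all-blank.
    norm = (a - lo) / span if span > 0 else np.zeros_like(a)
    idx = np.minimum((norm * len(RAMP)).astype(int), len(RAMP) - 1)
    # Double each character so cells look roughly square in a terminal.
    return "\n".join("".join(RAMP[i] * 2 for i in row) for row in idx)


demo = np.array([[0.0, 0.25], [0.5, 1.0]])
print(render_attention(demo))
```

Because the function returns a plain string, it can be redrawn after each predicted token or handed to whatever terminal UI library is in use.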