Differential vision encoding for computer-use agents. Instead of sending full screenshots each frame, encode only the visual diff between consecutive frames. Reduces token cost and latency for CUA inference.
diffcua
Differential vision encoding for computer-use agents