What should the tokens be?
gaze direction/target?
gaze transitions?
image neighborhood codes?
- avoid explicit computation of gaze!
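
One possible reading of "image neighborhood codes", as a rough sketch only: turn each small image patch into a discrete token directly from pixel values, with no gaze estimate anywhere in the pipeline. The sketch below swaps in a local-binary-pattern-style code over a 3x3 neighborhood; this is an assumption about what the note could mean, not a method the notes specify. The names `neighborhood_code` and `tokenize` are hypothetical, and the image is assumed to be a grayscale grid given as a list of rows.

```python
def neighborhood_code(image, row, col):
    """Return an 8-bit code (0..255) for the 3x3 neighborhood around (row, col).

    Assumption: each neighbor contributes one bit, set when the neighbor is
    at least as bright as the center pixel (local-binary-pattern style).
    """
    center = image[row][col]
    # Neighbors visited clockwise, starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dr, dc in offsets:
        bit = 1 if image[row + dr][col + dc] >= center else 0
        code = (code << 1) | bit
    return code


def tokenize(image):
    """Return codes for every interior pixel, in row-major order."""
    height, width = len(image), len(image[0])
    return [
        neighborhood_code(image, r, c)
        for r in range(1, height - 1)
        for c in range(1, width - 1)
    ]
```

The vocabulary here is fixed at 256 tokens and is computed from pixels alone, which is the property the last note asks for. A learned codebook over larger patches would be another option with the same property.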