What should the tokens be?
     
   
gaze direction/target?
   
gaze transitions?
   
image neighborhood codes?
     - avoids explicit computation of gaze!
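
To make the third option concrete, here is a minimal sketch of what "image neighborhood codes" could look like, assuming a token is the index of the nearest entry in a small codebook of pixel patches. Everything here is hypothetical and chosen only for illustration: the names `extract_patch`, `nearest_code`, and `neighborhood_tokens`, the `radius` parameter, the codebook itself, and the choice of squared Euclidean distance. Where the codebook comes from (for example, clustering patches) is left open.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def extract_patch(image: Image, row: int, col: int, radius: int) -> List[float]:
    """Flatten the (2*radius+1)^2 neighborhood around (row, col).

    Coordinates outside the image are clamped to the nearest border pixel.
    """
    height, width = len(image), len(image[0])
    patch = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r = min(max(row + dr, 0), height - 1)
            c = min(max(col + dc, 0), width - 1)
            patch.append(image[r][c])
    return patch


def nearest_code(patch: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to the patch.

    Assumption: squared Euclidean distance; ties go to the lowest index.
    """
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((p - q) ** 2 for p, q in zip(patch, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def neighborhood_tokens(
    image: Image,
    centers: Sequence[Tuple[int, int]],
    codebook: Sequence[Sequence[float]],
    radius: int = 1,
) -> List[int]:
    """Map each (row, col) center to a discrete token: its patch's codebook index."""
    return [
        nearest_code(extract_patch(image, r, c, radius), codebook)
        for r, c in centers
    ]
```

In this sketch the tokens come straight from pixel neighborhoods, so no gaze direction or target is ever estimated; which image locations to sample as `centers` remains an open choice.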