Update (12/18/2020): I now use a different method to integrate the tracker. See Efficient UI interaction with OCR and Gaze Tracking for the most up-to-date instructions.
You can do a lot just using your voice, but there are still a few times you’ll find yourself reaching for a mouse. It’s often for the silliest little things, like clicking in the empty space within a webpage to change the keyboard context. If you’re serious about not using your hands, you can use an eye tracker to eliminate these last few cases. This post will teach you how to get started. Make sure you’ve read my introductory post on voice coding, since we will be building upon that.
Eye trackers used to cost several thousand dollars, but now you can grab a cheap one for less than a couple hundred bucks. I use the Tobii Eye Tracker 4C, which retails right now for $169. I have a full review of it posted here.
The basic idea behind eye tracker interaction is that you look somewhere on the screen and then use some other method to click on, or “activate,” the item you’re looking at. It’s generally too distracting to have the pointer follow your gaze continuously, so the usual approach is to press a key rather than click a mouse button. For our purposes, of course, we’ll want to use a voice command instead.
The tricky part is integrating it with Dragonfly. It really ought to be easy, except that there’s currently an outstanding bug where Tobii’s software does not listen for virtual keypresses. There’s a thread in their forums complaining about this, but it sounds like it won’t be fixed until the consumer version, which doesn’t have a release date yet. The workaround is surprisingly elaborate, but the good news is I’ve already done the heavy lifting. The basic idea is that we call into their C API from Python. The raw API is far more complicated than we need, so I wrote a simple wrapper DLL with a few basic functions to connect to the eye tracker, get position data, and activate the current gaze point. You can get the source code and a binary distribution of the wrapper from my GitHub repository.
Python makes it a breeze to call into a DLL. Load the two DLLs with the following lines:
from ctypes import *

# DLL_DIRECTORY is just a placeholder here; point it at wherever you put the DLLs.
DLL_DIRECTORY = "C:/path/to/dlls"
eyex_dll = CDLL(DLL_DIRECTORY + "/Tobii.EyeX.Client.dll")
tracker_dll = CDLL(DLL_DIRECTORY + "/Tracker.dll")
Then you can define some simple wrapper functions that call the DLL functions:
from dragonfly import Mouse  # used below to warp the pointer

def connect():
    result = tracker_dll.connect()
    print "connect: %d" % result

def disconnect():
    result = tracker_dll.disconnect()
    print "disconnect: %d" % result

def get_position():
    # Returns the most recent gaze point as (x, y) screen coordinates.
    x = c_double()
    y = c_double()
    tracker_dll.last_position(byref(x), byref(y))
    return (x.value, y.value)

def print_position():
    print "(%f, %f)" % get_position()

def move_to_position():
    # Move the mouse pointer to the current gaze point.
    x, y = get_position()
    Mouse("[%d, %d]" % (max(0, int(x)), max(0, int(y)))).execute()

def activate_position():
    # Ask the eye tracker to activate whatever is at the current gaze point.
    tracker_dll.activate()
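Before wiring anything into a grammar, it’s worth exercising these from an interactive session to confirm the tracker is connected and reporting sensible coordinates. The values in the comments below are only illustrative; the actual status codes are whatever the wrapper DLL returns:

connect()           # prints the status code returned by Tracker.dll
print_position()    # prints the latest gaze point, e.g. "(812.000000, 430.500000)"
move_to_position()  # warps the pointer to the gaze point via Dragonfly's Mouse action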
With these in place, it’s easy to bind them to voice commands using the handy Dragonfly Function action. It’s useful to have separate commands for moving the pointer and clicking, because the eye tracker’s accuracy isn’t always perfect.
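Here’s a minimal sketch of what those bindings might look like, assuming the wrapper functions above are in scope. The spoken phrases and grammar name are placeholders I made up; adapt them to your own grammar files:

from dragonfly import Grammar, MappingRule, Function

class GazeRule(MappingRule):
    mapping = {
        # Separate commands for moving and clicking, since tracking
        # accuracy isn't always good enough to do both blindly at once.
        "eye connect": Function(connect),
        "eye move": Function(move_to_position),
        "eye click": Function(activate_position),
        "eye print": Function(print_position),
    }

grammar = Grammar("eye tracker")
grammar.add_rule(GazeRule())
grammar.load()

Saying “eye move” first lets you check where the pointer lands before committing to “eye click,” which helps when the tracking is slightly off.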
Foot pedals are another alternative to voice commands. I often use a voice command to move the mouse based on my gaze point, then use my foot to click. I recommend the Omnipedal Quad. These are also great for scrolling, which is pretty awkward with dictation.
There’s a lot more you can do with a tighter integration with the eye tracking API. The major shortcoming of my simple approach is that it doesn’t work well with small click targets. The full API lets the application describe all the click targets, so the closest one will be automatically picked. Of course, this usually requires access to the application source code (or at least an extension), so it’s less generic and harder to get up and running. Please post in the comments if you come up with something!