Updated on 1/9/2021 to discuss newer software versions.
A hands-free coding environment has a lot of moving parts, which can be overwhelming at first. This post will teach you how to set up the basic voice recognition environment. I also use eye tracking, but I’ll cover that in a separate post.
To begin with, install Dragon NaturallySpeaking, the voice recognition engine. Sadly, it’s only available for Windows, so you’ll have to do Linux development using a virtual machine or remote access (see my post for advice). There are two Dragon/Windows combinations I recommend. Most folks will be happiest using Windows 10 with Dragon 14 or 15, simply because it works well enough and Windows 10 is a major upgrade from 8.1. If you decide to go with Dragon 15, have a look at my recent post for installation and setup advice. If you don’t mind using an older version of Windows, I recommend Windows 7 or 8 with Dragon 12 (of those two, Windows 8 is slightly buggier with Dragon than 7). The tradeoff is select-and-say, the built-in commands that let you easily edit the utterance you just dictated: later versions of Dragon (13+) don’t support it in most apps, which is a significant limitation.
A good microphone is worth the investment; I recommend either the Sennheiser ME3 or the TableMike. These are not cheap (about $200), but the microphone matters a lot: Dragon can be frustrating, and you want to do everything you can to minimize that.
Next, install NatLink, an extension to Dragon that makes it possible to add custom Python commands. I use the experimental Python 3 version because some of my functionality requires Python 3 (e.g. the fastest version of my gaze-ocr package). Unfortunately, this version is still under heavy development, so it is frequently broken and its documentation is often out of date. You may need to piece together the installation instructions from both the NatLink site and the readme. If you want a simpler installation process and don’t need the latest cutting-edge features, you can follow the instructions for Python 2.7. You can use their prepackaged version of Python or pip install the dependencies yourself with the latest Python 2.7.x 32-bit (64-bit won’t work). When installing Python, be sure to install for all users and add it to the PATH (you’ll have something like “C:\PythonXX\” and “C:\PythonXX\Scripts\” in your PATH if you’ve done this correctly). If everything works, you’ll see a window titled “Messages from NatLink” pop up after starting Dragon. It’s common to run into problems installing NatLink, so read the instructions carefully and look at the unofficial and official troubleshooting advice.
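Before digging into NatLink troubleshooting, it can help to rule out the two Python pitfalls mentioned above: a 64-bit interpreter and a missing PATH entry. The following is only a rough sketch of such a check, not part of NatLink. The helper names are made up for illustration, and it assumes the interpreter lives directly in the install directory (as in “C:\PythonXX\python.exe”), which won’t hold inside a virtual environment.

```python
import ntpath  # Windows path rules, regardless of where this file is run
import os
import struct
import sys


def interpreter_bits():
    """Return 32 or 64, based on the pointer size of the running Python."""
    return struct.calcsize("P") * 8


def _norm(path):
    # Compare paths case-insensitively and ignore trailing backslashes.
    return ntpath.normcase(ntpath.normpath(path.strip()))


def missing_path_entries(path_value, install_dir):
    """Return the expected Python directories absent from a Windows PATH string."""
    entries = {_norm(entry) for entry in path_value.split(";") if entry.strip()}
    expected = [install_dir, ntpath.join(install_dir, "Scripts")]
    return [d for d in expected if _norm(d) not in entries]


if __name__ == "__main__":
    print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1],
                                    interpreter_bits()))
    # Assumption: python.exe sits directly in the install directory.
    install_dir = ntpath.dirname(sys.executable)
    for directory in missing_path_entries(os.environ.get("PATH", ""), install_dir):
        print("Not on PATH: " + directory)
```

Run it with the same Python you intend NatLink to use. For the Python 2.7 route it should report 32-bit and print no “Not on PATH” lines.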
Finally, install Dragonfly, a cleaner Python interface to NatLink. The original GitHub repository is no longer maintained, so the community has switched to a fork, dictation-toolbox/dragonfly, which you can install with “pip install dragonfly2”. It’s just a Python library, so if it worked you should now be able to “import dragonfly” from Python.
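If you’d rather check this from a script than from an interactive prompt, something like the following should do. It’s a small sketch: can_import is a made-up helper name, and it uses only the standard library.

```python
import importlib


def can_import(module_name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


print("dragonfly importable: %s" % can_import("dragonfly"))
```

If this prints False, the likely cause is that pip installed the package into a different Python than the one you ran the script with.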
To get started with Dragonfly, I recommend looking at some example modules. You can check out the original repository of examples or modules mentioned in the docs. For voice coding purposes, you’ll want to familiarize yourself with the multiedit module. Just drop a module into your NatLink MacroSystem directory, turn your microphone off and on, and NatLink will attempt to load it. If it’s not working, check the messages window to see if there are any error messages.
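To give a feel for what such a module looks like, here is a minimal sketch that maps a few spoken phrases to keystrokes. It is far simpler than multiedit and is not taken from the example repositories. The phrases, the grammar and rule names, and the suggested file name _example.py are all made up for illustration (the underscore prefix follows the pattern of the example modules, but check the NatLink docs for the loading rules). The import guard is there only so the file can be opened outside Dragon without crashing.

```python
try:
    from dragonfly import Grammar, Key, MappingRule
except ImportError:
    # Lets the file be imported outside Dragon, e.g. for a quick syntax check.
    Grammar = Key = MappingRule = None

# Spoken phrase -> Dragonfly key spec ("c-s" means Ctrl+S).
KEY_COMMANDS = {
    "save file": "c-s",
    "undo that": "c-z",
    "new line": "enter",
}

grammar = None
if Grammar is not None:
    # One rule holding every phrase; each phrase presses its key when spoken.
    rule = MappingRule(
        name="example_keys",
        mapping={phrase: Key(spec) for phrase, spec in KEY_COMMANDS.items()},
    )
    grammar = Grammar("example")
    grammar.add_rule(rule)
    grammar.load()


def unload():
    """Unload the grammar so the module can be reloaded cleanly."""
    global grammar
    if grammar is not None:
        grammar.unload()
    grammar = None
```

After the module loads, saying “save file” should send Ctrl+S to whichever window has focus. If nothing happens, the messages window is the first place to look.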
Of course, this is just the beginning. The interesting part is extending the Dragonfly modules and writing your own to support a full-featured voice coding environment. I’ll cover that in future posts!