Getting Started with Voice Coding

Updated on 11/22/2017 to discuss newer software versions.

A hands-free coding environment has a lot of moving parts, which can be overwhelming at first. This post will teach you how to set up the basic voice recognition environment. I also use eye tracking, but I’ll cover that in a separate post.

To begin with, install Dragon NaturallySpeaking, the voice recognition engine. Sadly, it’s only available for Windows, so you’ll have to do Linux development using a virtual machine or remote access (see my post for advice). There are two Dragon/Windows combinations I recommend. If you don’t mind using an older version of Windows, I recommend Windows 7 or 8 with Dragon 12 (of those two, Windows 8 is slightly buggier with Dragon than 7). Dragon 12 supports Select-and-Say in every app. This feature lets you easily edit the utterance you just dictated using built-in commands. Later versions of Dragon only support Select-and-Say in particular apps, which is a huge limitation. If you want to (or have to) use the latest version of Windows, then use Dragon 14, which is the first version compatible with Windows 10. Do not under any circumstances use Dragon 15, which does not work with the third-party extensions that you will need. Hopefully Nuance fixes this in future versions of Dragon …

I recommend investing in a good microphone, such as the Sennheiser ME3 or the TableMike. These are not cheap (about $200), but the microphone matters a lot: Dragon can be frustrating, so you want to do everything you can to minimize that frustration.

Next, install NatLink, an extension to Dragon that makes it possible to add custom Python commands. Follow the instructions here. If everything works, a window titled “Messages from NatLink” will pop up after you start Dragon. It’s common to run into problems installing NatLink, so read the instructions carefully. For your first installation, I highly recommend using their prepackaged version of Python to avoid trouble.

Finally, install Dragonfly, a cleaner Python interface to NatLink. The prepackaged binaries are several years out of date, so I recommend cloning their git repository. Run python build_develop.py to install it. It’s just a Python library, so if it worked you should now be able to import dragonfly from Python.
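If you want to script that import check, here is a minimal sketch. For example, you might want to confirm that you are testing the same Python interpreter that NatLink uses. The helper name `module_location` is hypothetical. The sketch uses only the standard library and should run under both Python 2.7 and Python 3. It only tells you whether `dragonfly` is importable and where it was loaded from. It says nothing about whether NatLink itself is working.

```python
import importlib


def module_location(name):
    """Try to import `name`; return the file it was loaded from, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Not installed for this interpreter (ModuleNotFoundError is a subclass on Python 3).
        return None
    # Built-in modules have no __file__, so fall back to None for them.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    location = module_location("dragonfly")
    if location is None:
        print("dragonfly is not importable from this Python interpreter")
    else:
        print("dragonfly loaded from " + location)
```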

To get started with Dragonfly, I recommend looking at some example modules. You can check out the original repository of examples or the modules mentioned in the docs. For voice coding purposes, you’ll want to familiarize yourself with the multiedit module. Just drop a module into your NatLink MacroSystem directory, turn your microphone off and on, and NatLink will attempt to load it. If it’s not working, check the “Messages from NatLink” window for error messages.
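If you would rather script the “drop a module in” step, here is a minimal sketch. It assumes NatLink’s naming convention for modules. Under that convention, globally active grammar modules have file names starting with an underscore (for example, _multiedit.py). Application-specific modules are instead named after the program’s executable. This helper only handles the global case. The function name and the paths in the commented usage line are hypothetical, so substitute your own MacroSystem location.

```python
import os
import shutil


def install_grammar_module(module_path, macro_system_dir):
    """Copy a global grammar module into NatLink's MacroSystem directory.

    Assumes NatLink's convention that global modules start with "_".
    Returns the path of the copied file.
    """
    filename = os.path.basename(module_path)
    if not filename.endswith(".py"):
        raise ValueError("expected a Python source file: " + filename)
    if not filename.startswith("_"):
        # Without the leading underscore, NatLink would only treat the file as
        # application-specific, so refuse rather than guess.
        raise ValueError("global grammar modules should start with '_': " + filename)
    if not os.path.isdir(macro_system_dir):
        raise IOError("MacroSystem directory not found: " + macro_system_dir)
    destination = os.path.join(macro_system_dir, filename)
    # shutil.copy copies contents and permission bits; the copy gets a fresh modification time.
    shutil.copy(module_path, destination)
    return destination


# Hypothetical paths; adjust to your own checkout and NatLink install:
# install_grammar_module(r"C:\dragonfly-modules\_multiedit.py",
#                        r"C:\NatLink\NatLink\MacroSystem")
```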

Of course, this is just the beginning. The interesting part is extending the Dragonfly modules and writing your own to support a full-featured voice coding environment. I’ll cover that in future posts!

19 thoughts on “Getting Started with Voice Coding”

  1. I have to disagree with you about Dragon 13 being the best. Although the browser plug-ins are nice and it’s a little bit more accurate out-of-the-box, Nuance’s decision to allow Select-and-Say only in specific applications (as opposed to everywhere in previous versions) actually caused me to downgrade.

    1. Wow, I didn’t realize Dragon 12 supported this in every app. I updated my post with this information. I’m considering downgrading too. It’s pretty painful not having this in Chrome, and I find the Chrome extension that adds this causes other problems. Thanks!

    2. I played around with this a bit on an old laptop with Dragon 12. The problem I found, at least in Google Chrome, is that Select-and-Say ceases to work as soon as you use a custom movement or editing command. Contrast this with the dictation box, where you can mix and match custom commands and standard Dragon commands. Do you just try to avoid this mixing, or do you only use Select-and-Say to fix the most recent utterance?

      1. Select-and-say in nonstandard applications only works on the utterances since the last voice command. This is still extremely useful, however. Totally worth skipping DNS 13 for this reason alone.

        1. I bit the bullet this weekend and downgraded to Dragon 12. Suffice it to say I don’t miss a single Dragon 13 feature, and it is wonderful having Select-and-Say working again. I updated the post with a strong recommendation for Dragon 12. Thanks!

  2. There are plenty of other good microphones; which is best for you depends on questions like these: Do you want mono or stereo headphones? Do you want to have no wires? How noisy is your environment? And so on.

  3. “Just drop a module into your NatLink MacroSystem directory, turn your microphone off and on, and NatLink will attempt to load it.”

    NatLink should reload changed modules at the next utterance automatically. Vocola works this way.

    1. NatLink doesn’t reload at the next utterance, at least not in my experience. Also, are you the same Mark who made Vocola?

      1. Sigh. It used to. Quintijn changed the default, I believe, in hopes of decreasing command latency. Vocola still provides the old behavior. This is probably fixable without too much trouble.
        (And yes, I am the maintainer of Vocola 2; Rick is the original creator.)

        1. I actually like the current behavior. This gives me more control over when I am ready to reload the module, so I can check for bugs. In principle I could do this by avoiding saving, but this makes it easier to use static analysis tools (plus I just like to save regularly).

          I think modules can also be reloaded with a voice command, although I don’t currently use that.

        2. For automatically reloading the grammar on a file save, you can put natlinkmain.setCheckForGrammarChanges(1) into your load function.

    2. I’ve always had to turn my microphone off and on. I just double-checked. Here is the version I’m using:
      NatLink version: 4.1lima
      DNS version: 13
      Python version: 27
      Windows Version: 8

    1. Sorry for the slow reply!

      These programs are Windows-only. I haven’t tried to do voice dictation on Mac OS X, but the most complete package available is probably voicecode.io. There are also some folks working on open source packages, but they are still in the early stages AFAIK.

      Remember that you can use Linux and Mac from within Windows (see my post). If you decide to go with Windows, I think there are two decent configuration options:
      1) Windows 7, Dragon 12.
      2) Windows 10, Dragon 14.

      The advantage of 1 is that you can use Select-and-Say to edit recent utterances in any app. This means improved ease of editing without having to start a separate text editor. The advantage of 2 is simply that you get newer, shinier software (I haven’t used either yet so I can’t really speak to details).

  4. Hi James,

    I’m following your blog to get started with voice coding in Python. I’m running Dragon 14 on Windows 10. I see in your last comment (2015) that you hadn’t tried either configuration. Do you have more information now about how well this configuration works for voice coding? I see you have mentioned that Dragon 14 doesn’t support Select-and-Say in every single application. I can’t find information online about whether this is still the case, and how this impacts voice coding. In 2017, are things different?

    1. Hi Kariina,

      Indeed, nothing has changed since my post. Dragon 14 should be fine except for this one caveat, which is not actually relevant to voice coding per se, but presumably you also want to do some plain English dictation, which would be affected.

      Definitely avoid Dragon 15 for now. It has showstopper bugs that prevent it from being used effectively with Natlink/Dragonfly. This is unfortunate, to say the least, because apparently it is by far the best version of Dragon otherwise. Here is a thread with more details: http://knowbrainer.com/forums/forum/messageview.cfm?catid=25&threadid=22911

      1. Thanks for your reply. Another question: do you know of any online communities for voice coding? I know about KnowBrainer; I’m wondering if there are others. The reason is that I do not know how to program in Python or any other programming language. I thought that learning to code with my voice would be a good way to spend the free time I have because of my arm injuries. I have downloaded Vocola, because Dragonfly requires Python knowledge. However, I’m still utterly confused about how to create commands to program in a language that I don’t know. Therefore, I’m wondering if it is possible to download someone else’s voice coding configuration and get started with learning Python. Do you know of any Internet community, for example, on GitHub?

        1. I know of a couple others:

          VoiceCoder is an active Yahoo group that has been around forever:
          https://groups.yahoo.com/neo/groups/VoiceCoder/info

          Dragonfly has its own mailing list: https://groups.google.com/forum/#!forum/dragonflyspeech

          If you are looking for repositories to fork, you’ll find several linked off the dragonfly list. Mine is linked in the navbar (https://github.com/wolfmanstout/dragonfly-commands).

          Python is a great first language to learn, although learning to program for the first time is always an exercise in patience (which I’m sure you are familiar with as someone with arm injuries). I highly recommend starting by just trying to get web browsing commands to work (e.g. based on my repository and my post). Then you can more easily navigate Python documentation and dig deeper.
