Getting Started with Voice Coding

Updated on 11/11/2018 to discuss newer software versions.

A hands-free coding environment has a lot of moving parts, which can be overwhelming at first. This post will teach you how to set up the basic voice recognition environment. I also use eye tracking, but I’ll cover that in a separate post.

To begin with, install Dragon NaturallySpeaking, the voice recognition engine. Sadly, it’s only available for Windows, so you’ll have to do Linux development using a virtual machine or remote access (see my post for advice). There are two Dragon/Windows combinations I recommend. Most folks will be happiest using Windows 10 with Dragon 14 or 15, simply because it works well enough and Windows 10 is a major upgrade from 8.1. If you decide to go with Dragon 15, have a look at my recent post for installation and setup advice. If you don’t mind using an older version of Windows, I recommend Windows 7 or 8 with Dragon 12 (of those two, Windows 8 is slightly buggier with Dragon than 7). Note that later versions of Dragon (13+) don’t support Select-and-Say in most apps: this feature lets you easily edit the last utterance you dictated using built-in commands, so losing it is a significant limitation.

I recommend investing in a good microphone: either the Sennheiser ME3 or the TableMike. These are not cheap (about $200), but the microphone matters a lot. Dragon can be frustrating, so you want to do everything you can to minimize that frustration.

Next, install NatLink, an extension to Dragon that makes it possible to add custom Python commands. Follow the instructions here. If everything works, you’ll see a window pop up after starting Dragon titled “Messages from NatLink”. It’s common to run into problems installing NatLink, so read the instructions carefully. For your first installation, I highly recommend using their prepackaged version of Python to avoid trouble.

Finally, install Dragonfly, a cleaner Python interface to NatLink. The original GitHub repository is no longer maintained, so the community has switched to Danesprite’s fork, which you can install via pip install dragonfly2. It’s just a Python library, so if the installation worked, you should now be able to run import dragonfly from a Python session.

To get started with Dragonfly, I recommend looking at some example modules. You can check out the original repository of examples or modules mentioned in the docs. For voice coding purposes, you’ll want to familiarize yourself with the multiedit module. Just drop a module into your NatLink MacroSystem directory, turn your microphone off and on, and NatLink will attempt to load it. If it’s not working, check the messages window to see if there are any error messages.
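To make the structure concrete, here is a minimal sketch of what a Dragonfly grammar module might look like. The spoken phrases and key mappings below are my own illustrative choices (not part of multiedit or any standard module), and I haven’t tested this exact file; it assumes dragonfly2 is installed and the file is dropped into the MacroSystem directory as described above.

```python
# _example.py -- a minimal Dragonfly grammar module (illustrative sketch).
from dragonfly import Grammar, Key, MappingRule, Text

class ExampleRule(MappingRule):
    """A few spoken phrases mapped to keystroke and text actions."""
    mapping = {
        "save file": Key("c-s"),          # press Ctrl+S
        "new paragraph": Key("enter:2"),  # press Enter twice
        "print hello": Text("hello"),     # type the literal text "hello"
    }

# Create the grammar, attach the rule, and load it into the engine.
grammar = Grammar("example")
grammar.add_rule(ExampleRule())
grammar.load()

# NatLink calls unload() when the module is unloaded or reloaded.
def unload():
    global grammar
    if grammar:
        grammar.unload()
    grammar = None
```

Once the module loads (check the messages window), saying “save file” should send Ctrl+S to the active application.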

Of course, this is just the beginning. The interesting part is extending the Dragonfly modules and writing your own to support a full-featured voice coding environment. I’ll cover that in future posts!

30 thoughts on “Getting Started with Voice Coding”

  1. I have to disagree with you about Dragon 13 being the best. Although the browser plug-ins are nice and it’s a little more accurate out of the box, the decision Nuance made to allow Select-and-Say only in specific applications (as opposed to everywhere, as in previous versions) actually caused me to downgrade.

    1. Wow, I didn’t realize Dragon 12 supported this in every app. I updated my post with this information. I’m considering downgrading too. It’s pretty painful not having this in Chrome, and I find the Chrome extension that adds this causes other problems. Thanks!

    2. I played around with this a bit on an old laptop with Dragon 12. The problem I found, at least in Google Chrome, is that Select-and-Say ceases to work as soon as you use a custom movement or editing command. Contrast this with the dictation box, where you can mix and match custom commands and standard Dragon commands. Do you just try to avoid this mixing, or do you only use Select-and-Say to fix the most recent utterance?

      1. Select-and-Say in nonstandard applications only works on the utterances since the last voice command. This is still extremely useful, however. Totally worth skipping DNS 13 for this reason alone.

        1. I bit the bullet this weekend and downgraded to Dragon 12. Suffice it to say I don’t miss a single Dragon 13 feature, and it is wonderful having Select-and-Say working again. I updated the post with a strong recommendation for Dragon 12. Thanks!

  2. There are plenty of other good microphones; which is best for you will depend on things like do you want mono or stereo headphones? Do you want to have no wires? How noisy is your environment? And so on.

  3. “Just drop a module into your NatLink MacroSystem directory, turn your microphone off and on, and NatLink will attempt to load it.”

    NatLink should reload changed modules at the next utterance automatically. Vocola works this way.

    1. NatLink doesn’t reload at the next utterance, at least not in my experience. Also, are you the same Mark who made Vocola?

      1. Sigh... it used to. Quintijn changed the default for this, I believe, in hopes of decreasing command latency. Vocola still provides the old behavior. This is probably fixable without too much trouble.
        (And yes, I am the maintainer of Vocola 2; Rick is the original creator.)

        1. I actually like the current behavior. This gives me more control over when I am ready to reload the module, so I can check for bugs. In principle I could do this by avoiding saving, but this makes it easier to use static analysis tools (plus I just like to save regularly).

          I think modules can also be reloaded with a voice command, although I don’t currently use that.

        2. For automatically reloading the grammar on a file save, you can put natlinkmain.setCheckForGrammarChanges(1) into your load function.

    2. I’ve always had to turn my microphone off and on. I just double checked. Here is the version I’m using:
      NatLink version: 4.1lima
      DNS version: 13
      Python version: 27
      Windows Version: 8

    1. Sorry for the slow reply!

      These programs are Windows-only. I haven’t tried to do voice dictation on Mac OS X, but the most complete package available is probably voicecode.io. There are also some folks working on open source packages, but they are still in the early stages AFAIK.

      Remember that you can use Linux and Mac from within Windows (see my post). If you decide to go with Windows, I think there are two decent configuration options:
      1) Windows 7, Dragon 12.
      2) Windows 10, Dragon 14.

      The advantage of 1 is that you can use Select-and-Say to edit recent utterances in any app. This means improved ease of editing without having to start a separate text editor. The advantage of 2 is simply that you get newer, shinier software (I haven’t used either yet so I can’t really speak to details).

  4. Hi James,

    I’m following your blog to get started with voice coding in Python. I’m running Dragon 14 on Windows 10. I see in your last comment (2015) that you haven’t tried either. Do you have more information now about how well this config works for voice coding? I see you have mentioned that Dragon 14 doesn’t support Select-and-Say in every single application. I can’t find information online about whether this still is the case, and how this impacts voice coding. In 2017, are things different?

    1. Hi Kariina,

      Indeed, nothing has changed since my post. Dragon 14 should be fine except for this one caveat, which is not actually relevant to voice coding per se, but presumably you also want to do some plain English dictation, which would be affected.

      Definitely avoid Dragon 15 for now. It has showstopper bugs that prevent it from being used effectively with Natlink/Dragonfly. This is unfortunate, to say the least, because apparently it is by far the best version of Dragon otherwise. Here is a thread with more details: http://knowbrainer.com/forums/forum/messageview.cfm?catid=25&threadid=22911

      1. Thanks for your reply. Another question: do you know of any online communities for voice coding? I know about KnowBrainer, I’m wondering if there are others. The reason being: I do not know how to program in Python, or any other programming language. I thought that learning to code with my voice would be a good way to spend the free time I have because of my arm injuries. I have downloaded Vocola, because Dragonfly requires Python knowledge. However, I’m still utterly confused about how to create commands to program in a language that I don’t know. Therefore, I’m wondering if it is possible to download someone else’s voice coding configuration and get started with learning Python. Do you know of any Internet community, for example, on Github?

        1. I know of a couple others:

          VoiceCoder is an active Yahoo group that has been around forever:
          https://groups.yahoo.com/neo/groups/VoiceCoder/info

          Dragonfly has its own mailing list: https://groups.google.com/forum/#!forum/dragonflyspeech

          If you are looking for repositories to fork, you’ll find several linked off the dragonfly list. Mine is linked in the navbar (https://github.com/wolfmanstout/dragonfly-commands).

          Python is a great first language to learn, although learning to program for the first time is always an exercise in patience (which I’m sure you are familiar with as someone with arm injuries). I highly recommend starting by just trying to get web browsing commands to work (e.g. based on my repository and my post). Then you can more easily navigate Python documentation and dig deeper.

  5. Hello,

    I appreciated the quick response I received from the Dragonfly programming community. Thank you for the help. I started replacing my Dragon scripts with Dragonfly code. I’m curious, though, and a little puzzled about three cases, and I have not found an explanation in the documentation. Might someone point me in the right direction?

    I’m trying to use the Mimic command to mimic the HeardWord function in the Dragon scripting. For example, I replaced the Dragon command, “delete previous three words” with “three zapper”.

    In the scripting language that looks like:
    HeardWord("delete", "previous", ListVar1, "words")

    Is there a way to use the Mimic command to achieve the same result as the Dragon script? I can achieve something similar using the Windows hotkeys, but it’s not quite as effective. I’ve tried various combinations on the theme of:
    "[] testing": release + Mimic(extra="select [] words back") + Key(del),
    but to no avail.

    I’ve also tried the well-known keyboard combination:
    "three (fingered|finger) salute": Key("alt:down, ctrl:down, del:down"),
    "three (fingered|finger) salute": Key("alt:down") + Key("ctrl:down") + Key("del:down"),
    … but neither of these two variations on the theme works. Any ideas?

    Lastly, it’s not clear to me how the Repeat command operates, as neither of these commands works.
    "[] (hashtag|hashtags)": release + Key("hash, enter") * Repeat(n),
    "remove [] (hashtag|hashtags)": release + Key("del, down") * Repeat(count="n"),

    I’m certain there’s a simple explanation for these, but I’m a little puzzled by the subtleties of the syntax and I’d be grateful for any pointers.
    Thank you so much, Matt

    1. For Mimic, it doesn’t look like it has great built-in support for combining static words with dynamic words as in your example. Most folks don’t use Mimic heavily — it’s only really useful if there’s no good keyboard-based alternative. You can do simple examples using this syntax: Mimic("delete", "previous", "three", "words"). If you want to do something more clever you can always create your own custom action, using the source as an example: https://github.com/t4ngo/dragonfly/blob/e35cef2eca226b1fc0570ca9760ca07ca4b3a8a9/dragonfly/actions/action_mimic.py

      For ctrl-alt-delete, have you tried Key("ca-del")? That’d be the canonical syntax for this sort of thing, although I’m not sure why your examples didn’t work.

      Finally, it looks like the formatting on your last example got messed up by Markdown, but I think I know what the issue is. If you want to hardcode a count such as 3, use count=3. If you want to dynamically use a count from your command, use extra="n" where “n” must also be the extra you define in your grammar.
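      Putting those pointers together, corrected versions of your commands might look like the sketch below. This is untested as written; the spoken phrases are yours (with the lost placeholders simplified), and it assumes the mapping lives in a MappingRule that defines an IntegerRef extra named "n", as the multiedit module does.

      ```python
      # Illustrative sketch only -- assumes dragonfly2 and an extra named "n".
      from dragonfly import Key, Mimic, Repeat

      mapping = {
          # Mimic replays a built-in Dragon command, one word per argument.
          "three zapper": Mimic("delete", "previous", "three", "words"),
          # "ca-del" is Key's shorthand for Ctrl+Alt+Del.
          "three finger salute": Key("ca-del"),
          # Repeat dynamically using the "n" extra from the spoken command...
          "<n> hashtags": Key("hash, enter") * Repeat(extra="n"),
          # ...or hardcode the count.
          "triple hashtag": Key("hash, enter") * Repeat(count=3),
      }
      ```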

  6. Hello,

    Thank you for this information. I was able to install both dragonfly and natlink successfully after some troubleshooting, but I am now getting an error after placing the multi-edit file into the macro system folder. I am a beginner programmer, and not quite savvy yet on the troubleshooting. Here is the error:

    " UnimacroDirectory: C:\NatLink\Unimacro
    setting shiftkey to: {shift} (language: enx)
    start of natlinkstartup
    Vocola not active
    Error loading _multiedit from C:\NatLink\NatLink\MacroSystem\_multiedit.py
    Traceback (most recent call last):
    File "C:\NatLink\NatLink\MacroSystem\core\natlinkmain.py", line 317, in loadFile
    imp.load_module(modName,fndFile,fndName,fndDesc)
    File "C:\NatLink\NatLink\MacroSystem\_multiedit.py", line 1
    Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
    ^
    SyntaxError: invalid syntax
    -- skip unchanged wrong grammar file: C:\NatLink\NatLink\MacroSystem\_multiedit.py
    -- skip unchanged wrong grammar file: C:\NatLink\NatLink\MacroSystem\_multiedit.py
    natlinkmain started from C:\NatLink\NatLink\MacroSystem\core:
    NatLink version: 4.1tango
    DNS version: 13
    Python version: 27
    Windows Version: 8or10

    -- skip unchanged wrong grammar file: C:\NatLink\NatLink\MacroSystem\_multiedit.py
    -- skip unchanged wrong grammar file: C:\NatLink\NatLink\MacroSystem\_multiedit.py"

    I’d appreciate any suggestions, and they keep your time.

    Thanks,

    Tiff

    1. Based on the error, it sounds to me like some sort of a copy/paste issue — like the contents of _multiedit.py have got some garbage at the top. I would recommend deleting it and copying it in again, making sure to copy in just the raw contents of the file.

  7. Hello James, thank you for your response. Also, in my previous response, I meant to say “thank you for your time”. I am still quite new to Dragon. I was able to get it to work after removing the syntax errors. Furthermore, I was able to create a very simple mapping! I do have a further question: how do I know how to format the action objects correctly? Is there some sort of online resource? For example, if I wanted to map “paren” to Key(“()”), I receive an error in NatLink for this line. I was able to get “some words I speak” to Key(“a, b, c”) without error. I’m very new to programming and just want to make some simple edits to “multiedit.py” for now, but am not sure how to add certain keys. Maybe these need to be spelled out? Also, where is the best place for me to ask questions like this, so I’m not taking up the entire comment section! 🙂

    Thanks again,

    Tiffany

    1. Definitely! My setup is very language-independent. Emacs is a good editor for working with HTML 5 and JavaScript (and anything else, for that matter), although it does have a significant learning curve.

  8. Is NatLink for Python only? I’m learning to code in Java; can I still use NatLink to create Java commands, or do I need to use something else?

    1. NatLink is a Python-only interface that lets you configure speech commands, but those speech commands can be used for whatever you can dream up, and that definitely includes helping you program in Java in Eclipse or Emacs or your editor of choice. If you are new to coding in general, though, I would recommend considering Python as your starting language, both because it’s well-suited to learning to code and because that way you will learn the language you’ll need to know to edit your grammars.

  9. You could check out Talon for programming by voice. It is being actively developed and even has support for the Tobii 4C eye tracker, allowing you to control the mouse hands-free.

    https://talonvoice.com/

    Here are some videos showing how it works:

    Talon Voice – Python Demo

    https://youtu.be/ddFI63dgpaI

    Talon Eye Tracking – Zoom Mouse:
    This is a mousing demo for the https://talonvoice.com project that uses eye tracking with a fast noise recognizer.

    https://youtu.be/VMNsU7rrjRI
