Utter Command: Why I Rewrote My Entire Grammar

For years, I’ve been approaching speech recognition like a backend engineer: I have a flexible coding style for managing my grammars, I’ve implemented a lot of functionality, and I’ve added some helpful integrations. But embarrassingly, until recently, I hadn’t put much thought into the User Experience. This all changed after I received an email from Kim Patch, the author of Utter Command, a set of extensions to Dragon that has been around for decades.

Kim offered to talk over the phone, and we ended up chatting for over an hour. Kim was full of insightful observations on how to design grammars. I quickly realized that my own grammars were a mess: a mishmash of scraps I picked up from Tavis Rudd’s video, commands I found in GitHub repositories, and plenty of stuff I had made up on my own. This mess was costing me in multiple ways. Adding a command to my grammar was always an ordeal, because I had to come up with some unique identifier (frequently a non-English word), check to make sure it didn’t conflict with anything, and then do my best to remember that word or phrase. In practice, this often meant recognition errors, or pauses as I tried to remember what I had used for my commands. All this discouraged me from adding more commands.

Kim’s work showed that there is a Better Way. I’m happy to say that she’s done a phenomenal job writing her ideas down, too, so you don’t all have to go find Kim’s phone number. I encourage you to explore her website, but make sure you don’t miss Human-Machine Grammar – The Rules. These 16 rules contain her biggest ideas, although it’s also very helpful to see exactly how she instantiates those ideas in Utter Command. In her own words:

The 16 Human-Machine Grammar rules are aimed at keeping the speech interface vocabulary small and easy to remember and predict. These guidelines cut out alternate wordings and establish consistent patterns across the entire set of commands, making it much easier to remember or guess how a command should be worded.

This is well worth reading yourself, but I’ll provide my own notes and compare and contrast this with other grammar styles.

The principles behind Utter Command

One of the striking aspects of Kim’s grammar is that it manages to be concise while still using English words. She takes inspiration from human-human communication systems, such as what you might hear at McDonald’s. You won’t hear “place an order for two cartons of fries”, nor will you hear something that sounds alien. Instead you are likely to hear “two fry”. It’s concise, easy to remember, and easy to extend (guess how you would request three hamburgers). Kim pays close attention to cognitive load in the design of her grammar: how much thought you have to put into speaking commands. If you are using speech recognition to write code, you already have plenty to think about in the code that you are writing. If you also have to think about remembering an alien grammar on top of that, it’s going to detract from your focus on the code. Kim’s grammar minimizes that distraction. There are a couple of principles I’ve found particularly useful:

  1. Keep the total number of words in your grammar as small as possible. Look for where you are using two or more slightly different words to mean basically the same thing, and pick one. To avoid reinventing the wheel, you can use the Utter Command dictionary.
  2. Pick a consistent word ordering. Many commands have you manipulating some object on the screen: closing a window, for example. In standard English, you would use the imperative “Close Window”, which has the order verb-object. Kim’s grammar standardizes on the order object-verb instead (“Window Close”), because it requires slightly less cognitive load once you get used to it (by the time you think of “close”, you’ve surely already had to think of the object you want to close, so you might as well say that first). This also means that your commands align better with the natural sequence in time, which is great for speaking commands that do multiple things in sequence. As an added bonus, because this word order doesn’t exactly match the standard English imperative style, it also means that the command is less likely to conflict with your prose dictation (otherwise a concern when using an English-based grammar).
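
To make these two principles concrete, here is a minimal sketch of how a small vocabulary combined in a fixed object-verb order yields many predictable commands. The word lists and the helper names (build_commands, vocabulary) are my own hypothetical examples, not the actual Utter Command dictionary.

```python
from itertools import product

# Hypothetical vocabulary for illustration; not the Utter Command word list.
OBJECTS = ["window", "tab", "line"]
VERBS = ["close", "copy", "delete"]


def build_commands(objects, verbs):
    """Return every spoken phrase in a fixed object-verb order."""
    return [f"{obj} {verb}" for obj, verb in product(objects, verbs)]


def vocabulary(phrases):
    """Collect the distinct words a speaker has to remember."""
    return sorted({word for phrase in phrases for word in phrase.split()})


commands = build_commands(OBJECTS, VERBS)
print(len(commands), "commands from", len(vocabulary(commands)), "words")
```

Not every object-verb pair is meaningful in practice, but the point stands: the number of commands grows much faster than the number of words you have to remember, and an unfamiliar command can usually be guessed from the pattern.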

An alternative: ShortTalk

Kim’s grammar is not the only carefully thought-through grammar for speech recognition out there. The other major one that I’m aware of is ShortTalk by Nils Klarlund. This is the style that popularized alien-like words such as “spooce” and “nairx”, and certainly influenced my own early grammar design, as channeled through Tavis Rudd’s famous video. I recommend reading about the grammar to form your own conclusions, but here are the reasons I ultimately decided to switch to Utter Command-style:

  • Your mileage may vary, but I found it took a long time to remember these commands, except for the ones that I used on a daily basis. My hesitation before speaking commands eliminated any speed advantage from having shorter commands.
  • While ShortTalk is a fairly extensive grammar, it doesn’t handle all my needs, which left me needing to design my own commands for additional functionality — often requiring trial and error along with memorization.
  • Recently I’ve been experimenting with integrating Dragon alternatives with Dragonfly. The two options I’m most interested in are Google Speech API, because it’s production-ready and platform-agnostic, and Mozilla DeepSpeech, because it’s open-source. Both of these are unlike Dragon in that they don’t operate on a known grammar, so they aren’t going to recognize any of the ShortTalk words by default. The Google Speech API offers a very limited way to provide your own vocabulary in the form of a list of up to 500 unranked words and phrases, without any way to actually train these words. Hence, it is much easier to use these with a grammar that is based on English words.
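
If you want to hand an English-based grammar to an engine that only accepts a flat list of phrases, a rough sketch like the following could prepare that list. It makes no API calls; the phrase_hints helper is hypothetical, and the 500-phrase limit is simply the figure mentioned above, so check the current documentation before relying on it.

```python
MAX_PHRASES = 500  # limit quoted in the text above; an assumption, not verified here


def phrase_hints(phrases, limit=MAX_PHRASES):
    """De-duplicate phrases case-insensitively, keeping first-seen order."""
    seen = set()
    hints = []
    for phrase in phrases:
        # Normalize case and collapse repeated whitespace.
        key = " ".join(phrase.lower().split())
        if key and key not in seen:
            seen.add(key)
            hints.append(key)
    if len(hints) > limit:
        raise ValueError(f"{len(hints)} phrases exceed the limit of {limit}")
    return hints


hints = phrase_hints(["Window Close", "window  close", "Tab Left", "2 Down"])
print(hints)
```

A grammar with a small, consistent vocabulary is much more likely to fit under such a limit than one with many one-off phrases.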

Installing Utter Command

The quickest way to get going with Utter Command is currently to buy it from Kim’s website. Kim has open-sourced some of the code, but it depends on DLLs that are not yet available.

If you choose to go this route, be aware that you need Dragon Professional (Dragon 15 Professional Individual will do fine). Interestingly, Utter Command is not implemented with NatLink, but instead uses a native language for specifying commands known as DVC. DVC commands can be simply added to files within C:\ProgramData\Nuance\NaturallySpeaking15\Data\enx\dvcu\general and automatically picked up (after a small tweak is made in C:\ProgramData\Nuance\NaturallySpeaking15\nssystem.ini to change enx Base DVC Directory=dvce to enx Base DVC Directory=dvcu).
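
If you would rather script the nssystem.ini tweak than edit by hand, here is a minimal sketch that operates on the file’s text. The switch_dvc_directory helper and the sample contents (including the [Settings] section name) are hypothetical; only the key and the two values come from the description above. Back up the real file before changing it.

```python
KEY = "enx Base DVC Directory"


def switch_dvc_directory(ini_text, old="dvce", new="dvcu"):
    """Return ini_text with the base DVC directory changed from old to new."""
    out = []
    for line in ini_text.splitlines(keepends=True):
        name, sep, value = line.partition("=")
        if sep and name.strip() == KEY and value.strip() == old:
            # Preserve whatever line ending the original line used.
            ending = line[len(line.rstrip("\r\n")):]
            line = f"{KEY}={new}{ending}"
        out.append(line)
    return "".join(out)


# Hypothetical file contents for illustration only.
sample = "[Settings]\nenx Base DVC Directory=dvce\nOther Key=1\n"
print(switch_dvc_directory(sample))
```

Calling it a second time leaves the text unchanged, because it only rewrites the line when the old value is still present.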

I will note that some of the integration points with Windows and browsers have broken, so expect a bit of roughness around the edges. Kim is looking for help posting and improving the rest of her code, updating documentation, and making Utter Command open source. She is also open to porting the system to use something other than DVC, such as Dragonfly. If this is something you are interested in, please post in the comments and I can connect you.

Implementing Utter Command style in Dragonfly

The upside of DVC is that it is very fast, but it’s also very complicated and verbose, so I’m not about to switch my Dragonfly grammar all over to DVC. Additionally, while there is some support for successive commands without pausing in Utter Command, it is all done on a one-off basis, typically for editing commands. I like to design my Dragonfly grammar so that I can make a whole subset of commands chainable. Since discovering Utter Command, I’ve been redesigning my grammar to try to get the best of both worlds: use the principles behind Utter Command where possible, but allow commands to flow from one to the other without pauses. Relearning my entire grammar hasn’t exactly been easy, but due to the simplicity of Utter Command style, it also hasn’t been nearly as painful as I would have expected. I’m already seeing dividends, where structuring my grammar more clearly has indicated gaps that I’ve filled, and I can already sense the decrease in cognitive load even though I’m still internalizing the commands. Here are some pieces of advice if you choose to go this route:

  • You don’t need a perfectly unambiguous grammar; you just need it to be unambiguous for the cases you actually use, which is a much weaker requirement. This is to say: don’t obsess over every possible way your grammar could theoretically go wrong or lead to an ambiguous recognition, other than following the advice in the next bullet.
  • Avoid one-word commands except for the most heavily used cases. The problem is that once you’ve used a word in a one-word command, it’s very easy to create ambiguous commands if you ever reuse that word again. For example, I initially had “tab” simulate a press of the tab key, and “left” press the left key. When I was adding commands for my browser, I had “tab” as a prefix for several commands, and I kept bumping into misrecognition errors, e.g. I had a “tab left” command that was triggering “tab” and “left” instead. I could have changed that one command, but I ran into enough trouble with similar cases that I decided to change “tab” to “tab-key” (but I kept “left” as is).
  • Utter Command puts the number of repetitions for a command first, e.g. “2 Down” instead of “Down 2”. I think this makes a lot of sense from a cognitive load standpoint, because counting requires some thought, so you want to speak the number as early as possible after that thought to release your mind to think about the rest of the command. The downside is that setting up a bunch of Dragonfly commands with numeric prefixes can lead to a massive slowdown if you do it wrong. Dragonfly’s IntegerRef expands into a fairly complicated mini-grammar and Dragon (apparently) resolves commands in a left-to-right fashion, leading to slowness when the two are combined. To solve this, I had to restructure my grammar so that I declared “<n>” once instead of once per command, and then had that followed by an Alternative of all the commands to which it might apply. In general, it helps with grammar performance if you de-duplicate as much as possible, and this is a good example of that.
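
The shape of that last restructuring can be modeled in plain Python. This is a simplified sketch, not Dragonfly code and not my actual grammar: the command table, the number words, and the recognize helper are hypothetical. It only illustrates the idea of parsing the count once and then doing a single lookup among all repeatable commands, i.e. “[<n>] <repeatable_command>” rather than a separate “<n> ...” spec for every command.

```python
# Hypothetical repeatable commands mapped to the key each one presses.
REPEATABLE = {"down": "down", "up": "up", "left": "left", "tab-key": "tab"}

# Small hypothetical stand-in for a full integer grammar.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_count(word):
    """Return the count spoken as digits or a number word, else None."""
    if word.isdecimal():
        return int(word)
    return NUMBER_WORDS.get(word)


def recognize(utterance):
    """Match '[<n>] <repeatable_command>': parse n once, then one lookup."""
    words = utterance.lower().split()
    count = parse_count(words[0]) if words else None
    if count is not None:
        words = words[1:]  # the count was spoken, so consume it
    else:
        count = 1  # the count is optional and defaults to one
    key = REPEATABLE.get(" ".join(words))
    if key is None:
        return []  # not a repeatable command
    return [key] * count


print(recognize("2 Down"))
print(recognize("three left"))
print(recognize("tab-key"))
```

The number-handling logic appears exactly once no matter how many commands are added to the table, which is the same de-duplication that fixed the slowdown described above.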

Utter Command also updates the natural text dictation commands to follow its style. You can easily do this yourself using the same method as Utter Command without pulling in any dependencies. Simply add the following to C:\ProgramData\Nuance\NaturallySpeaking15\Users\<your_profile>\current\options.ini under [Options]:

enx Correct XYZ Command=Nope %1
enx Correct That Command=Nope
enx Select XYZ Command=Words %1
enx Copy XYZ Command=Words Copy %1
enx Cut XYZ Command=Words Cut %1
enx Delete XYZ Command=Words Delete %1
enx Bold XYZ Command=Words Bold %1
enx Italicize XYZ Command=Words Italic %1
enx Underline XYZ Command=Words Underline %1
enx Select XYZ Through XYZ Command=Words %1 Through %1
enx Cut XYZ Through XYZ Command=Words Cut %1 Through %1
enx Copy XYZ Through XYZ Command=Words Copy %1 Through %1
enx Delete XYZ Through XYZ Command=Words Delete %1 Through %1
enx Bold XYZ Through XYZ Command=Words Bold %1 Through %1
enx Italicize XYZ Through XYZ Command=Words Italic %1 Through %1
enx Underline XYZ Through XYZ Command=Words Underline %1 Through %1
enx Insert Before XYZ Command=Go Before %1
enx Insert After XYZ Command=Go After %1
enx Cap That Command=Add Caps
enx No Caps That Command=Add No-Caps
enx All Caps That Command=Add All-Caps
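
If you prefer to apply these lines with a script, the following is a rough sketch of merging them into the [Options] section of the file’s text. The merge_options helper and the sample contents (including the default value shown for the existing key) are hypothetical; back up options.ini before changing the real file.

```python
def merge_options(ini_text, new_lines, section="[Options]"):
    """Return ini_text with new_lines set inside the given section."""
    overrides = dict(line.split("=", 1) for line in new_lines)
    out, in_section, found = [], False, False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped == section
            out.append(line)
            if in_section and not found:
                found = True
                # Write the new values directly under the section header.
                out.extend(f"{k}={v}" for k, v in overrides.items())
            continue
        # Drop old values of overridden keys inside the target section.
        if in_section and line.split("=", 1)[0].strip() in overrides:
            continue
        out.append(line)
    if not found:
        # The section was missing, so create it at the end.
        out.append(section)
        out.extend(f"{k}={v}" for k, v in overrides.items())
    return "\n".join(out) + "\n"


# Hypothetical existing file contents for illustration only.
sample = "[Options]\nenx Correct That Command=Correct That\nFoo=1\n[Other]\nBar=2\n"
updated = merge_options(
    sample,
    ["enx Correct That Command=Nope", "enx Cap That Command=Add Caps"],
)
print(updated)
```

Keys that already exist in the section are replaced rather than duplicated, and settings in other sections are left alone.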

Final thoughts

Redoing your grammar may seem like a daunting prospect, but if I did it after years of internalizing my old grammar, you can too! Please tell us about your experiences in the comments.

Finally, I want to give a huge thanks to Kim Patch for her incredible contributions in this area. Utter Command is a brilliant work and deserves more attention than it gets. It is very well-documented and I highly recommend that you explore her website and list of Utter Command resources to learn more.

11 thoughts on “Utter Command: Why I Rewrote My Entire Grammar”

  1. Very interesting, I had heard about utter but hadn’t examined it as thoroughly as shorttalk since it did not seem as available. I would definitely like to take a closer look and potentially update the Rosetta Stone spreadsheet document with commands where they make sense. When I talked with Nils he agreed that the original incarnation of short talk had too many duplicate ways to do things. We talked about potentially simplifying it but that project never really got off the ground due to time constraints.

  2. Nils and I had some good conversations back in the day – I was experimenting with just a few made up words in the mix, but ended up removing most of them because it takes a long time to internalize a new word so you can instinctively picture what it does.

    I think made up words that at least have roots in real words are more useful, although I ended up using only a very few of these. Here’s an example that ultimately didn’t make it into my active vocabulary: “Drax” and “Dray”, meaning “Drag x” and “Drag y”, used with an axis number, for example “Drax 50”.

    I think made up forms are only useful when you use them frequently, and so can really internalize them over time. It also helps if they’re part of a pattern. The Utter Command vocabulary words “Befores” and “Afters”, for selecting words before or after the cursor, fall into both categories. “1-20 Before/After” move the cursor, while the plural form selects. Saying “3 Befores” is odd at first, but because it’s part of a pattern and used often you quickly get used to it and can picture what it means. “Afters” is used as a word – it’s a casual reference to dessert in the UK.

    Nils also uses made up words so he can use them in-line – mixed with words without pausing. Dragon has a few in-line commands (“new line”), and they do sometimes cause problems, mostly because Dragon has made the mistake of unnecessarily enabling synonyms (like “next line”). I think it’s fairly natural to put pauses between text and commands most of the time, so I don’t think it’s necessary to have a lot of in-line commands. For the few that are needed, it might be worth the cognitive load of learning a well-constructed made up word.

    It’s important to make the distinction between in-line commands and command phrases. Command phrases allow you to say several commands in a row and greatly improve the speech recognition experience.

    1. Hey Kim, really interesting stuff, I spent some time yesterday reading your rules of grammar and watching some of the video demos. In my experience there is obviously a difference between doing a demo, potentially off a script, and real-life dictation/commands. Usually I only get to maybe 2 or 3 continuous commands before I have to think, I wonder is that your experience as well?

      With regards to short talk, and other languages, here is a link to the Rosetta document I was referring to.

      https://docs.google.com/spreadsheets/d/1pk2gwTFbMebgYSsrxIFsZ-QvpEPWCybF8ypdeBvfBsg/pubhtml

      I find that the conceptual parts of the short talk paper were very interesting. In particular the concept that “ai” moves forward while “oo” moves backward, and that you can control the cursor position as well through prefix and postfix, i.e. baif moves forward and places the cursor before, aift moves forward and places the cursor after.

      Also when it comes to coding which is where I needed the most help for editing, I found making use of the tools to be quite helpful. Here’s a link to a getting started document with a focus on code.

      https://docs.google.com/document/d/e/2PACX-1vSGRicRTJ2iv7rzLnwYxGnUb39usUk_5o2KPxJ5YE91qv-W_lWHD1C7S4syAHM61VAheR5lQ6hoE55W/pub

      I would certainly be interested in your thoughts.

  3. Awesome stuff. I am glad I got to meet Kim at a couple of conferences a number of years ago (maybe 10-12 years?). She knows what she’s doing. My job demands I know a little about speech input, but for anyone I work with, I would refer them to her site to really make the tool sing.

  4. Hi James,

    Very interesting post here. Thank you!

    I’m a bit confused by your point about “using <n> one time instead of per-each-command and then using alternates.” I’m not sure what these alternates would look like. Can you share a brief code snippet?

    1. Sure, this is a bit tricky. The way I did this was define a separate MappingRule that contains all the commands that are repeatable (behind the scenes, a MappingRule uses a big Alternative element). Then, when weaving this into my top-level rule, I used that rule as an “extra” in a single Compound element which looks like [<n>] <repeatable_command> and has a value_func which repeats the command n times. Here’s my code:
      https://github.com/wolfmanstout/dragonfly-commands/blob/a9cf9d4a5196a6d8112859457dba422af022f4c4/_repeat.py#L872

      Let me know if this is still too confusing and I can try to develop a simpler example. My grammar has a lot going on.

  5. James, where in your git repository do most of these commands reside? is there a grammar specification for Utter basics?

  6. James,
    “Dragon (apparently) resolves commands in a left-to-right fashion, leading to slowness when the two are combined. To solve this, I had to restructure my grammar so that I declared “<n>” one time instead of per-each-command, and then had that followed by an Alternative of all the commands to which it might apply. In general, it helps with grammar performance if you de-duplicate as much as possible, and this is a good example of that.”

    Do you know for certain that Dragon resolves in a left-to-right fashion? I have many list type commands which are usually prefixed with a or a few identifier. The 2 egregious examples being URLs (of which I’m sure I have hundreds) and folders which are probably at least 100 or more. Currently I have commands for the URLs like (edgy | nova | fire) to open them in Edge, Chrome, or Firefox respectively. I’m wondering if given your experience changing the ordering to (edgy | nova | fire) would cause a significant slowdown?

    I also have a fair number of context-sensitive commands and am wondering if having those in the grammar will create issues beyond that context?

    1. I’m just guessing at Dragon’s behavior based on the performance I see. In this case, I believe I was able to show a massive difference in performance just based on where I placed <n> within my commands. That said, I was able to completely resolve the performance issue by making sure that I don’t repeat <n> in each command that supports it, but rather use it once and put everything after it in a single Alternative element. Conceptually, something like <n> (command1|command2|command3|...). So there’s certainly nothing wrong in principle with what you’re describing, but you may want to ensure it’s structured in a way that is performance-friendly.
