Designing Dragonfly grammars

UPDATE 9/8/2018: This is a technical introduction to designing grammars with Dragonfly. If you are looking for recommendations on words and phrases to use as commands, see this post.

As you build on your grammars over time, you start to run into all kinds of problems. Commands get confused with each other, latency increases, and your grammars become giant disorganized blobs. This is particularly challenging with Dragonfly, which gives you the power to repeat commands in a single utterance, but leaves it up to you to structure your grammar accordingly. In this post I’ll discuss the techniques I use to wrangle my grammars.

If you are a beginner and don’t mind pausing after each command, you can stop reading now and use the following simple patterns: flat grammars, one grammar per file, one file per application. The hard part is designing a grammar that works with commands that can be repeated within a single utterance. We’ll take the multiedit grammar as a starting point.

Accuracy

As you add commands, one of the first problems you’ll run into is recognition errors, particularly with any command that allows raw dictation. The trouble is that this grammar allows raw dictation to be mixed with any other commands, so if your raw dictation contains any of those command words, it will get recognized as a command instead of dictation.

The simplest way around this is to move infrequently used commands into a separate (repeated) grammar that doesn’t contain any raw dictation. The downside of this approach is that you will have to remember to pause between commands of these two different classes. I take a slightly different approach: I allow a sequence of commands from the larger group to be immediately followed by a sequence of commands from the smaller group, which includes dictation. Hence there is no mixing between these command groups, but they can be spoken in a single utterance. It does impose some constraints on command ordering, but this isn’t a big problem because you frequently want to pause after several arbitrary dictations to make sure the output is correct.

Of course, this only reduces the severity of the problem. As long as you have some command words mixed with arbitrary dictation, you will occasionally want to use those command words within the dictation. To handle these edge cases, I use an escape word which forces the dictation command to be the last in the sequence, preventing these words from being recognized as a successive command.
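
To make the ordering and the escape word concrete, here is a toy model in plain Python. It does not use Dragonfly and it is not how the speech engine actually parses an utterance; the command words and the `parse` helper are made up for illustration. The only thing taken from my grammar is the escape word, "terminal".

```python
# Toy model only: a real engine decides all of this during recognition.
SIMPLE_COMMANDS = {"up", "down", "tab"}   # hypothetical large group, no dictation
DICTATION_COMMANDS = {"score", "camel"}   # hypothetical small group, each takes dictation
ESCAPE_WORD = "terminal"


def parse(words):
    """Split a list of spoken words into (command, dictated_words) pairs."""
    result = []
    i = 0
    # Phase 1: commands from the large group, which never take dictation.
    while i < len(words) and words[i] in SIMPLE_COMMANDS:
        result.append((words[i], []))
        i += 1
    # Phase 2: dictation commands. Only words from the small group (or the
    # escape word) can end a dictation, so "up" is plain text from here on.
    while i < len(words):
        escaped = words[i] == ESCAPE_WORD
        if escaped:
            i += 1
        if i >= len(words) or words[i] not in DICTATION_COMMANDS:
            raise ValueError("expected a dictation command at position %d" % i)
        command = words[i]
        i += 1
        start = i
        if escaped:
            # The escaped command is last: everything left is dictation.
            i = len(words)
        else:
            while (i < len(words)
                   and words[i] not in DICTATION_COMMANDS
                   and words[i] != ESCAPE_WORD):
                i += 1
        result.append((command, words[start:i]))
    return result


print(parse("up tab score high score".split()))
print(parse("up tab terminal score high score".split()))
```

Without the escape word, the second "score" is taken as a new command. With it, "high score" stays together as dictation.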

Latency

As long as you keep your commands pretty simple, you shouldn’t have to worry too much about performance. It becomes a problem when you add specialized commands that contain repetition, such as a command to quickly speak a sequence of numbers or letters. Every repeated element multiplies its component size by the number of allowed repetitions. If you allow nested repetition, you get O(n^2) growth, which will quickly slow things down. I avoid this entirely, but I don’t force the commands to be in a separate utterance. Instead, between my two top-level repeated elements, I allow zero or one instance of any of my specialized repeated elements.
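
Here is a back-of-the-envelope version of that arithmetic. It is a rough cost model based only on the multiplication rule above, not a measurement of what the engine does internally, and all of the numbers are made up.

```python
def repeated_size(component_size, max_repetitions):
    """Rough size of a repeated element: component size times allowed repetitions."""
    return component_size * max_repetitions


MAX_REPEATS = 16        # hypothetical repetition limit
GENERAL_COMMANDS = 200  # hypothetical size of the main command set
LETTERS = 26            # one alternative per letter

# A specialized command: a spoken sequence of up to MAX_REPEATS letters.
letter_sequence = repeated_size(LETTERS, MAX_REPEATS)

# Nested: the letter sequence is one of the commands inside the top-level
# repetition, so its size gets multiplied a second time.
nested_total = repeated_size(GENERAL_COMMANDS + letter_sequence, MAX_REPEATS)

# Side by side: the top-level repetition, then at most one letter sequence.
flat_total = repeated_size(GENERAL_COMMANDS, MAX_REPEATS) + letter_sequence

print("nested: %d, side by side: %d" % (nested_total, flat_total))
```

Placing the specialized element next to the top-level repetition makes its cost add to the total instead of multiplying it.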

Here’s what my RepeatRule spec looks like at this point:

[<sequence>] [<nested_repetitions>] [<terminal_sequence>] [terminal <terminal>] [[[and] repeat [that]] <n> times]
  • sequence: My largest collection of repeatable commands
  • nested_repetitions: Commands that contain Repetition elements
  • terminal_sequence: A small set of commands that contains raw dictation
  • terminal: Same as terminal_sequence, except not repeated
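
The ordering this spec imposes can be checked with a small regular expression. This is only a sketch: it labels each spoken command with the group it belongs to and ignores any limits on how many repetitions are allowed. The one-letter codes and the `fits_spec` helper are invented for this example.

```python
import re

# One letter per command group, in the order the spec allows them.
GROUP_CODES = {
    "sequence": "s",
    "nested_repetitions": "n",
    "terminal_sequence": "t",
    "terminal": "x",  # the escape word plus a single terminal command
    "repeat": "r",    # the trailing "repeat that <n> times"
}

# Any number of sequence commands, at most one nested repetition, any number
# of terminal_sequence commands, at most one escaped terminal, at most one
# repeat count.
ORDER = re.compile(r"^s*n?t*x?r?$")


def fits_spec(groups):
    """Return True if the commands' groups appear in an order the spec allows."""
    encoded = "".join(GROUP_CODES[group] for group in groups)
    return ORDER.match(encoded) is not None


print(fits_spec(["sequence", "sequence", "nested_repetitions", "terminal_sequence"]))
print(fits_spec(["terminal_sequence", "sequence"]))
```

The first call fits the spec. The second does not, because a command from the main sequence cannot follow a dictation command within the same utterance.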

Organization

The multiedit grammar doesn’t provide an obvious way to create commands that only run in particular contexts. The trouble is that Dragonfly contexts can only be used in an exported rule or grammar, so it’s not easy to apply them to specific components of a repeated element. It turns out that this isn’t just a Dragonfly limitation; it is inherent in the way Dragonfly grammars map to NatLink grammars and in the limitations imposed on those. But fortunately, we can work around it: we will create complete exported rules for every configuration we want to support, and associate them with contexts that ensure mutual exclusivity. This isn’t exactly cheap, but it is scalable if we assume hierarchical contexts.

Creating complete exported rules requires that we refactor RepeatRule to use a proper constructor instead of setting values statically. We also have to give each rule a different name, to make Dragonfly happy. Here’s what this looks like:

from dragonfly import CompoundRule

class RepeatRule(CompoundRule):
    def __init__(self, name, repeated, terminal, context):
        # Each exported rule gets its own unique name and its own context.
        spec     = "[<sequence>] [<nested_repetitions>] [<terminal_sequence>] [terminal <terminal>] [[[and] repeat [that]] <n> times]"
        extras   = [
            ...
        ]
        defaults = {
            ...
        }
        CompoundRule.__init__(self, name=name, spec=spec,
                              extras=extras, defaults=defaults, exported=True, context=context)

Creating mutually exclusive contexts is the trickier part. I use some helper logic:

def combine_contexts(context1, context2):
    """Combine two contexts using "&", treating None as equivalent to a context that
    matches everything."""
    if not context1:
        return context2
    if not context2:
        return context1
    return context1 & context2

class ContextHelper:
    """Helper to define a context hierarchy in terms of sub-rules but pass it to
    dragonfly as top-level rules."""

    def __init__(self, name, context, element):
        """Associate the provided context with the element to be repeated."""
        self.name = name
        self.context = context
        self.element = element
        self.children = []

    def add_child(self, child):
        """Add child ContextHelper."""
        self.children.append(child)

    def add_rules(self, grammar, parent_context):
        """Walk the ContextHelper tree and add exclusive top-level rules to the
        grammar."""
        full_context = combine_contexts(parent_context, self.context)
        exclusive_context = full_context
        for child in self.children:
            child.add_rules(grammar, full_context)
            exclusive_context = combine_contexts(exclusive_context, ~child.context)
        grammar.add_rule(RepeatRule(self.name + "RepeatRule",
                                    self.element,
                                    terminal_element,
                                    exclusive_context))

I can then build a tree of these and add all the rules with a single method call:

global_context_helper = ContextHelper("Global", None, single_action)

emacs_context_helper = ContextHelper("Emacs", AppContext(title="Emacs editor"), emacs_element)
global_context_helper.add_child(emacs_context_helper)

# ... add more helpers to the tree ...

grammar = Grammar("repeat")
global_context_helper.add_rules(grammar, None)
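
To convince yourself that the resulting contexts really are mutually exclusive, you can mirror the logic of add_rules with stand-in contexts and check which rules would be active for a given window title. This is a sketch that runs without Dragonfly: `FakeContext`, `title_contains`, `exclusive_contexts` and `active_rules` are all invented for the test, and real Dragonfly contexts match on more than the title.

```python
class FakeContext(object):
    """Stand-in for a Dragonfly context: a predicate on the window title."""

    def __init__(self, predicate):
        self.predicate = predicate

    def matches(self, title):
        return self.predicate(title)

    def __and__(self, other):
        return FakeContext(lambda t: self.matches(t) and other.matches(t))

    def __invert__(self):
        return FakeContext(lambda t: not self.matches(t))


def title_contains(text):
    return FakeContext(lambda t: text in t)


def combine(context1, context2):
    """Same idea as combine_contexts: None matches everything."""
    if context1 is None:
        return context2
    if context2 is None:
        return context1
    return context1 & context2


def exclusive_contexts(name, context, children, parent=None):
    """Mirror add_rules: return {rule name: exclusive context} for a tree.

    children is a list of (name, context, children) tuples."""
    full = combine(parent, context)
    exclusive = full
    result = {}
    for child_name, child_context, grandchildren in children:
        result.update(
            exclusive_contexts(child_name, child_context, grandchildren, full))
        exclusive = combine(exclusive, ~child_context)
    result[name] = exclusive
    return result


def active_rules(rules, title):
    """Names of the rules whose context matches the given window title."""
    return sorted(name for name, context in rules.items()
                  if context is None or context.matches(title))


# Hypothetical hierarchy: Global > Emacs > Python, and Global > Chrome.
tree = [
    ("Emacs", title_contains("Emacs"), [("Python", title_contains(".py"), [])]),
    ("Chrome", title_contains("Chrome"), []),
]
rules = exclusive_contexts("Global", None, tree)

for title in ["foo.py - Emacs", "notes.txt - Emacs", "Google Chrome", "Terminal"]:
    print(title, active_rules(rules, title))
```

Exactly one rule is active for each of these titles. Note that only parents and children are made exclusive; sibling contexts (Emacs and Chrome here) are assumed not to overlap in the first place.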

Conclusion

To see the complete code for my repeated grammar, check out _repeat.py on GitHub.

There are lots of other ways you could organize a grammar, and I’m sure I will make more improvements over time. Please add comments if you have ideas!

11 thoughts on “Designing Dragonfly grammars”

  1. I use a set of “literalizer” words (currently a set of one, “English”) that causes the following command to not be considered the start of a command. Pretty easy to get used to.

    1. How do you implement that? Once Dragon picks up the next word as a command, it seems like that would break up the previous Dictation element.

      1. Lol, your question forces me into a funny realization of how our systems differ. In my case, I parse commands myself. All utterances must start as a command of some form, but my “ContinuousRule”s take the final heard utterance, and if they find commands in the designated dictation, the dictation is split: the first part is updated for the command to receive, and the rest is mimicked. So say I have commands c1 thru c4, and c1 has a spec “c1 ”, then “c1 some stuff c2 some stuff c3” will mimic everything after “c1 some stuff”.

        So long story short, I parse and split the heard words, so with that understanding, it’s pretty easy to notice “English” and disallow interpreting what follows as a command start.

        My repository (linked to here if you click my name) can be a bit complex, but if you look through it and have any questions, feel free to ask.

  2. I haven’t tried this, but my intuition says you should divide things into cases by how frequently command words need to be part of dictation.
    For command words that are frequently part of dictation, choose a different command word, a non-English one if necessary.
    For infrequent cases, don’t worry too much about it and just provide non-chain-able versions of the dictation commands. Yes, this means in these cases you need a pause before and afterwards, but this will be much simpler than trying to worry about escape words. Because these cases are rare, the extra pause time should not matter very much, and it is well worth the substantial savings in implementation complexity from not having to implement escape words.

    1. I notice that Ben of VoiceCode.IO precedes all his templates by “Quinn” so for example creating an if-then-else template is “Quinn if else”. The use of the word “Quinn” avoids issues with identifiers like “if_need_title”.

      Using combinations of words that don’t appear in arbitrary dictation may also work. “key if” for “if ” is much less likely to cause problems while still being memorable.

      1. My escape word (terminal) precedes the command, so in a sense it is equivalent to defining non-chainable commands (e.g. I just think of “terminal score” as the non-chainable version of “score”).

        I do something similar to Ben for templates (I use “plate”). So in one command I can say “plate if bang score found tab” for

        if (!found) {
          <cursor location>
        }
        
      2. I mentioned this before, but I went a different way. I can run pretty much every command together, except for the few I specifically decide for whatever reason will not chain due to specific needs.

        Mind you, I do have a moderate lag issue, but I think my lag is from some really deeply nested rules of many alternatives repeated so many potential times, i.e. an overly complex grammar in terms of its rule set, and not in the manner in which I’m dispatching the chained rules.

        I’m actually getting ready (meaning maybe years from the look of it :-p) to put up a python package called “dragonflow” that lets someone just plug in and use my method. It’ll be at my chajadan github.
