Voice control systems come and go. Most fail to attract a significant following and eventually fizzle out. Every once in a while, though, one persists and shapes the community forever: Vocola, Utter Command, Dragonfly, and others before my time. In the last couple of years, Talon has gained momentum as a new system targeted at power users. About a year ago it added Windows and Linux to the list of supported OSes (after Mac), and that’s when I started paying closer attention. For a while, I watched it improve from the sidelines. As someone who spent years developing my own grammar for Dragonfly, I wasn’t eager to rewrite it all or learn a new community grammar. On top of that, I had concerns that much of Talon was not open source. Despite this hesitance, I saw enough promise in my early experiments that I decided to take the plunge and test it out in earnest.
The effort has undoubtedly been worthwhile. I’ve been impressed both by what Talon can do today and by the velocity with which it is improving. I can confidently say that I’m more productive than ever, and I have never been more excited about the future of voice computing. In this post, I’ll examine Talon’s capabilities and conclude with a detailed discussion of the implications of its partially closed-source model. Although the focus will be on Talon, I’ll occasionally compare it to Dragonfly as a helpful reference point, since it is a system that I (and many of my readers) have used for years.
Having helped many people with Dragonfly’s notoriously fickle installation process over the years, I was relieved by the simplicity of Talon’s installation. It is freely available for Mac, Linux, and Windows, although I have been using the bleeding edge Beta version on Windows, which costs $25/month for early access to new features (and consequently, influence on those new features). Simply run the installer and clone the community grammar into the user directory it creates for you, and you are ready to go. Talon bundles its own Python environment, running a recent version of Python 3 64-bit, in contrast to NatLink’s experimental-only support for Python 3 and restriction to 32-bit. Updates to Talon are even easier: it checks automatically on startup and prompts you to apply the update if you wish.
Talon bundles a lot of useful functionality out of the box. Like Dragonfly, it can integrate with Dragon, but it does so using its own custom integration (Draconity) instead of relying on NatLink. By communicating with Dragon from a separate process, Draconity is generally able to provide a more robust experience and access to 64-bit Python. The only issue I’ve noticed is occasional dropped commands immediately after a context switch. (Already fixed!)
Additionally, Talon offers its own speech recognition engine, Wav2Letter Conformer, which, impressively, is competitive with Dragon. Like Dragon, it can guess the pronunciation of made-up words with uncanny accuracy. It responds with very low latency, even lower than Dragon when context switching between different grammars. In my experience, though, its accuracy still lags a bit behind Dragon’s, although it is catching up with each release. Dragon’s strength is its language model, which enables it to make better guesses when the speech audio is ambiguous. Currently, I’m using Dragon for work, but I use Conformer at home to track its development and ensure everything I do is compatible with both. I’m writing this post with Conformer.
In addition to speech recognition, Talon includes support for noise recognition, custom keyboard shortcuts, and eye tracking. The built-in noise recognition support is currently pretty basic (pop and hiss), but Beta users can try out an experimental integration with parrot.py, which lets you train your own custom noises, integrating seamlessly into Talon’s grammars. Astoundingly, the author of parrot.py used it to reach Diamond League in StarCraft … with videos to prove it! In comparison, the custom keyboard shortcuts are less exciting, but still useful as they let you bind keyboard shortcuts to arbitrary Python code, just as you can with words and noises. The eye tracking support works with Tobii eye trackers, bucking their heavyweight software in favor of a lightweight custom integration (in fact, you need to kill the Tobii services or uninstall their software for this to work). It also extends OS compatibility to include Mac and Linux, provided you have access to Windows for initial setup. It works nicely out of the box as a mouse substitute, using popping sounds to zoom in on the screen and then to click. You can also make use of gaze coordinates in your custom commands.
The de facto community grammar, knausj_talon, is an essential part of Talon’s functionality. It even implements some building-block interfaces that Talon provides, so everyone should start with this. Fortunately, it is very well-built. It loosely follows the Utter Command style that I gushed about in an earlier post, in contrast to the Dragonfly community grammar Caster which mixes in some made up words. Additionally, it includes a voice-searchable GUI-based help system which doubles as a gentle introduction to the Talon script syntax (more on that later). This help system works with both the bundled commands and any other commands you use, whether you write them yourself or they come from another third-party bundle. You’ll be using this help system a lot, because there’s a ton of functionality in knausj_talon. Out of the box, this grammar has its own versions of ~80% of the Dragonfly commands I’ve written over the years, along with plenty more I never got around to. The major gaps I’ve noticed are gaze + OCR, accessibility API, and webdriver integrations. Of these, I use gaze + OCR the most by far, and it wasn’t too difficult to get this working with Talon (I plan to release my work on this once I’ve made it easier to install). Of course, much of the overlapping functionality that comes with knausj_talon did not map to the exact same phrases I had been using. For the most part, I’ve been trying to learn the knausj_talon commands, except for my custom monosyllabic alphabet because it works well and would be a pain to relearn. This was a one-line change to integrate (really!) and it immediately worked everywhere without exception.
Knausj_talon also includes a system for prose dictation formatting. Thanks to Draconity, it is able to effectively override Dragon’s own dictation system, because it can bind commands that match any arbitrary utterance — no need to preface with a command word. This can be a double-edged sword, because Dragon has pretty good prose dictation formatting, and knausj_talon lags behind in some respects (e.g. automated number formatting). That said, this extra power means the ability to rethink rough edges in Dragon’s dictation, and standardize on commands that work across multiple recognition engines (e.g. Conformer). For example, knausj_talon bundles its own custom vocabulary system, which lets you make simple edits to a file to add custom words instead of dealing with Dragon’s clunky UI (and you can bind your own commands to edit this file, if you wish). If you just want to use Dragon’s own dictation and vocabulary system, you still can, just as you would with Dragonfly. Knausj_talon leverages a modal system for switching between commands and dictation, so you’d simply avoid entering Talon’s dictation mode and use Dragon’s own dictation mode (or mixed mode) instead. Personally, I’ve been enjoying iterating on knausj_talon’s dictation mode myself in an effort to build the system I always wanted … stay tuned for pull requests!
Recently I’ve started using a couple more community-contributed grammars that work seamlessly alongside knausj: Talon HUD and Cursorless. Talon HUD adds a more full-featured GUI layer, including an indicator of the current mode and a better display of recently-recognized commands. It’s easy to move this interface literally anywhere onscreen; I put it in the taskbar adjacent to the notification area so it doesn’t overlap any content.
Talon HUD also includes a professional-quality tutorial that walks you through all the available functionality. Finally, it doubles as a library for building additional HUD elements, so I’m excited to see what gets built atop this.
Cursorless is a brilliant system for editing files in Visual Studio Code. It decorates the text with colored “hats”, making it possible to efficiently manipulate nearly any sequence of onscreen code with a single short command. Watch the intro video to see how powerful this is. It’s also under active development with new features regularly introduced. Cursorless is one of those lovely examples that show off how voice is not only a viable alternative to mouse and keyboard, it can actually be faster. It also pairs with a full-featured grammar in knausj for working with the rest of VS Code, making it a popular editor to use in combination with Talon. I’ve been very impressed with VS Code so far, although I wish I could have this functionality in Emacs too (the Org Mode extension in VS Code is no substitute for the real thing).
Just as Talon is easy to use right away, it’s also easy to extend. All user scripts (grammars) are organized into Talon files (.talon) and Python files (.py). Talon files mostly consist of simple mappings from commands to actions, following a grammar syntax that’s very similar to Dragonfly’s. Talon originally used Python for everything (like Dragonfly), but the author decided to introduce a special file format because it brought several advantages. By focusing on command mappings, it enables a very terse and simple syntax. For example, you can bind “save file” to “ctrl-s” with the intuitive line:
save file: key(ctrl-s)
This will automatically support chained recognition with all other commands. If you accidentally introduce a syntax error, it will generally only affect a single command (instead of breaking parsing of an entire Python file). The format is sufficiently self-documenting that it’s used as-is in knausj’s GUI help system.
The other major advantage of the Talon file format is that it encourages clean separation of concerns. Complex behaviors should be defined in a Python file and exported as an action. These actions can then be used from any Python file or Talon file without the need to explicitly import the action or worry about circular dependencies. You can reorganize your files into directories however you would like without breaking anything, and if you edit and save a file, it’ll be reloaded everywhere instantly. Additionally, any action (including the built-in ones) can be overridden in any context, such as within a single app or website. These factors make it easy to modify a community grammar to suit your needs, whether you want to change the grammar (just adjust the Talon files) or override functionality to work differently in a particular app (just override the action within that context). The only tradeoff I’ve noticed versus doing everything in Python is that there is some overhead when moving functionality from a Talon file to Python, e.g. to refactor a command into an overridable action, or to remove redundancy from a bunch of related commands.
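To make the two ideas above concrete, here is a toy model in plain Python. It is not Talon’s API or implementation, and it does not import Talon at all: `load_commands`, `ActionRegistry`, `declare`, `override`, and `call` are hypothetical names invented for this sketch, and the one-line-per-command format is a simplified stand-in for real Talon files. It shows a loader where a malformed line only costs you that one command, and an action table where a context can replace an action’s behavior without touching the command mappings.

```python
from typing import Callable, Dict, List, Optional, Tuple


def load_commands(text: str) -> Tuple[Dict[str, str], List[int]]:
    """Parse simplified 'spoken phrase: action_name' lines.

    A malformed line is recorded by line number and skipped, so one
    mistake does not prevent the remaining commands from loading.
    """
    commands: Dict[str, str] = {}
    bad_lines: List[int] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        phrase, sep, action = line.partition(":")
        if not sep or not phrase.strip() or not action.strip():
            bad_lines.append(lineno)
            continue
        commands[phrase.strip()] = action.strip()
    return commands, bad_lines


class ActionRegistry:
    """Hypothetical action table with per-context overrides."""

    def __init__(self) -> None:
        self._defaults: Dict[str, Callable[[], str]] = {}
        # Keyed by (context name, action name).
        self._overrides: Dict[Tuple[str, str], Callable[[], str]] = {}

    def declare(self, name: str, default: Callable[[], str]) -> None:
        self._defaults[name] = default

    def override(self, context: str, name: str, impl: Callable[[], str]) -> None:
        if name not in self._defaults:
            raise KeyError(f"cannot override undeclared action {name!r}")
        self._overrides[(context, name)] = impl

    def call(self, name: str, context: Optional[str] = None) -> str:
        # Prefer the override for the active context, else the default.
        impl = self._overrides.get((context, name), self._defaults[name])
        return impl()


# The second line is missing its colon, so only that command is dropped.
SOURCE = "save file: save_file\nclose tab close_tab\nnew tab: new_tab\n"
commands, bad_lines = load_commands(SOURCE)

registry = ActionRegistry()
registry.declare("save_file", lambda: "press ctrl-s")
registry.declare("new_tab", lambda: "press ctrl-t")
registry.override("vim", "save_file", lambda: "type :w and press enter")

print(commands)
print(bad_lines)
print(registry.call(commands["save file"]))
print(registry.call(commands["save file"], context="vim"))
```

The spoken phrase stays the same in both calls; only the behavior behind the action changes with the active context, which is the property that makes it practical to adapt a shared grammar one app at a time.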
Perhaps the weakest aspect of Talon scripting is its documentation. The official documentation is mostly focused on the Python API, and it gives a bunch of isolated examples without really introducing the concepts and how they fit together. Fortunately, there is decent unofficial documentation that covers most of the functionality you will likely use. There is also a built-in REPL that can be used to reveal more hidden functionality, although this too is not formally documented. I recommend running actions.list() and pasting the output somewhere that’s easy to access and search. You can also run sim("an utterance") to see how Talon will parse an utterance. Since this is typically context-dependent, however, you are better off saying the knausj command “talon test last” after a command goes awry to print debug information to the log.
The best way to learn about Talon’s undocumented functionality is to engage with the community Slack. You’ll not only learn a lot by asking questions and reading the (sadly truncated) archives, but you’ll also realize that you don’t need to build everything on your own. The community is very interested in building shared functionality, and tends to be responsive to feature ideas. For example, I handed down the idea of modulo line numbers to the Cursorless maintainers (just as Mark Lillibridge did to me many years ago), and they had prototyped this the next day with plans to release it. If you do end up building interesting functionality yourself, I encourage you to discuss it on Slack and send a pull request.
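The post mentions modulo line numbers without defining them. As a rough sketch, assuming the idea is that you speak only the last two digits of a line number and the editor resolves them against the lines currently on screen, the lookup could work as follows. The function name and its parameters are hypothetical and are not taken from Cursorless or Dragonfly.

```python
from typing import Optional


def resolve_modulo_line(
    spoken: int, first_visible: int, last_visible: int, modulus: int = 100
) -> Optional[int]:
    """Return the visible line whose number ends in `spoken`, if any.

    Assumes the visible window spans at most `modulus` lines, so at most
    one visible line can match.
    """
    if not 0 <= spoken < modulus:
        raise ValueError(f"spoken number must be in [0, {modulus})")
    if last_visible < first_visible:
        raise ValueError("last_visible must not precede first_visible")
    # Step forward from the top of the window to the first line that is
    # congruent to the spoken number.
    candidate = first_visible + (spoken - first_visible) % modulus
    return candidate if candidate <= last_visible else None


# With lines 1180 through 1230 on screen, saying "five" targets line 1205.
print(resolve_modulo_line(5, 1180, 1230))
```

The appeal for voice input is that a short number is quicker and more reliable to say than a full four-digit line number, at the cost of only being able to address lines near the current view.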
Open vs. closed source
As described above, a lot of Talon’s functionality is present in its community grammars, and those are completely open source. Talon itself, however, consists of a mix of open and closed source code. For example, the awesome Conformer speech recognition engine is being developed in the open, building on Facebook’s wav2letter project. Conversely, the nuts and bolts of the Talon Python APIs are generally closed source. This was, for a while, the main thing that made me hesitate to switch from Dragonfly to Talon. Now, I should note that the typical Dragonfly stack is not fully open source either, with Dragon as a prime example of proprietary software. But the Dragonfly community has long been working towards a fully open source system by replacing Dragon with alternatives like Kaldi. My biggest concerns with Talon’s partially-closed model were practical, however, not ideological. I completely respect the Talon author’s decision not to release all of his source code. He made a daring decision to quit his job and work full time on voice control (something which I fantasized about but never did), and he needs a working business model to sustain this. Nevertheless, I had first-hand experience with both a closed source system that had rotted to the detriment of its users (Dragon) and an open source system that had been resuscitated only because the source code was available (Dragonfly), so this worried me. Fortunately, the Talon author, Ryan Hileman, was kind enough to discuss my concerns in depth and the mitigations he has in place. Hence, I will break down the possible implications of a closed source model, and how these do (and don’t) affect Talon in practice.
My top concern was that Talon seemed like a one-person project, carrying risks if Ryan lost interest or if something unexpected were to happen to him (i.e. a bus factor of one). To be sure, there’s a great community building atop Talon, but Ryan is undoubtedly the genius behind the project, something which became manifestly clear after my conversations with him. Fortunately, Ryan says he has privately shared source access with a few others to protect against the most unexpected outcomes — hopefully that’s never needed. A bigger worry for me is simply that Ryan loses interest at some point. He assured me that if he ever stops development, he would establish a continuation plan with the community, which could mean open sourcing the code. That’s comforting — and backed by his strong track record in open source — although there could be a gray area where development slows down but doesn’t stop. This is a difficult question to discuss in the abstract, especially when today Ryan is so devoted to this space: to my knowledge he is the most productive person who has ever worked on voice control. But it’s something to consider in the long term.
Another concern we discussed was community involvement in fixing bugs and adding features. As productive as Ryan is, he’s still one human being, so he has to aggressively manage his time and prioritize between competing feature requests and bugs. At this point, Ryan has fixed or responded to all the bugs I’ve logged, but if the project were open source, I could have opened the hood and tried to fix them earlier myself. Ryan has pointed out that nearly everything in Talon is user-overridable, but in practice it’s much easier to fix the code for keypresses directly than to reimplement them from scratch (for example). Personally, at this point in my life I have little free time, so I’m happy to hand issues off to someone else, but it’s something that might have felt limiting a few years ago. Notably, most of the bugs I’ve logged had workarounds, and in the rare cases where something nasty cropped up in Beta, Ryan speedily fixed it (and it’s also easy to roll back to earlier Talon versions). Similarly, I once shared a feature idea only to have Ryan implement it in Beta minutes later. Ryan also noted that he has shared the Talon source code with individuals and accepted contributions on a case-by-case basis — it’s just not a decision he takes lightly. Overall, due to Ryan’s extraordinary development speed, this concern feels mostly theoretical right now — Talon is improving at a rate exceeding that of its fully open source predecessors.
Fixing bugs and adding small features is one thing, but what if there’s ever a more fundamental disagreement about Talon’s direction or design philosophy? If the project were fully open source, one would have the option to fork. From our discussion, Ryan sees this as an advantage of Talon’s model: by preventing forks, it prevents fragmentation of the community and simplifies development by ensuring that all user scripts are coded against the same backends (e.g. the same Python version). Additionally, he notes that the parts of Talon which are closed source are generally the parts which are difficult to maintain or least interesting to fork. You might not agree with every design decision in the Talon file format, for example, but you probably wouldn’t want to maintain a fork for that reason. Deep philosophical disagreements are most likely to apply to the community grammar, but that’s all open source. This reasoning does depend on Ryan being an effective gatekeeper with good judgment. Fortunately, that’s something that you can easily assess his track record on, and my assessment would be overwhelmingly positive, as evident from the earlier sections of this post.
Cost is another consideration. Today, Talon is free for the public release and $25/month for the Beta (up from $15/month earlier on, with early subscribers grandfathered in). This seems like a very reasonable pricing model, but it’s something that could possibly change. Again, it helps to consider Ryan’s track record here. He’s been closely engaged with the community from the start and he exudes a hacker ethos, so it seems unlikely that he would make a tone-deaf change to pricing.
It might be nice if Talon were fully open source so its source could be reused in a wider variety of contexts or as educational material. But that shouldn’t impact your decision as a user deciding whether to adopt Talon or another system. Also, some of the most complex and interesting parts of Talon are already open source: the Conformer speech recognition model and Draconity for integrating with Dragon.
In summary, much of the above discussion comes back to one point: trust in Ryan — to keep developing Talon, to be an effective gatekeeper, to keep the cost reasonable. Personally, from my biweekly conversations with him over the past year, and from his impressive track record on Talon, he has earned that trust. It may also be comforting to consider the absolute worst case scenario, to see it’s not the end of the world. Suppose something happens that makes Talon a complete no-go. You’ll still have your grammar source code, and you can always migrate it elsewhere. I was able to convert about 80% of my Dragonfly grammar to Talon in a weekend, and this would likely be even simpler starting from Talon with its neat separation of concerns and easily-parsed Talon files. Even in this worst case, I have faith that the “Talon period” of voice control development will have been a huge net win for the community, having already led to great progress in a variety of areas.
I highly recommend trying out Talon and its community grammars if you haven’t used them recently. You may be surprised at how easy it is to get up and running — even if you’ve built a lot on your own over the years. I hope that the Dragonfly devotees continue their noble work towards a fully open source alternative, but I plan to use Talon on a day-to-day basis for the foreseeable future. It’s unfortunate that the power user community now seems to be split across these two systems. For my part, I plan to develop any major contributions to work with both platforms, starting with porting my gaze_ocr library to Talon, and I encourage others to do the same.
Once you’ve had some time to play around with Talon, please let me know what you think in the comments!