I learned about a couple of very exciting new developments this week in open source speech recognition, both coming from Mozilla. The first is that a year and a half ago, Mozilla quietly started working on an open source, TensorFlow-based DeepSpeech implementation. DeepSpeech is a state-of-the-art deep-learning-based speech recognition system designed by Baidu and described in detail in their research paper. Currently, Mozilla’s implementation requires users to train their own speech models, a resource-intensive process that needs expensive closed-source speech data to produce a good model. But that brings me to Mozilla’s more recent announcement: Project Common Voice. Their goal is to crowd-source the collection of 10,000 hours of speech data and open source all of it. Once this is done, DeepSpeech can be used to train a high-quality open source recognition engine that can easily be distributed and used by anyone!
This is a Big Deal for hands-free coding. For years I have increasingly felt that the bottleneck in my hands-free system is that I can’t do anything beneath the limited API that Dragon offers. I can’t hook into the pure dictation and editing system, I can’t improve the built-in UIs for text editing or training words/phrases, I’m limited to getting results from complete utterances after a pause, and I can’t improve Dragon’s OS-level integration or port it to Linux. If an open source speech recognition engine becomes available that can compete with Dragon in latency and quality, all of this becomes possible.
To accelerate progress towards this new world of end-to-end open source hands-free coding, I encourage everyone to contribute their voice to Project Common Voice, and share Mozilla’s blog post through social media.
Tobii has released a new consumer eye tracker, the Tobii Eye Tracker 4C, for $150. Although I haven’t found eye tracking to be nearly as helpful as speech recognition, it is handy for those occasional situations where you just want to click a button or change context and you don’t have a command to do so (see my earlier post for details). I have been pretty happy with the Tobii EyeX, but it isn’t perfect, so I was excited to try out this new device. Continue reading Tobii Eye Tracker 4C Review
Not related to coding, but hands-free coders need to have some fun too. 🙂
I discovered recently that Hearthstone can be easily played with eye/head tracking and minimal voice controls (move pointer and click), thanks to the turn-based interface, large click targets, and a high thinking-to-clicking ratio. I don’t even use a custom grammar and it works very well. If you have a good experience, you can thank Blizzard on this thread I started. Hopefully I didn’t just set the voice coding community back by a few months!
If you know of other games that play well with hands-free control, please post in the comments.
Like many an Emacs user, I am enamored with Org-Mode. Every great coding session begins with organizing your thoughts, and Org-Mode is an excellent tool for the job. If you’re tracking New Year’s resolutions, it’s great for that too. Since Org-Mode already has an excellent compact guide, I’ll focus on my voice bindings and finish with a bonus section on how I like to structure my personal to-do lists. Continue reading Getting organized with Org mode
When I find myself writing or editing something sufficiently long, I like to have full support for Select-and-Say. I used to use “open dictation box”, since that’s the obvious choice, until I discovered that using Notepad is much faster. Continue reading Avoid the dictation box
I recently came across PCByVoice SpeechStart+, a small but interesting extension to Dragon that adds some nice functionality that would be hard or impossible to implement with Dragonfly. It costs $40 (after a 15-day trial), so I will help you decide whether it is worth the money. Continue reading SpeechStart+ Review
Dragonfly is so powerful that it’s easy to forget that Dragon does some things well out-of-the-box. To maximize your efficiency, it’s important to know when it’s not worth reinventing the wheel. In this post, I’ll describe when I prefer to use built-in Dragon functionality. Continue reading When to use built-in Dragon functionality
As you build on your grammars over time, you start to run into all kinds of problems. Commands get confused with each other, latency increases, and your grammars become giant disorganized blobs. This is particularly challenging with Dragonfly, which gives you the power to repeat commands in a single utterance, but leaves it up to you to structure your grammar accordingly. In this post I’ll discuss the techniques I use to wrangle my grammars. Continue reading Designing Dragonfly grammars
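To make the idea of chaining several commands in a single utterance concrete, here is a toy sketch in plain Python. It is not Dragonfly code and does not use the Dragonfly API. The `COMMANDS` table, its phrases, the action strings, and the `parse_utterance` helper are all hypothetical names made up for illustration. It assumes a simple greedy longest-match over words, which is only one possible way to split an utterance.

```python
# Toy illustration only: plain Python, not the Dragonfly API.
# The phrases and action strings below are hypothetical.
COMMANDS = {
    "up": "key:up",
    "down": "key:down",
    "save file": "key:ctrl-s",
    "new line": "key:enter",
}


def parse_utterance(utterance, commands=COMMANDS):
    """Split one utterance into a sequence of known commands.

    Uses greedy longest-match on word boundaries and raises ValueError
    if some part of the utterance matches no command.
    """
    words = utterance.lower().split()
    max_len = max(len(phrase.split()) for phrase in commands)
    actions = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in commands:
                actions.append(commands[phrase])
                i += n
                break
        else:
            raise ValueError("no command matches at word %d: %r" % (i, words[i]))
    return actions
```

For example, `parse_utterance("down down save file")` yields two "down" actions followed by the save action. Even in this toy version you can see why structure matters: the more phrases share words, the more ways an utterance can be split, which is one source of the command confusion mentioned above.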
For a site titled Hands-Free Coding, I haven’t written much about How To Actually Write The Code. It turns out this is easier than you might expect. Before reading this post, please familiarize yourself with my getting started guide and how to move around a file quickly.
There are two basic approaches to dictating code: using a custom grammar framework such as Dragonfly, or using VoiceCode (not to be confused with VoiceCode.io for Mac, which I just discovered and haven’t used yet). VoiceCode is much more powerful out-of-the-box, but is also harder to extend and more restrictive in terms of programming language and environment. You might say that VoiceCode is Eclipse, and Dragonfly is Emacs. You could also consider Vocola for your custom grammars; it is more concise but not quite as flexible because you can’t execute arbitrary Python. Since I prefer Dragonfly, I’ll cover that approach. Continue reading Dictating Code
Once you’ve gotten used to the basic commands and extensions to control Google Chrome, you may start to hunger for a faster way to control websites you use frequently. Some sites have keyboard shortcuts you can bind easily, but others don’t. This post will describe how to set up commands to control any webpage. Continue reading Custom web commands with WebDriver