Open source speech recognition: Mozilla DeepSpeech + Common Voice

I learned about a couple of very exciting new developments this week in open source speech recognition, both coming from Mozilla. The first is that a year and a half ago, Mozilla quietly started working on an open source, TensorFlow-based DeepSpeech implementation. DeepSpeech is a state-of-the-art deep-learning-based speech recognition system designed by Baidu and described in detail in their research paper. Currently, Mozilla’s implementation requires users to train their own speech models, a resource-intensive process that needs expensive closed-source speech data to produce a good model. But that brings me to Mozilla’s more recent announcement: Project Common Voice. Their goal is to crowd-source the collection of 10,000 hours of speech data and open source it all. Once this is done, DeepSpeech can be used to train a high-quality open source recognition engine which can easily be distributed and used by anyone!

This is a Big Deal for hands-free coding. For years I have increasingly felt that the bottleneck in my hands-free system is that I can’t do anything beneath the limited API that Dragon offers. I can’t hook into the pure dictation and editing system, I can’t improve the built-in UIs for text editing or training words/phrases, I’m limited to getting results from complete utterances after a pause, and I can’t improve Dragon’s OS-level integration or port it to Linux. If an open source speech recognition engine becomes available that can compete with Dragon in latency and quality, all of this becomes possible.

To accelerate progress towards this new world of end-to-end open source hands-free coding, I encourage everyone to contribute their voice to Project Common Voice, and share Mozilla’s blog post through social media.

10 thoughts on “Open source speech recognition: Mozilla DeepSpeech + Common Voice”

  1. Thanks for sharing this. I tried out the interface and spoke some sentences. I did find it a little bit clunky. It seems to me that it would be a lot more productive if I could read several sentences at a time instead of just one at a time. I looked around to see if there is somewhere to leave feedback but did not find it. Do you have the contact information for the people running this?

        1. Thank you Mike, I would be honored to help with this effort!

          My command grammar uses a bunch of made-up words, and even when I use English, the full utterances are virtually never in correct grammatical sentences. Is this acceptable? I could filter out commands with nonsense words so you would get commands like “line 71 west snap three”. Or I could pull some full sentences from my English text dictation history. Just let me know what you are looking for.

          1. Oh, good question!

            Right now, we are definitely interested in your full sentences from your text dictation history.

            But as for the more domain-specific commands, can you submit a separate list for that, and I can talk it over with my team?

            1. Mike, I decided to just CC0 my entire blog, as now documented here:

              I created a Python extraction script to pull all sentences from the WordPress export. Anyone is free to reuse this to join in the party:

              Here are the extracted sentences, take whatever you need:

              I figure this is probably best because it is a bunch of English sentences, yet it contains much of the vocabulary used in hands-free computer control.
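For readers who want to try something similar, here is a minimal sketch of what such an extraction script could look like. It is not the script mentioned above; it is an illustration under a few assumptions: the export is a standard WordPress WXR file whose post bodies live in `content:encoded` elements, and a naive punctuation-based split is good enough. The function name `extract_sentences` is made up for this example.

```python
import html
import re
import xml.etree.ElementTree as ET
from typing import List

# Qualified tag name that WordPress export (WXR) files use for post bodies.
CONTENT_TAG = "{http://purl.org/rss/1.0/modules/content/}encoded"


def extract_sentences(wxr_xml: str) -> List[str]:
    """Return plain-text sentences from every post body in a WXR export string."""
    root = ET.fromstring(wxr_xml)
    sentences = []
    for item in root.iter("item"):
        body = item.findtext(CONTENT_TAG) or ""
        # Drop HTML tags, decode entities, and collapse whitespace.
        text = re.sub(r"<[^>]+>", " ", body)
        text = html.unescape(text)
        text = re.sub(r"\s+", " ", text).strip()
        # Naive split: break after '.', '!' or '?' when followed by whitespace.
        # This will mis-split abbreviations such as "e.g. this".
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if sentence:
                sentences.append(sentence)
    return sentences
```

To use it, read the export file into a string (for instance a hypothetical `export.xml`) and pass it to `extract_sentences`. The output still needs manual review before submission, since the naive split and tag stripping can leave fragments from code snippets or lists.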

  2. This is awesome James! Thanks for the extraction script and your public domain blog!

    I read through each sentence from your pastebin, splitting up some of the longer ones and removing some that had repeated words or were too hard to read, but I tried to keep your commands. Have a look, and feel free to submit a PR if there are things you want to add or change:

    Let’s keep in touch 🙂
