To be an efficient hands-free coder, you’ll need to learn how to move the cursor around a file quickly. There are two challenges. First, since you can’t use a mouse, you can’t just click on the location you want to move to. You can try using an eye tracker to accomplish this, but the precision isn’t quite high enough. Second, with a keyboard you can hold a movement key and release it when you reach your location, but this doesn’t translate well to voice control, which has too much latency (although you might try measuring the latency and adjusting for it).
Beginner’s note: to get started, check out the multiedit grammar for dragonfly. This gives you commands to start with and a nice framework for repeating commands quickly.
One approach to movement is to search within the file, for example using Emacs incremental search. This works well if you need to jump somewhere offscreen. The trouble with using this for all onscreen movement is that recognition accuracy isn’t always perfect, and that identifier might be repeated several times. This would work better if there were an Emacs extension that numbered incremental search results so it’s easy to jump to a particular one. Let me know in the comments if you make this!
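To make the idea of numbered search results concrete, here is a minimal Python sketch of the logic only, not an Emacs extension. The function names `numbered_matches` and `jump_to_match` are made up for illustration, and the sketch assumes a plain literal search over the whole buffer text.

```python
import re


def numbered_matches(text, query):
    """Return (number, line, column) for every literal match of query.

    Matches are numbered from 1; lines are 1-based and columns 0-based.
    """
    results = []
    for number, match in enumerate(re.finditer(re.escape(query), text), start=1):
        line = text.count("\n", 0, match.start()) + 1
        line_start = text.rfind("\n", 0, match.start()) + 1
        results.append((number, line, match.start() - line_start))
    return results


def jump_to_match(text, query, n):
    """Return the buffer offset of the n-th match, or None if it does not exist."""
    starts = [match.start() for match in re.finditer(re.escape(query), text)]
    return starts[n - 1] if 1 <= n <= len(starts) else None


buffer_text = "foo = bar\nbaz = foo + foo\n"
print(numbered_matches(buffer_text, "foo"))
print(jump_to_match(buffer_text, "foo", 3))
```

With numbers attached to each match, a single utterance such as "jump three" would be enough to pick the right occurrence, even when the identifier is repeated several times.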
Let’s break the problem down. Every location onscreen is at a particular line and column, so if we can navigate to each of these quickly, we can jump to anywhere quickly.
Let’s start with jumping to a particular line. Any decent editor can show you line numbers and let you jump to a specific line. But try editing a file with several hundred lines, and you’ll find that this is pretty clunky. There are a couple ways to improve on this: you can show line numbers relative to your current position, or you can show the numbers modulo some value. For example, if you never have more than 100 lines on screen at once, you could just show the last two digits of the absolute line number. Personally, I prefer to use relative line numbers since these work well with relative motion commands such as “up ten” or “down five”. Also, the number of syllables scales nicely with the amount of movement. The main advantage to using modulo is that you can chain together successive commands easily (e.g. “select between line X and line Y”). If you use Emacs and go with the relative line numbers approach, check out the package linum-relative.
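The arithmetic behind the two numbering schemes can be sketched in a few lines of Python. This is an illustration under assumptions, not code from any editor: `relative_target` and `resolve_modulo` are hypothetical names, and the modulo version assumes that fewer than `modulus` lines are visible at once and that the editor can report the top visible line.

```python
def relative_target(current_line, direction, count):
    """Absolute line reached by a relative command such as "up ten"."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    if direction == "up":
        return max(1, current_line - count)  # never move above line 1
    return current_line + count


def resolve_modulo(spoken, top_visible_line, modulus=100):
    """Absolute line for a spoken number shown modulo `modulus`.

    Picks the first line at or below the top of the window whose
    number ends in the spoken digits.
    """
    offset = (spoken - top_visible_line) % modulus
    return top_visible_line + offset


print(relative_target(120, "up", 10))
print(resolve_modulo(10, top_visible_line=393))
```

With the window starting at line 393, saying "10" resolves to line 410, so two digits are enough to name any visible line.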
Next, we have to jump to a particular column. This is a bit trickier because it is awkward to number every column in an editor. I suppose you could write the numbers vertically; please post in the comments if you come up with something to do this. In the meantime, I use a few different approaches, but the key advantage we can exploit is that you generally want to jump to the boundaries of symbols and words. This means that relative motion commands that move by a symbol or word often work well unless the line is long. In Emacs, familiarize yourself with subword mode. I bind commands for moving across an entire symbol and across a single subword. When the line is long, I use a different strategy where I name the character at the beginning or end of a symbol I want to jump to, and use a custom command to jump there. Searching for a character instead of a full word greatly improves recognition accuracy, especially when dealing with unusual words. See the Emacs lisp for this at the bottom of the post. Note that the Emacs extension “Ace jump mode” works similarly to this, but I prefer my approach because I don’t have to wait for an overlay to appear before issuing my command. Voice dictation latency is high enough that it’s almost always an advantage to accomplish everything in a single command.
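To show what moving by subword means in practice, here is a rough Python approximation of how a symbol breaks into subwords. It is only a sketch of the idea, not how Emacs subword mode is implemented, and the regular expression and the name `subword_starts` are my own assumptions.

```python
import re

# A run of capitals not followed by a lowercase letter (an acronym),
# or an optionally capitalized lowercase run, or a run of digits.
SUBWORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def subword_starts(symbol):
    """Return (column, subword) pairs for each subword in the symbol."""
    return [(match.start(), match.group()) for match in SUBWORD.finditer(symbol)]


print(subword_starts("parseHTTPResponse"))
print(subword_starts("snake_case_name"))
```

Each column in the result is a place a "move by subword" command could stop, while a "move by symbol" command would skip the whole identifier in one step.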
I also use a few more commands for quick movement to frequently visited places. I use the directions North, South, West, and East to move to the top/bottom of a file and left/right within a line. I use the mark ring in Emacs to jump to previous locations, and registers to save locations and jump to them quickly later. And of course I use page up and down to scroll through a file, although I prefer to use foot pedals for this.
Finally, as promised, my code for jumping within a line:
;; Move point to POSITION, but only if POSITION is non-nil.
(defun goto-char-if-not-nil (position)
  (if position (goto-char position)))

;; Evaluate BODY inside `save-excursion' and jump to the position it
;; returns. If BODY returns nil, point stays where it was.
(defmacro goto-char-or-cancel (&rest body)
  `(goto-char-if-not-nil (save-excursion ,@body)))

;; Regexp for CHAR at the start of a symbol. Characters that are not
;; word or symbol constituents are matched wherever they appear.
(defun word-start-regex (char)
  (if (string-match "\\s_\\|\\sw" (char-to-string char))
      (concat "\\_<" (regexp-quote (char-to-string char)))
    (regexp-quote (char-to-string char))))

;; Regexp for CHAR at the end of a symbol, with the same fallback.
(defun word-end-regex (char)
  (if (string-match "\\s_\\|\\sw" (char-to-string char))
      (concat (regexp-quote (char-to-string char)) "\\_>")
    (regexp-quote (char-to-string char))))

;; The four search helpers stay within the current line and return the
;; new position, or nil if there is no match.
(defun search-forward-after-char (char)
  (re-search-forward (word-end-regex char)
                     (line-beginning-position 2)
                     t))

(defun search-backward-before-char (char)
  (re-search-backward (word-start-regex char)
                      (line-beginning-position)
                      t))

;; Step back one character before searching, then move forward again
;; so that point ends up after CHAR.
(defun search-backward-after-char (char)
  (forward-char -1)
  (when (re-search-backward (word-end-regex char)
                            (line-beginning-position)
                            t)
    (forward-char 1)
    (point)))

;; Step forward one character before searching, then move back again
;; so that point ends up before CHAR.
(defun search-forward-before-char (char)
  (forward-char 1)
  (when (re-search-forward (word-start-regex char)
                           (line-beginning-position 2)
                           t)
    (forward-char -1)
    (point)))

;; Interactive commands. Each one searches from point first and, if
;; that fails, retries from the far end of the line (wrapping around).
(defun forward-after-char (char)
  (interactive "cForward after char: ")
  (goto-char-or-cancel
   (if (search-forward-after-char char)
       (point)
     (beginning-of-line)
     (search-forward-after-char char))))

(defun forward-before-char (char)
  (interactive "cForward before char: ")
  (goto-char-or-cancel
   (if (search-forward-before-char char)
       (point)
     (beginning-of-line)
     (search-forward-before-char char))))

(defun backward-before-char (char)
  (interactive "cBackward before char: ")
  (goto-char-or-cancel
   (if (search-backward-before-char char)
       (point)
     (end-of-line)
     (search-backward-before-char char))))

(defun backward-after-char (char)
  (interactive "cBackward after char: ")
  (goto-char-or-cancel
   (if (search-backward-after-char char)
       (point)
     (end-of-line)
     (search-backward-after-char char))))
You’re not the first Dragonfly user I’ve met who loves Emacs, but I’ve always been a little hesitant to learn it myself, I guess just because it’s a bit intimidating. From a voice programming perspective, what are the benefits versus other IDEs?
I could probably write a whole post on this; in fact I would like to write a post discussing more of my Emacs configuration. But here’s a summary: Emacs is extremely and easily configurable, which means you can quickly implement any feature you need in the editor, especially if it is just basic text manipulation. And Emacs’s modeless key bindings are perfect for voice recognition, where you have too many commands to map to regular keyboard shortcuts, and you don’t want the slowness of switching modes.
I think the friendliest way to get started with Emacs is to read the introductory blog entries on Mastering Emacs and check out the Emacs Prelude package, which adds a bunch of functionality out-of-the-box and improves a lot of default options. Frankly, I think Emacs out-of-the-box has a lot of really basic usability problems, but fortunately these are nearly all fixable with just a little effort.
“This would work better if there were an Emacs extension that numbered incremental search results so it’s easy to jump to a particular one. Let me know in the comments if you make this!”
You’re looking for M-x occur… or Ace-Jump mode.
I’m aware of these options, but they don’t do quite what I want. My use case is when I’m doing a search for something offscreen, and then after the search has begun I see exactly the place I want to jump to. Similarly, I might scan all the matches on screen and want to see the next page of matches (hence skip forward by X matches).
Occur is really close, except that it opens another buffer which adds clutter, and I like seeing the results in full context. Incremental search is pretty much exactly what I want except I want it to be easier to jump around the matches.
I only use Ace-jump mode for moving to a visible place on the screen. It’s particularly good for moving to English words because you can keep your focus on the word.
I use occur for jumping within a file. I have a hacked version that just jumps immediately to a target if it only occurs once in the current buffer. Otherwise, you get one line per match normally. I haven’t tried, but I think there’s some way to get more context with extra lines before and after each match.
I do “teleport phrase ” then if needed “teleport N” where N is the line # mod N of the occurrence I want.
I have funkier commands like “teleport routine ” that use regular expressions to work with many programming languages.
I agree that neither of these is ideal for the use case you mention.
sigh. Stupid HTML commenting conventions. There is a <_anything> after “phrase” above so you can say things like “teleport phrase HTML”, for example.
Personally, I modify incremental search by adding some additional bindings so that it works correctly with the mini-buffer, the bell (important for exiting keyboard macros), and the mark. You want the point to not move if the search fails and to ignore any match at the starting point.
“If you use Emacs and go with the relative line numbers approach, check out the package linum-relative”
I find that nlinum is far superior to linum. Among other things, the Emacs maintainers stopped fixing bugs with linum because nlinum is so much better.
I’ve modified my version to show all but the last two digits in light gray so that it’s easy to pick out the line number mod 100 I use.
‘The main advantage to using modulo is that you can chain together successive commands easily (e.g. “select between line X and line Y”).’
This is hugely useful; I have an entire grammar devoted to manipulating ranges of commands. E.g.,
“yank 13, 10”: copy lines at 93..10 (remember it’s mod 100 so 10 can be after 93) and insert them at start-of-line containing point.
“grab 10, 13”: same but move lines instead of copying them
Other operators include sort, destroy, copy or move to a crumb (can be in a different file), align on various characters, indent, justify, comment, uncomment, run macro on, shell command on, etc.
s/13/93/
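For readers who want to see how a mod-100 range like “93, 10” can be resolved, here is a small Python sketch. It is a guess at the logic under assumptions and not the grammar described above: `resolve_range` and `yank` are hypothetical names, and the sketch assumes the first number is looked up from the top visible line and the second number is always at or after the first.

```python
def resolve_range(first, second, top_visible_line, modulus=100):
    """Resolve two spoken mod-`modulus` line numbers to absolute lines.

    The end is taken to be the first matching line at or after the
    start, which is how "10" can come after "93".
    """
    start = top_visible_line + (first - top_visible_line) % modulus
    end = start + (second - start) % modulus
    return start, end


def yank(lines, start, end, point_line):
    """Copy lines start..end (1-based, inclusive) and insert the copy
    before the line containing point. Returns a new list."""
    block = lines[start - 1:end]
    return lines[:point_line - 1] + block + lines[point_line - 1:]


print(resolve_range(93, 10, top_visible_line=390))
print(yank(["a", "b", "c", "d"], 2, 3, point_line=1))
```

With the window starting at line 390, “93, 10” resolves to lines 393 through 410. A move command like “grab” would differ only in deleting the original lines after copying them.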
Hi Mark, I came across this old but great comment of yours, and wanted to share recent work (not mine) that takes this idea to the next level:
https://marketplace.visualstudio.com/items?itemName=pokey.cursorless
When using relative numbers, how do you handle negative numbers without extra syllables? It seems like you would need to either double all your verbs or learn 50 new numbers…
I don’t have a ton of specialized commands for manipulating ranges; I generally prefer short commands that can be chained together in a single utterance. For the few commands I have, I use “up” and “down” variants, such as “move lines up/down [X]” and “copy lines up/down [X]” where X is an arbitrary number that defaults to 1.
Normal Emacs provides jump to end of next word going forwards and jump back to start of previous word going backwards. I strongly recommend adding jump to start of next word as well (I stole this from Vi). Also, keeping in mind that people have trouble quickly visually “counting” more than 5-7 items, it can be useful to jump to the end of the line and then move backwards by words/symbols/etc.
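The two motions suggested here can be modeled in Python to show where the cursor would land. This is only a sketch of the behavior, not Emacs or Vi code: the function names are invented, and a "word" is assumed to be a run of letters, digits, or underscores.

```python
import re

WORD = re.compile(r"\w+")


def next_word_start(line, column):
    """Column of the first word that starts after `column`, or None."""
    for match in WORD.finditer(line):
        if match.start() > column:
            return match.start()
    return None


def words_back_from_end(line, count):
    """Column reached by going to the end of the line and then moving
    back `count` words, or None if the line has fewer words."""
    starts = [match.start() for match in WORD.finditer(line)]
    return starts[-count] if 1 <= count <= len(starts) else None


line = "result = compute(alpha, beta)"
print(next_word_start(line, 0))
print(words_back_from_end(line, 1))
```

On a long line, counting one or two words back from the end is often easier than counting many words forward from the start, which is the point about visual counting made above.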
e.g., “go 14 last flee 3”
Nice info. Btw, how loud or quiet are those foot pedals that you use…say compared to a full sized mouse click? A post on your foot pedal setup, perhaps?
I bought some configurable pedals on Amazon, and they work well but they’re clack-clack loud, and they can easily disturb others around me…
Lately I haven’t been using my pedals. Either I use voice commands or I just hit page up/down on the keyboard (cheating, I know). But they are not particularly loud, especially compared to my talking. I work out of a small conference room to avoid disturbing others.
I am curious about the conceptual functionality/intent of your (Mark and James) commands in Emacs so that I can understand more about what functions are suited to voice (vs keyboard). Stallman or probably Greenberg once observed about Emacs that (I paraphrase) “the big lesson about Emacs was that the definition of an editor was the functionality (the macros) that it provided.” I think the same idea probably applies to voice editing – voice users probably need/want/use a slightly different set of functions than keyboard users. Is looking at people’s macro sets a good way to figure out what the general set of desired functionality is for voice users? Am I on the right or wrong track with this view?
Generally speaking, I think that the things people want to do with voice and keyboard are the same, at least for folks like me who use voice as a complete substitute for keyboard and mouse. So in that sense I wouldn’t say that the intent differs. Nevertheless, the question of what is more natural for each input method definitely differs. For example, it’s very natural on a keyboard to tap a key rapidly a bunch of times to get somewhere in your document. This aligns with the ease of small muscle movements and the very fast feedback loop that comes with keypresses. In contrast, this is a very awkward interaction pattern for voice, where there is some delay after each command and it is much more tiring to say the same command many times. I think this tends to lead voice users to ask deeper questions about what their higher level intent is. So instead of “I want to get to that variable” it’s “I want to rename that variable”, for example. And you could certainly look at people’s macro sets to get an idea of what operations people have found useful. I think the most impressive modern example in the voice domain of moving around and operating on files is Cursorless for Visual Studio Code. My first reaction upon seeing it was that it builds on the very same ideas Mark shared here. If you want to delve into what this offers, you can read the docs or try it out and use the built-in help, because the docs are slightly outdated.
Thank you for your comments and link to the docs for Cursorless. Now I can understand what you’re doing in that demo video that you made! (Without the docs, hats, colors, syntax patterns, and examples in the first page or two of the doc, the video is inscrutable… 🙂) Cheers, Kevin
Happy to help! I can’t take credit for Cursorless though; that’s from other members of the Talon community and so is the demo.