Once you’ve gotten used to the basic commands and extensions to control Google Chrome, you may start to hunger for a faster way to control websites you use frequently. Some sites have keyboard shortcuts you can bind easily, but others don’t. This post will describe how to set up commands to control any webpage.
Selenium WebDriver provides a powerful but simple API to control webpages. It is used primarily for automated testing, but it will work perfectly for our needs, with a few tweaks. Start by installing the Python bindings and ChromeDriver.
By default, WebDriver will create a new instance of Chrome with a special profile to run any WebDriver commands. This is great for sandboxed web testing, but not so great for controlling your existing Chrome sessions with all your custom extensions. Fortunately, you can configure Chrome to set up a debugging server by opening your Chrome shortcut properties and adding --remote-debugging-port=9222 to the end of the Target field, after the final quote. To start the server, first quit Chrome completely (Ctrl-Shift-Q), then reopen it with your custom shortcut. Note that closing all open windows does not fully quit Chrome, and you may need to repeat this procedure if you restart your computer and Chrome starts automatically.
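Before connecting WebDriver, it can help to confirm that the debugging server is actually listening. The following is a minimal sketch rather than part of the original setup: it assumes Python 3 and the address 127.0.0.1:9222 implied by the flag above, and `debugger_is_up` is a hypothetical helper name. It requests the `/json/version` endpoint that Chrome's debugging server exposes and reports whether a JSON reply came back.

```python
import http.client
import json
import urllib.request


def debugger_is_up(host="127.0.0.1", port=9222, timeout=1.0):
    """Return True if a Chrome debugging server answers at host:port.

    Hypothetical helper for illustration; the default address is an
    assumption matching --remote-debugging-port=9222.
    """
    url = "http://%s:%d/json/version" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # A healthy server replies with a JSON object.
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections and timeouts (URLError is a
        # subclass); ValueError covers a reply that is not valid JSON.
        return False
```

If `debugger_is_up()` returns False right after you start Chrome, the most likely cause is that Chrome was already running in the background and never saw the new flag.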
Next, we will add a few functions to one of your dragonfly modules to connect a WebDriver instance to your existing Chrome session:
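from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Shared WebDriver instance; stays None until create_driver() is called.
driver = None

def create_driver():
    global driver
    chrome_options = Options()
    # Attach to the already-running Chrome instead of launching a new one.
    chrome_options.experimental_options["debuggerAddress"] = "127.0.0.1:9222"
    # This call matches older Selenium releases; newer ones expect
    # options=chrome_options and a Service object for the driver path.
    driver = webdriver.Chrome(CHROME_DRIVER_PATH, chrome_options=chrome_options)

def quit_driver():
    global driver
    if driver:
        driver.quit()
    driver = None

def test_driver():
    driver.get('http://www.google.com/xhtml')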
You will need to replace CHROME_DRIVER_PATH with the path of the ChromeDriver executable you downloaded. Then go ahead and bind these functions to voice commands and try them out.
If all goes well, the test_driver function will navigate Chrome to the Google homepage. But there’s a problem that you may have noticed if you had multiple tabs open: it does not necessarily work on the currently active tab. WebDriver does provide a way to change the current tab it operates on, but it doesn’t provide a way to find which tab is active. Fortunately, we can query the Chrome debugger API directly from Python and get this information. Add the following function to your module:
import json
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

def switch_to_active_tab():
    tabs = json.load(urlopen("http://127.0.0.1:9222/json"))
    # Chrome seems to order the tabs by when they were last updated, so we find
    # the first one that is not an extension.
    active_tab = None
    for tab in tabs:
        if not tab["url"].startswith("chrome-extension://"):
            active_tab = tab["id"]
            break
    if active_tab is None:
        return  # Only extension pages were listed; leave the driver alone.
    for window in driver.window_handles:
        # ChromeDriver adds to the raw ID, so we just look for substring match.
        if active_tab in window:
            driver.switch_to.window(window)
            title = driver.title.encode('ascii', 'backslashreplace').decode('ascii')
            print("Switched to: " + title)
            return
Now, try calling this function first thing in test_driver, and it should operate on the active tab. This technique doesn’t work perfectly when multiple windows are open, but it works most of the time. If you have a more robust solution, please let me know in the comments!
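The tab-selection logic can also be pulled out into plain functions that are easy to try without a browser. The sketch below is one possible variation, not the approach used in my own module: `pick_active_tab_id` and `find_window_handle` are hypothetical names, and it additionally skips entries whose `type` field is not `"page"` (the debugger listing also reports things like background pages), on the assumption that those are never the tab you want. It still relies on the same observation that the most recently updated tab comes first.

```python
def pick_active_tab_id(tabs):
    """Return the id of the first listed entry that looks like a normal tab.

    `tabs` is the decoded JSON list from http://127.0.0.1:9222/json.
    Returns None when nothing suitable is listed.
    """
    for tab in tabs:
        # Assumption: only entries of type "page" are real tabs.
        if tab.get("type", "page") != "page":
            continue
        if tab.get("url", "").startswith("chrome-extension://"):
            continue
        return tab.get("id")
    return None


def find_window_handle(tab_id, window_handles):
    """Return the WebDriver handle containing tab_id, or None."""
    if not tab_id:
        return None
    for handle in window_handles:
        # ChromeDriver decorates the raw ID, so use a substring match.
        if tab_id in handle:
            return handle
    return None


# Made-up sample data in the shape of the debugger listing.
sample_tabs = [
    {"id": "A1", "type": "background_page", "url": "chrome-extension://x/bg.html"},
    {"id": "B2", "type": "page", "url": "https://example.com/"},
]
print(pick_active_tab_id(sample_tabs))
```

Keeping the selection separate from the `driver` calls also makes it easier to experiment with other heuristics for the multiple-window case.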
Navigating to Google isn’t terribly exciting, so let’s add something more useful. Add the following action class and voice binding:
from dragonfly import DynStrActionBase

class ClickElementAction(DynStrActionBase):
    def __init__(self, by, spec):
        DynStrActionBase.__init__(self, spec)
        self.by = by

    def _parse_spec(self, spec):
        # The spec is the locator string itself, so pass it through unchanged.
        return spec

    def _execute_events(self, events):
        switch_to_active_tab()
        element = driver.find_element(self.by, events)
        element.click()

# ... in your Chrome bindings ...
"search bar": ClickElementAction(By.NAME, "q"),
This handy shortcut will let you focus the Google search bar from the Google search results page, making it easy to edit your query.
WebDriver provides several ways of finding an element on a webpage, so you can reuse this action to create a shortcut for nearly any button or link on any webpage. For example, here is a binding that lets you expand all the messages in a Gmail conversation:
"expand all": ClickElementAction(By.XPATH, "//*[@aria-label='Expand all']"),
Check out the WebDriver docs for more ways of locating an element.
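If you end up writing many aria-label bindings like the Gmail one, a small helper can build the XPath string for you. This is only a sketch: `aria_label_xpath` is a hypothetical name, and the quoting logic reflects the fact that XPath 1.0 string literals have no escape character, so the helper picks whichever quote style the label does not contain.

```python
def aria_label_xpath(label):
    """Build an XPath that matches any element with the given aria-label."""
    if "'" not in label:
        return "//*[@aria-label='%s']" % label
    if '"' not in label:
        return '//*[@aria-label="%s"]' % label
    # XPath 1.0 cannot escape quotes inside a literal; mixing both kinds
    # would need concat(), which this sketch does not attempt.
    raise ValueError("label contains both quote characters: %r" % label)


print(aria_label_xpath("Expand all"))
```

The result can be passed straight to the action, for example `ClickElementAction(By.XPATH, aria_label_xpath("Expand all"))`.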
Of course, this is just the start of what you can do with WebDriver. You can also execute sequences of commands, even waiting for particular elements to appear in the page. Please post your favorite commands in the comments!
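As an illustration of waiting for an element, here is a dependency-free polling sketch. Selenium ships its own `WebDriverWait` helper for this purpose, which is probably the better choice in real bindings; the function below, with the hypothetical name `wait_for_element`, just shows the idea. It assumes Python 3 and only relies on the driver's `find_elements` method, which returns an empty list when nothing matches.

```python
import time


def wait_for_element(driver, by, value, timeout=5.0, poll_interval=0.25):
    """Poll until an element matching (by, value) exists, then return it.

    Returns None if nothing shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A sequence of commands then becomes a matter of clicking one element, calling `wait_for_element` for the next, and clicking that in turn if it is not None.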
If you’d like to see how I integrate this with my voice commands, please check out my GitHub repository.