Custom web commands with WebDriver

Once you’ve gotten used to the basic commands and extensions to control Google Chrome, you may start to hunger for a faster way to control websites you use frequently. Some sites have keyboard shortcuts you can bind easily, but others don’t. This post will describe how to set up commands to control any webpage.

Selenium WebDriver provides a powerful but simple API to control webpages. It is used primarily for automated testing, but it will work perfectly for our needs, with a few tweaks. Start by installing the Python bindings and ChromeDriver.

By default, WebDriver launches a new instance of Chrome with a special profile and runs all commands there. This is great for sandboxed web testing, but not so great for controlling your existing Chrome sessions with all your custom extensions. Fortunately, you can have Chrome start a debugging server that WebDriver can attach to: open your Chrome shortcut properties and add --remote-debugging-port=9222 to the end of the Target field, after the final quote. To start the server, first quit Chrome completely (Ctrl-Shift-Q), then reopen it with your custom shortcut. Note that closing all open windows does not fully quit Chrome, and you may need to repeat this procedure if Chrome starts automatically when you restart your computer.
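Before wiring up WebDriver, it can help to confirm that the debugging server is actually listening. The following is a minimal sketch, assuming the server runs on 127.0.0.1:9222 as configured above and that Chrome answers on its /json/version endpoint. The function name `debugger_version` and the `opener` parameter are hypothetical, introduced only so the check can be exercised without a running browser.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def debugger_version(address="127.0.0.1:9222", opener=urlopen, timeout=2):
    """Return Chrome's /json/version info as a dict, or None if unreachable.

    `opener` defaults to urlopen; it is a parameter only so that the
    function can be tried out with a stand-in (an assumption of this sketch).
    """
    url = "http://%s/json/version" % address
    try:
        response = opener(url, timeout=timeout)
        return json.loads(response.read().decode("utf-8"))
    except (IOError, ValueError):
        # IOError covers refused connections and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None
```

If `debugger_version()` returns None, Chrome was most likely not restarted with the flag, or another Chrome process was still running when you reopened it.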

Next, we will add a few functions to one of your dragonfly modules to connect a WebDriver instance to your existing Chrome session:

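import json
import urllib2

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Shared WebDriver instance; stays None until create_driver() is called.
driver = None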

def create_driver():
    global driver
    chrome_options = Options()
    # Attach to the already-running Chrome instead of launching a new one.
    chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
    driver = webdriver.Chrome(CHROME_DRIVER_PATH, chrome_options=chrome_options)

def quit_driver():
    global driver
    if driver:
        driver.quit()
    driver = None

def test_driver():
    driver.get('http://www.google.com/xhtml')

You will need to replace CHROME_DRIVER_PATH with the path of the ChromeDriver executable you downloaded. Then go ahead and bind these functions to voice commands and try them out.

If all goes well, the test_driver function will navigate Chrome to the Google homepage. But there’s a problem that you may have noticed if you had multiple tabs open: it does not necessarily work on the currently active tab. WebDriver does provide a way to change the current tab it operates on, but it doesn’t provide a way to find which tab is active. Fortunately, we can query the Chrome debugger API directly from Python and get this information. Add the following function to your module (it relies on the standard json and urllib2 modules):

def switch_to_active_tab():
    tabs = json.load(urllib2.urlopen("http://127.0.0.1:9222/json"))
    # Chrome seems to order the tabs by when they were last updated, so we find
    # the first one that is not an extension.
    active_tab = None
    for tab in tabs:
        if not tab["url"].startswith("chrome-extension://"):
            active_tab = tab["id"]
            break
    if active_tab is None:
        # Only extension pages were listed, so there is nothing to switch to.
        return
    for window in driver.window_handles:
        # ChromeDriver adds to the raw ID, so we just look for substring match.
        if active_tab in window:
            driver.switch_to_window(window)
            print "Switched to: " + driver.title.encode('ascii', 'backslashreplace')
            return

Now, try calling this function first thing in test_driver, and it should operate on the active tab. This technique doesn’t work perfectly when multiple windows are open, but it works most of the time. If you have a more robust solution, please let me know in the comments!
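One possible refinement is to separate the tab-selection logic from the network and WebDriver calls, so it can be checked against sample data. The sketch below assumes each entry in the /json listing carries a `type` field whose value is `"page"` for ordinary tabs; that filter is an addition of this sketch, not part of the function above, and it still does not solve the multiple-window case. The names `pick_active_tab_id` and `find_window_handle` are hypothetical.

```python
def pick_active_tab_id(tabs):
    """Return the id of the first listed entry that looks like a normal tab.

    `tabs` is the decoded list from http://127.0.0.1:9222/json. Entries
    without a "type" field are treated as pages (an assumption).
    """
    for tab in tabs:
        if tab.get("type", "page") != "page":
            continue  # skip background pages, workers, and similar targets
        if tab.get("url", "").startswith("chrome-extension://"):
            continue  # skip extension pages, as in the original function
        return tab["id"]
    return None


def find_window_handle(tab_id, window_handles):
    """Return the WebDriver handle containing tab_id, or None if no match."""
    if tab_id is None:
        return None
    for handle in window_handles:
        # ChromeDriver may prefix the raw ID, so match on substring.
        if tab_id in handle:
            return handle
    return None
```

With these helpers, switch_to_active_tab could fetch the tab list, call `pick_active_tab_id`, pass the result and `driver.window_handles` to `find_window_handle`, and switch only if a handle comes back.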

Navigating to Google isn’t terribly exciting, so let’s add something more useful. Add the following action class and voice binding:

from dragonfly import DynStrActionBase

class ClickElementAction(DynStrActionBase):
    def __init__(self, by, spec):
        DynStrActionBase.__init__(self, spec)
        self.by = by

    def _parse_spec(self, spec):
        return spec

    def _execute_events(self, events):
        switch_to_active_tab()
        element = driver.find_element(self.by, events)
        element.click()

... in your Chrome bindings ...

"search bar": ClickElementAction(By.NAME, "q"),

This handy shortcut will let you focus the Google search bar from the Google search results page, making it easy to edit your query.

WebDriver provides several ways of finding an element on a webpage, so you can reuse this action to create a shortcut for nearly any button or link on any webpage. For example, here is a binding that lets you expand all the messages in a Gmail conversation:

"expand all": ClickElementAction(By.XPATH, "//*[@aria-label='Expand all']"),

Check out the WebDriver docs for more ways of locating an element.

Of course, this is just the start of what you can do with WebDriver. You can also execute sequences of commands, even waiting for particular elements to appear in the page. Please post your favorite commands in the comments!
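As a rough illustration of sequencing and waiting, here is a dependency-free sketch that polls for an element until it appears and then clicks it. It only assumes an object with a `find_element(by, value)` method, which a WebDriver instance provides. The helper names `click_when_present` and `click_sequence` and the `not_found` parameter are hypothetical. Selenium also ships its own `WebDriverWait` helper in `selenium.webdriver.support.ui`, which is the more standard tool for this job.

```python
import time


def click_when_present(driver, by, value, timeout=5.0, poll=0.25,
                       not_found=Exception):
    """Poll driver.find_element(by, value) until it succeeds, then click.

    `not_found` is the exception type that signals "no such element yet".
    With Selenium you would likely pass
    selenium.common.exceptions.NoSuchElementException; the broad default
    is only a placeholder for this sketch.
    """
    deadline = time.time() + timeout
    while True:
        try:
            element = driver.find_element(by, value)
        except not_found:
            if time.time() >= deadline:
                raise  # give up and surface the last lookup error
            time.sleep(poll)
        else:
            element.click()
            return element


def click_sequence(driver, locators, timeout=5.0, not_found=Exception):
    """Click each (by, value) locator in order, waiting for each to appear."""
    for by, value in locators:
        click_when_present(driver, by, value, timeout=timeout,
                           not_found=not_found)
```

A voice command could then call something like `click_sequence(driver, [(By.NAME, "q"), (By.XPATH, "//*[@aria-label='Expand all']")])` after switching to the active tab, with locators chosen for the site you want to control.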

If you’d like to see how I integrate this with my voice commands, please check out my GitHub repository.

16 thoughts on “Custom web commands with WebDriver”

  1. Very interesting. I never thought of using a web driver to control a browser.
    That said, the only common thing I really never found a simple keyboard shortcut or mouse click for is starting and stopping embedded YouTube videos.

    1. Wait, I take that back. There was a code review system that I could never figure out how to select the line to comment on — it wanted a right-click on a non-hyperlink if I remember correctly.

  2. Hello James,

    Thank you for your blog, it’s very interesting.
    However, I’m very interested in connecting my Selenium WebDriver directly to an existing Chrome session (launched with the --remote-debugging-port=9222 option). I’ve tried your code, but nothing happened for me…

    Do you have a special version of Selenium or Google Chrome?

    Any help would be appreciated.

    Thx

    Franck

    1. Hi Franck!

      I’m using standard Google Chrome and Python Selenium. Try quitting out of Chrome and then explicitly killing every Chrome process that remains (typically the second step isn’t necessary though). Then start with the flag and try again. If it’s working you should be able to load http://localhost:9222/ from within Chrome. If that works but the Python code doesn’t, then it sounds like there is something wrong with the Python code. Here’s my up-to-date Python code for this: https://github.com/wolfmanstout/dragonfly-commands/blob/master/_webdriver_utils.py

  3. Hi James !!

    Thank you for your fast answer.

    Below my context:
    * Google Chrome 48.0.2564.82
    * Python 2.7.6
    * ChromeDriver 2.20.353124
    * #26~14.04.1-Ubuntu

    Now, with your Git source code, it opens a new Chrome window (I’m using Ubuntu) but nothing happens. Your program, on my computer, blocks on

    driver = webdriver.Chrome("/usr/bin/google-chrome", chrome_options=chrome_options).

    And yes, I can load http://localhost:9222/ on Chrome.

    I think I’m cursed.

    Thank you for your help.

    Have a nice day.

    Franck

    1. Sorry for the slow response, apparently my email notifications were broken 🙁

      What do you mean when you say you are running Ubuntu? Are you running Dragon on a virtual machine? If so you might have to configure port forwarding or similar in order to access the Chrome server.

      Also, if you have a firewall, try turning that off first.

  4. Thanks James!
    Do you know if there is a way to hide the console window when creating the web driver?
    As soon as a console window is in focus, my Dragon reacts very sluggishly.

    1. You can also try going to “tools”, “options”, “miscellaneous” in Dragon, then exclude this application from being “voice-enabled”.

      1. Thanks again!
        I didn’t know about this option. Maybe it also helps me with powershell windows.

        In the meantime I tried to start the ChromeDriver as a service. This has its own option to hide the window (chromeDriverService.HideCommandPromptWindow). But I didn’t manage to keep the service going. It always ended immediately after execution. So I rejected this solution.

        Now I just added the following as a workaround.

        driverwindow = win32gui.GetForegroundWindow()
        win32gui.ShowWindow(driverwindow, win32con.SW_MINIMIZE)

        (import win32gui, win32con before)

        Certainly not the most elegant solution, but it works for me.

        greetings
        Eddy

  5. Hey James,
    I found that once I started Chrome with --remote-debugging-port, I couldn’t sign into Google. Do you have the same problem and do you know how to work around it?

    1. That’s very odd, no I haven’t seen that! If you remove the flag does it start working again? I’ve been running with this flag for years and never had any problems.

        Yes, I tried it yesterday with Chrome and Chromium: same behavior. As soon as I start it without the flag, I can sign in again.

        1. I wonder if this might be an abuse protection issue. I have noticed that I get a lot of aggressive captchas and I figure it must be some aspect of my setup. Maybe this is the root cause. I haven’t seen it with Google accounts specifically. Can you try to send feedback to Google about this via the page that’s giving you trouble? That should help them debug.
