RPF: Google's Record Playback Framework
Thursday, November 17, 2011 13:34 PM
The idea is to let users use the application in the browser, record their actions, and save them as JavaScript to play back later as a regression test or repro. Like most test tools, especially code-generating ones, it works most of the time, but it’s not perfect. Po Hu, the developer of RPF, had an early version working and decided to try it out on a real-world product, so he worked with the Chrome Web Store team to see how it would work for them. Why the Chrome Web Store? It is a website with lots of data-driven UX, authentication, and file upload, and it was changing all the time and breaking existing Selenium scripts: a pretty hard web testing problem. The team only targeted the Chrome browser and, most importantly, they were sitting 20 feet from us.
Before sharing with the Chrome Web Store test developer Wensi Liu, we invested a bit of time in doing something we thought was clever: fuzzy matching and inline updating of the test scripts. Selenium rocks, but after an initial regression suite is created, many teams end up spending a lot of time simply maintaining their Selenium tests as the products constantly change. The existing Selenium automation simply fails when a certain element isn’t found, which then requires some manual DOM inspection, updating the Java code, re-deploying, re-running, and re-reviewing the test code. What if, instead, the test script just kept running and updates to the code could be as simple as point and click? We would keep track of all the attributes of the recorded element, and when executing we would calculate the percent match between the recorded attributes and values and those found while running. If the match isn’t exact, but within tolerances (say only its parent node or class attribute had changed), we would log a warning and keep executing the test case. If the next test steps appeared to be working as well, the tests would keep executing: during test passes they would only log warnings, and in debug mode they would pause and allow for a quick update of the matching rule with point and click via the BITE UI. We figured this might reduce the number of false-positive test failures and make updating them much quicker.
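To make the percent-match idea concrete, here is a minimal sketch in Python. It is not RPF's actual code: the attribute names, the 0.7 tolerance, and the helper names match_score and classify are all assumptions made for illustration.

```python
from typing import Dict


def match_score(recorded: Dict[str, str], candidate: Dict[str, str]) -> float:
    """Fraction of recorded attributes whose values are unchanged on the candidate."""
    if not recorded:
        return 0.0
    hits = sum(1 for key, value in recorded.items() if candidate.get(key) == value)
    return hits / len(recorded)


def classify(score: float, tolerance: float = 0.7) -> str:
    """Map a score to an action; the 0.7 tolerance is a made-up value."""
    if score == 1.0:
        return "exact"   # element found as recorded
    if score >= tolerance:
        return "warn"    # close enough: log a warning and keep executing
    return "fail"        # too different: stop, or pause in debug mode


# Hypothetical recorded element, and the same element after a class change.
recorded = {
    "tag": "button",
    "id": "install",
    "class": "primary",
    "text": "Add to Chrome",
    "parent": "div#header",
}
candidate = {**recorded, "class": "primary large"}

score = match_score(recorded, candidate)
print(score, classify(score))
```

With one of five attributes changed, the element still counts as a match and the step would only log a warning.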
We were wrong, but in a good way!
We talked to the tester after a few days of leaving him alone with RPF. He’d already re-created most of his Selenium suite of tests in RPF, and the tests were already breaking because of product changes (it’s a tough life for a tester at Google to keep up with the developers’ rate of change). He seemed happy, so we asked him how this new fuzzy matching fanciness was working, or not. Wensi was like “oh yeah, that? Don’t know. Didn’t really use it...”. We started to think about how our update UX could have been confusing, not discoverable, or broken. Instead, Wensi said that when a test broke, it was just far easier to re-record the script. He had to re-test the product anyway, so why not turn recording on while he manually verified things were still working, remove the old test, and save the newly recorded script for replay later?
During that first week of trying out RPF, Wensi found:
- 77% of the features in Webstore were testable by RPF
- Generating regression test scripts via this early version of RPF was about 8X faster than building them via Selenium/WebDriver
- The RPF scripts caught 6 functional regressions and many more intermittent server failures.
- Common setup routines like login should be saved as modules for reuse (a crude version of this was working soon after)
- RPF worked on Chrome OS, where Selenium by definition could never run as it required client-side binaries. RPF worked because it was a pure cloud solution, running entirely within the browser, communicating with a backend on the web.
- Bugs filed via BITE provided a simple link that would install BITE on the developer’s machine and re-execute the repro on their side. No need for manually crafted repro steps. This was cool.
- Wensi wished RPF was cross browser. It only worked in Chrome, but people did occasionally visit the site with a non-Chrome browser.
We performed a check of how RPF fared on some of the top sites of the web; the results are shared on the BITE project wiki. They are now a little out of date, with lots more fixes since, but they give you a feel for what doesn’t work. Consider it Alpha quality at this point. It works for most scenarios, but there are still some serious corner cases.
Joe Muharsky drove a lot of the UX (user experience) design for BITE to turn our original, clunky, developer- and functional-centric UX into something intuitive. Joe’s key focus was to keep the UX out of the way until it is needed, and to make things as self-discoverable and findable as possible. We haven’t done formal usability studies yet, but we have done several experiments with external crowd testers using these tools with minimal instructions, as well as with internal dogfooders filing bugs against Google Maps with little confusion. Some of the fancier parts of RPF have some hidden easter eggs of awkwardness, but the basic record and playback scenarios seem to be obvious to folks.
RPF has graduated from the experimental centralized test team to be a formal part of the Chrome team, and is used regularly for regression test passes. The team also has an eye on enabling non-coding crowd-sourced testers to generate regression scripts via BITE/RPF.
Please join us in maintaining BITE/RPF, and be nice to Po Hu and Joel Hynoski who are driving this work forward within Google.
GTAC Videos Now Available
Tuesday, November 15, 2011 15:31 PM
All the GTAC 2011 talks are now available at http://www.gtac.biz/talks and also up on YouTube. A hearty thanks to all the speakers who helped make this the best GTAC ever.
Enjoy!
ScriptCover makes Javascript coverage analysis easy
Tuesday, October 25, 2011 23:19 PM
By Ekaterina Kamenskaya, Software Engineer in Test, YouTube
Today we introduce the Javascript coverage analysis tool, ScriptCover. It is a Chrome extension that provides line-by-line Javascript code coverage statistics for web pages in real time without any user modifications required. The results are collected both when the page loads and as users interact with it. The tool reports details about total web page coverage and for each external/internal script, as well as annotated code sources with individually highlighted executed lines.
Main features:
- Report current and previous total Javascript coverage percentages and total number of instrumented code instructions.
- Report Javascript coverage per individual instruction for each internal and external script.
- Display detailed reports with annotated Javascript source code.
- Recalculate coverage statistics while loading the page and on user actions.
Here are the benefits of ScriptCover over other existing tools:
- Per-instruction coverage for external and internal scripts: The tool formats the original external and internal Javascript code from ‘<script>’ tags to ideally place one instruction per line, and then calculates and displays Javascript coverage statistics. It is useful even when the code is compressed to one line.
- Dynamic: Users can get updated Javascript coverage statistics while the web page is loading and while interacting with the page.
- Easy to use: Users with different levels of expertise can install and use the tool to analyse coverage. Additionally, there is no need to write tests, modify the web application’s code, save the inspected web page locally, manually change proxy settings, etc. When the extension is activated in a Chrome browser, users just navigate through web pages and get coverage statistics on the fly.
- It’s free and open source!
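As a rough illustration of how per-script and total coverage percentages could be derived once each instruction has been instrumented, here is a small Python sketch. It does not describe ScriptCover's internals: the data layout (an instruction count plus a set of executed instruction indexes per script) and the names coverage_percent and summarize are assumptions.

```python
from typing import Dict, Set, Tuple


def coverage_percent(instrumented: int, executed: int) -> float:
    """Percentage of instrumented instructions that have executed so far."""
    return 100.0 * executed / instrumented if instrumented else 0.0


def summarize(scripts: Dict[str, Tuple[int, Set[int]]]) -> Dict[str, float]:
    """Per-script coverage plus a TOTAL entry.

    `scripts` maps a script name to (number of instrumented instructions,
    set of instruction indexes executed so far). This layout is hypothetical.
    """
    report: Dict[str, float] = {}
    total_instrumented = 0
    total_executed = 0
    for name, (count, executed) in scripts.items():
        report[name] = coverage_percent(count, len(executed))
        total_instrumented += count
        total_executed += len(executed)
    report["TOTAL"] = coverage_percent(total_instrumented, total_executed)
    return report


# Hypothetical page with one inline script and one external script.
page = {
    "inline#1": (10, {0, 1, 2, 3, 4}),
    "app.js": (40, set(range(10))),
}
print(summarize(page))
```

Re-running summarize after each user action, as the executed sets grow, would give the "on the fly" numbers described above.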
Want to try it out? Install ScriptCover and let us know what you think.
We envision many potential features and improvements for ScriptCover. If you are passionate about code coverage, read our documentation and participate in the discussion group. Your contributions to the project’s design, code base and feature requests are welcome!
Google Test Analytics - Now in Open Source
Wednesday, October 19, 2011 20:03 PM
The test plan is dead!
Well, hopefully. At a STAR West session this past week, James Whittaker asked a group of test professionals about test plans. His first question: “How many people here write test plans?” About 80 hands shot up instantly, a vast majority of the room. “How many of you get value or refer to them again after a week?” Exactly three people raised their hands.
That’s a lot of time spent writing documents that are often long-winded and full of paragraphs of detail about a project everyone already knows, only to be abandoned so quickly.
A group of us at Google set about creating a methodology that can replace a test plan -- it needed to be comprehensive, quick, actionable, and have sustained value to a project. In the past few weeks, James has published a few blog posts about this methodology, which we’ve called ACC (Attributes, Components, Capabilities). It's a tool to break down a software product into its constituent parts, and the method by which we created "10 Minute Test Plans" (that only take 30 minutes!).
Comprehensive
The ACC methodology creates a matrix that describes your project completely; several projects that have used it internally at Google have found coverage areas that were missing in their conventional test plans.
Quick
The ACC methodology is fast; we’ve created ACC breakdowns for complex projects in under half an hour. Far faster than writing a conventional test plan.
Actionable
As part of your ACC breakdown, you assess the risk of each capability of your application. Using these values, you get a heat map of your project, showing the areas with the highest risk -- great places to spend some quality time testing.
Sustained Value
We’ve built in some experimental features that bring your ACC test plan to life by importing data signals like bugs and test coverage that quantify the risk across your project.
Today, I'm happy to announce we're open sourcing Test Analytics, a tool built at Google to make generating an ACC simple -- and which brings some experimental ideas we had around the field of risk-based testing that work hand-in-hand with the ACC breakdown.
Test Analytics has two main parts: first and foremost, it's a step-by-step tool to create an ACC matrix that's faster and much simpler than the Google Spreadsheets we used before the tool existed. It also provides visualizations of the matrix and risks associated with your ACC Capabilities that were difficult or impossible to do in a simple spreadsheet.
The second part is taking the ACC plan and making it a living, automatic-updating risk matrix. Test Analytics does this by importing quality signals from your project: Bugs, Test Cases, Test Results, and Code Changes. By importing these data, Test Analytics lets you visualize risk that isn't just estimated or guessed, but based on quantitative values. If a Component or Capability in your project has had a lot of code change or many bugs are still open or not verified as working, the risk in that area is higher. Test Results can provide a mitigation to those risks -- if you run tests and import passing results, the risk in an area gets lower as you test.
This part's still experimental; we're playing around with how we calculate risk based on these signals to best determine risk. However, we wanted to release this functionality early so we can get feedback from the testing community on how well it works for teams so we can iterate and make the tool even more useful. It'd also be great to import even more quality signals: code complexity, static code analysis, code coverage, external user feedback and more are all ideas we've had that could add an even higher level of dynamic data to your test plan.
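One way to picture how imported signals might raise or lower an estimated risk is the sketch below. It is not Test Analytics' formula, which the post describes as still experimental; the weights, the Signals type, and risk_score are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Quality signals imported for one Component or Capability."""
    code_changes: int    # recent code changes touching the area
    open_bugs: int       # bugs still open or not verified as working
    passing_tests: int   # imported passing test results


def risk_score(
    estimated: float,
    signals: Signals,
    w_change: float = 0.5,   # assumed weight per code change
    w_bug: float = 1.0,      # assumed weight per open bug
    w_test: float = 0.25,    # assumed mitigation per passing test
) -> float:
    """Start from the estimated risk, add churn and bugs, subtract test mitigation."""
    raw = (
        estimated
        + w_change * signals.code_changes
        + w_bug * signals.open_bugs
        - w_test * signals.passing_tests
    )
    return max(raw, 0.0)  # risk never drops below zero


before = risk_score(2.0, Signals(code_changes=4, open_bugs=3, passing_tests=0))
after = risk_score(2.0, Signals(code_changes=4, open_bugs=3, passing_tests=8))
print(before, after)
```

The same area scores lower once passing results are imported, which is the mitigation effect described above.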
You can check out a live hosted version, browse or check out the code along with documentation, and of course if you have any feedback let us know - there's a Google Group set up for discussion, where we'll be active in responding to questions and sharing our experiences with Test Analytics so far.
Long live the test plan!
Google JS Test, now in Open Source
Monday, October 17, 2011 17:38 PM
Features of Google JS Test include:
- Extremely fast startup and execution time, without needing to run a browser.
- Clean, readable output in the case of both passing and failing tests.
- An optional browser-based test runner that can simply be refreshed whenever JS is changed.
- Style and semantics that resemble Google Test for C++.
- A built-in mocking framework that requires minimal boilerplate code (e.g. no $tearDown or$verifyAll calls), with style and semantics based on the Google C++ Mocking Framework.
- A system of matchers allowing for expressive tests and easy to read failure output, with many built-in matchers and the ability for the user to add their own.

Take a BITE out of Bugs and Redundant Labor
Wednesday, October 12, 2011 16:45 PM
The Browser Integrated Testing Environment, or BITE, is an open source Chrome Extension which aims to fix the manual web testing experience. To use the extension, it must be linked to a server providing information about bugs and tests in your system. BITE then provides the ability to file bugs from the context of a website, using relevant templates.
When filing a bug, BITE automatically grabs screenshots, links, and problematic UI elements and attaches them to the bug. This gives developers charged with investigating and/or fixing the bug a wealth of information to help them determine root causes and factors in the behavior.
When it comes to reproducing a bug, testers will often labor to remember and accurately record the exact steps taken. With BITE, however, every action the tester takes on the page is recorded in JavaScript, and can be played back later. This enables engineers to quickly determine whether the steps of a bug reproduce in a specific environment, or whether a code change has resolved the issue.
Also included in BITE is a Record/Playback console to automate user actions in a manual test. Like the BITE recording experience, the RPF console will automatically author JavaScript that can be used to replay your actions at a later date. BITE’s record and playback mechanism is also fault tolerant: UI automation tests will fail from time to time, and when they do, it tends to be for test issues rather than product issues. To that end, when a BITE playback fails, the tester can fix their recording in real time, just by repeating the action on the page. There’s no need to touch code or report a failing test; if your script can’t find a button to click on, just click on it again, and the script will be fixed! For those times when you do have to touch the code, we’ve used Ace (http://ace.ajax.org/) as an inline editor, so you can make changes to your JavaScript in real time.
Check out the BITE project page at http://code.google.com/p/bite-project. Feedback is welcome at bite-feedback@google.com. Posted by Joe Allan Muharsky from the Web Testing Technologies Team (Jason Stredwick, Julie Ralph, Po Hu and Richard Bustamante are the members of the team that delivered the product).
Unleash the QualityBots
Thursday, October 06, 2011 22:25 PM
Are you a website developer that wants to know if Chrome updates will break your website before they reach the stable release channel? Have you ever wished there was an easy way to compare how your website appears in all channels of Chrome? Now you can!
QualityBots is a new open source tool for web developers created by the Web Testing team at Google. It’s a comparison tool that examines web pages across different Chrome channels using pixel-based DOM analysis. As new versions of Chrome are pushed, QualityBots serves as an early warning system for breakages. Additionally, it helps developers quickly and easily understand how their pages appear across Chrome channels.

QualityBots is built on top of Google AppEngine for the frontend and Amazon EC2 for the backend workers that crawl the web pages. Using QualityBots requires an Amazon EC2 account to run the virtual machines that will crawl public web pages with different versions of Chrome. The tool provides a web frontend where users can log on and request URLs that they want to crawl, see the results from the latest run on a dashboard, and drill down to get detailed information about what elements on the page are causing the trouble.
Developers and testers can use these results to identify sites that need attention due to a high amount of change and to highlight the pages that can be safely ignored when they render identically across Chrome channels. This saves time and the need for tedious compatibility testing of sites when nothing has changed.

We hope that interested website developers will take a deeper look and even join the project at the QualityBots project page. Feedback is more than welcome at qualitybots-discuss@googlegroups.com.
From Google Dev Day to STAR West
Monday, September 26, 2011 16:41 PM
Google Dev Days in Brazil and Argentina are over (sigh) and now I turn my attention to STAR West in Anaheim. Unfortunately, it is too late to register for my tutorials as I was informed both are sold out.
If you attend STAR, please take the time to say hello.
Announcing the Final GTAC Agenda
Monday, September 12, 2011 14:46 PM
The GTAC agenda is now finalized and available at: http://www.gtac.biz/agenda. Looking forward to seeing everyone there. Stay tuned to this blog for updates to any pre- and post- events.
The 10 Minute Test Plan
Thursday, September 01, 2011 21:28 PM
By James Whittaker
Anything in software development that takes ten minutes or less to perform is either trivial or is not worth doing in the first place. If you take this rule of thumb at face value, where do you place test planning? Certainly it takes more than 10 minutes. In my capacity as Test Director at Google I presided over teams that wrote a large number of test plans and every time I asked how long one would take I was told “tomorrow” or “the end of the week” and a few times, early in the day, I was promised one “by the end of the day.” So I’ll establish the task of test planning to be of the hours-to-days duration.
As to whether it is worth doing, well, that is another story entirely. Every time I look at any of the dozens of test plans my teams have written, I see dead test plans. Plans written, reviewed, referred to a few times and then cast aside as the project moves in directions not documented in the plan. This raises the question: if a plan isn’t worth bothering to update, is it worth creating in the first place?
Other times a plan is discarded because it went into too much detail or too little; still others because it provided value only in starting a test effort and not in the ongoing work. Again, if this is the case, was the plan worth the cost of creating it given its limited and diminishing value?
Some test plans document simple truths that likely didn’t really need documenting at all or provide detailed information that isn’t relevant to the day to day job of a software tester. In all these cases we are wasting effort. Let’s face facts here: there is a problem with the process and content of test plans.
To combat this, I came up with a simple task for my teams: write a test plan in 10 minutes. The idea is simple: if test plans have any value at all, then let’s get to that value as quickly as possible.
Given ten minutes, there is clearly no room for fluff. It is a time period so compressed that every second must be spent doing something useful or any hope you have of actually finishing the task is gone. This was the entire intent behind the exercise from my point of view: boil test planning down to only the essentials and cut all fat and fluff. Do only what is absolutely necessary and leave the details to the test executors as opposed to the test planners. If I wanted to end the practice of writing test plans that don’t stand the test of time, this seemed a worthwhile exercise.
However, I didn’t tell the people in the experiment any of this. I told them only: here is an app, create a test plan in 10 minutes or less. Remember that these people work for me and, technically, are paid to do as I tell them. And, again technically, I am uniquely positioned to begin termination procedures with respect to their Google employment. On top of that I am presuming they have some measure of respect for me, which means they were likely convinced I actually thought they could do it. This was important to me. I wanted them to expect to succeed!
As preparation they could spend some time with the app in question and familiarize themselves with it. However, since many of the apps we used (Google Docs, App Engine, Talk Video, etc.) were tools they used every week, this time was short.
So here's how the task progressed:
They started, did some work, and when ten minutes passed I interrupted them. They stated they weren't done yet. I responded by telling them they were out of time, nice try, here's a different problem to work on. 10 minutes later, the same thing happened and I changed the problem again. They began working faster and trying different angles; things that were too time consuming or not worth the effort got jettisoned really quickly!
In each case, the teams came up with techniques that helped speed things along. They chose to jot down lists and create grids over writing long paragraphs of prose. Sentences … yes, paragraphs … no. They wasted little time on formatting and explanations and chose instead to document capabilities. Indeed, capabilities, or what the software actually does, were the one commonality of all the plans. Capabilities were the one thing that all the teams gravitated toward as the most useful way to spend the little time they were given.
The three things that emerged as most important:
1. Attributes: the adverbs and adjectives that describe the high level concepts testing is meant to ensure. Attributes such as fast, usable, secure, accessible and so forth.
2. Components: the nouns that define the major code chunks that comprise the product. These are classes, module names and features of the application.
3. Capabilities: the verbs that describe user actions and activities.
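The three lists above fit naturally into a grid. The Python sketch below assumes a layout of Attributes by Components with Capabilities in the cells. The example product, its capabilities, and the helper names build_matrix and uncovered_cells are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (attribute, component)


def build_matrix(capabilities: List[Tuple[str, str, str]]) -> Dict[Cell, List[str]]:
    """Group capabilities by the (attribute, component) cell they belong to."""
    matrix: Dict[Cell, List[str]] = defaultdict(list)
    for attribute, component, capability in capabilities:
        matrix[(attribute, component)].append(capability)
    return dict(matrix)


def uncovered_cells(
    attributes: List[str], components: List[str], matrix: Dict[Cell, List[str]]
) -> List[Cell]:
    """Cells with no capability yet: possible gaps, or pairs that do not apply."""
    return [(a, c) for a in attributes for c in components if (a, c) not in matrix]


# Hypothetical breakdown of a small web store.
attributes = ["fast", "secure"]
components = ["search", "checkout"]
capabilities = [
    ("fast", "search", "return results as the user types"),
    ("secure", "checkout", "accept a credit card payment"),
    ("secure", "checkout", "sign the user out after inactivity"),
]

matrix = build_matrix(capabilities)
print(uncovered_cells(attributes, components, matrix))
```

Listing the empty cells is one quick way to spot areas a ten-minute pass has not reached yet.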
None of the teams finished the experiment in the 10 minutes allotted. However, in 10 minutes they were all able to get through both the Attributes and Components (or things that served a similar purpose) and begin documenting Capabilities. At the end of an additional 20 minutes most of the experiments had a large enough set of Capabilities that it would have been a useful starting point for creating user stories or test cases.
Which, at least to me, made the experiment a success. I gave them 10 minutes and hoped for an hour. They had 80% of the work complete in 30 minutes. And really isn’t 80% enough? We know full well that we are not going to test everything so why document everything? We know full well that as we start testing, things (schedules, requirements, architecture, etc.) are going to change so insisting on planning precision when nothing else obeys such a calling for completeness seems out of touch with reality.
80% complete in 30 minutes or less. Now that’s what I call a 10 minute test plan!
Google Developer Day 2011
Friday, August 19, 2011 20:26 PM
By James Whittaker
Google Developer Day is gearing up for a fantastic fall season of tours that crawl the continents. And a surprise this year ... yours truly will be the keynote for the Developer Day in Sao Paulo, Brazil and Buenos Aires, Argentina in September.
Google Developer Day is a deep dive into the future of Web, Mobile and Cloud technologies crafted specifically for software engineering professionals. And this year we are adding the element of Social to tie it all together. Google+ is only the start.
If you are attending, please stop by and say hello!
Click here for more information about dates and agenda.
GTAC Speakers and Attendees Finalized
Thursday, August 18, 2011 20:29 PM
We've completed the agenda for GTAC 2011 and are in the process of notifying accepted speakers and attendees. Once we have firm accepts we'll be publicizing the agenda.
Pretotyping: A Different Type of Testing
Tuesday, August 16, 2011 18:03 PM
Have you ever poured your heart and soul and blood, sweat and tears to help test and perfect a product that, after launch, flopped miserably? Not because it was not working right (you tested the snot out of it), but because it was not the right product.
Are you currently wasting your time testing a new product or feature that, in the end, nobody will use?
Testing typically revolves around making sure that we have built something right. Testing activities can be roughly described as “verifying that something works as intended, or as specified.” This is critical. However, before we take steps and invest time and effort to make sure that something is built right, we should make sure that the thing we are testing, whether it’s a new feature or a whole new product, is the right thing to build in the first place.
Spending time, money and effort to test something that nobody ends up using is a waste of time.
For the past couple of years, I’ve been thinking about, and working on, a concept called pretotyping.
What is pretotyping? Here’s a somewhat formal definition – the dry and boring kind you’d find in a dictionary:
Pretotyping [pree-tuh-tahy-ping], verb: Testing the initial appeal and actual usage of a potential new product by simulating its core experience with the smallest possible investment of time and money.
Here’s a less formal definition:
Pretotyping is a way to test an idea quickly and inexpensively by creating extremely simplified, mocked or virtual versions of that product to help validate the premise that "If we build it, they will use it."
My favorite definition of pretotyping, however, is this:
Make sure – as quickly and as cheaply as you can – that you are building the right it before you build it right.
My thinking on pretotyping evolved from my positive experiences with Agile and Test Driven Development. Pretotyping takes some of the core ideas from these two models and applies them further upstream in the development cycle.
I’ve just finished writing the first draft of a booklet on pretotyping called “Pretotype It”:
You can download a PDF of the booklet from Google Docs or Scribd.
The "Pretotype It" booklet is itself a pretotype and test. I wrote this first-draft to test my (possibly optimistic) assumption that people would be interested in it, so please let me know what you think of it.
You can follow my pretotyping work on my pretotyping blog.
Keynote Lineup for GTAC 2011
Monday, August 01, 2011 18:39 PM
By James Whittaker
The call for proposals and participation is now closed. Over the next few weeks we will be announcing the full agenda and notifying accepted participants. In the meantime, the keynote lineup is now locked. It consists of two famous Googlers and two famous external speakers that I am very pleased to have join us.
Opening Keynote: Test is Dead by Alberto Savoia
The way most software is designed, developed and launched has changed dramatically over the last decade – but what about testing? Alberto Savoia believes that software testing as we knew it is dead – or at least moribund – in which case we should stick a fork in it and proactively take it out of its misery for good. In this opening keynote of biblical scope, Alberto will cast stones at the old test-mentality and will try his darnedest to agitate you and convince you that these days most testers should follow a new test-mentality, one which includes shifting their focus and priority from “Are we building it right?” to “Are we building the right it?” The subtitle of this year’s GTAC is “cloudy with a chance of tests,” and if anyone can gather the clouds into a hurricane, it's Alberto – it might be wise to bring your umbrella.
Alberto Savoia is Director of Engineering and Innovation Agitator at Google. In addition to leading several major product development efforts (including the launch of Google AdWords), Alberto has been a lifelong believer, champion, innovator and entrepreneur in the area of developer testing and test automation tools. He is a frequent keynote speaker and the author of many articles on testing, including the classic booklet “The Way of Testivus” and “Beautiful Tests” in O’Reilly’s Beautiful Code. His work in software development tools has won him several awards including the 2005 Wall Street Journal Technical Innovator Award, InfoWorld’s Technology of the Year award, and no less than four Software Development Magazine Jolt Awards.
Day 1 Closer: Redefining Security Vulnerabilities: How Attackers See Bugs by Herbert H. Thompson
Developers see features, testers see bugs, and attackers see “opportunities.” Those opportunities are expanding beyond buffer overflows, cross site scripting, etc. into logical bugs (and features) that allow attackers to use the information they find to exploit trusting users. For example, attackers can leverage a small information disclosure issue in an elaborate phishing attempt. When you add people in the mix, we need to reevaluate which “bugs” are actual security vulnerabilities. This talk is loaded with real world examples of how attackers are using software “features” and information tidbits (many of which come from bugs) to exploit the biggest weakness of all: trusting users.
Dr. Herbert H. Thompson is Chief Security Strategist at People Security and a world-renowned expert in application security. He has co-authored four books on the topic, including How to Break Software Security: Effective Techniques for Security Testing (with Dr. James Whittaker) and The Software Vulnerability Guide (with Scott Chase). In 2006 he was named one of the “Top 5 Most Influential Thinkers in IT Security” by SC Magazine. Thompson continually lends his perspective and expertise on secure software development and has been interviewed by top news organizations including CNN, MSNBC, BusinessWeek, Forbes, Associated Press, and the Washington Post. He is also Program Committee Chair for RSA Conference, the world’s leading information security gathering. He holds a Ph.D. in Applied Mathematics from Florida Institute of Technology, and is an adjunct professor in the Computer Science department at Columbia University in New York.
Day 2 Opener: Engineering Productivity: Accelerating Google Since 2006 by Patrick Copeland
Patrick Copeland is the founder and architect of Google's testing and productivity strategy and in this "mini keynote" he tells the story and relates the pain of taking a company from ad hoc testing practices to the pinnacle of what can be accomplished with a well oiled test engineering discipline.
Conference Closer: Secrets of World-Class Software Organizations by Steve McConnell
Construx consultants work with literally hundreds of software organizations each year. Among these organizations a few stand out as being truly world class. They are exceptional in their ability to meet their software development goals and exceptional in the contribution they make to their companies' overall business success. Do world class software organizations operate differently than average organizations? In Construx's experience, the answer is a resounding "YES." In this talk, award-winning author Steve McConnell reveals the technical, management, business, and cultural secrets that make a software organization world class.
Steve McConnell is CEO and Chief Software Engineer at Construx Software where he consults to a broad range of industries, teaches seminars, and oversees Construx’s software engineering practices. Steve is the author of Software Estimation: Demystifying the Black Art (2006), Code Complete (1993, 2004), Rapid Development (1996), Software Project Survival Guide (1998), and Professional Software Development (2004), as well as numerous technical articles. His books have won numerous awards for "Best Book of the Year," and readers of Software Development magazine named him one of the three most influential people in the software industry along with Bill Gates and Linus Torvalds.
How We Tested Google Instant Pages
Wednesday, July 27, 2011 22:35 PM
By Jason Arbon and Tejas Shah
Google Instant Pages are a cool new way that Google speeds up your search experience. When Google thinks it knows which result you are likely to click, it preloads that page in the background, so when you click the page it renders instantly, saving the user about 5 seconds. 5 seconds is significant when you think of how many searches are performed each day--and especially when you consider that the rest of the search experience is optimized for sub-second performance.
The testing problem here is interesting. This feature requires client and server coordination, and since we are pre-loading and rendering the pages in an invisible background page, we wanted to make sure that nothing major was broken with the page rendering.
The original idea was for developers to test out a few pages as they went. But this doesn’t scale to a large number of sites and is very expensive to repeat. Also, how do you know what the pages should look like? To write Selenium tests to functionally validate thousands of sites would take forever--the product would ship first. The solution was to perform automated test runs that load these pages from search results with Instant Pages turned on, and another run with Instant Pages turned off. The page renderings from each run were then compared.
How did we compare the two runs? How do you compare pages when content and ads on web pages are constantly changing and we don't know what the expected behavior is? We could have used cached versions of these pages, but that wouldn’t be the real-world experience we were testing, it would have taken time to set up, and the timing would have been different. We opted to leverage some other work that compares pages using the Document Object Model (DOM). We automatically scan each page, pixel by pixel, but look at which element is visible at each point on the page, not the color/RGB values. We then do a simple measure of how closely these pixel measurements match. These so-called "quality bots" generate a score of 0-100%, where 100% means all measurements were identical.
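To make the scoring idea concrete, here is a minimal Python sketch of that kind of comparison. It is not the quality bot's actual code: the grid representation, the element identifiers, and the `similarity_score` function are all assumptions made up for illustration. Each run is reduced to a grid of sample points, and each point stores which element was on top there.

```python
from typing import List

# Hypothetical representation: each page is sampled on a grid, and every
# sample holds an identifier for the topmost DOM element at that point
# (for example a tag name plus id), not a color value.
ElementGrid = List[List[str]]


def similarity_score(run_a: ElementGrid, run_b: ElementGrid) -> float:
    """Return the percentage (0-100) of sample points whose element matches."""
    same_shape = len(run_a) == len(run_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(run_a, run_b)
    )
    if not same_shape:
        raise ValueError("both runs must be sampled on the same grid")
    total = sum(len(row) for row in run_a)
    if total == 0:
        return 100.0  # nothing to compare, so nothing differs
    matches = sum(
        1
        for row_a, row_b in zip(run_a, run_b)
        for elem_a, elem_b in zip(row_a, row_b)
        if elem_a == elem_b
    )
    return 100.0 * matches / total


# Made-up sample data: the two runs differ only in the ad slot.
instant_on = [
    ["div#header", "div#header", "div#header"],
    ["div#nav", "p.story", "div#ad"],
]
instant_off = [
    ["div#header", "div#header", "div#header"],
    ["div#nav", "p.story", "iframe#ad2"],
]
print(similarity_score(instant_on, instant_off))
```

Comparing element identity rather than color is what lets a score like this ignore a changed headline image while still noticing a missing or displaced block.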
When we performed the runs, the vast majority (~95%) of all comparisons were almost identical, as we had hoped. Where the pages were different, we built a web page that showed the differences between the two pages by rendering both images and highlighting the difference. It was quick and easy for the developers to visually verify that the differences were only due to content or other non-structural differences in the rendering. Any time test automation scales, is repeatable and quantified, and lets developers validate the results without us, it is a good thing!
How did this testing get organized? As with many things in testing at Google, it came down to people chatting and realizing their work can be helpful for other engineers. This was bottom up, not top down. Tejas Shah was working on a general quality bot solution for compatibility (more on that in later posts) between Chrome and other browsers. He chatted with the Instant Pages developers when he was visiting their building and they agreed his bot might be able to help. He then spent the next couple of weeks pulling it all together and sharing the results with the team.
And now more applications of the quality bot are surfacing. What if we kept the browser version fixed, and only varied the version of the application? Could this help validate web applications independent of a functional spec and without custom validation script development and maintenance? Stay tuned...
GTAC: Call for Team Attendance
Thursday, July 07, 2011 19:54 PM
Attending conferences can be a great way to network and learn new concepts. However, taking those concepts back to your office and trying to convince your team to apply them can be daunting. In order to make GTAC attendees more successful at implementing what they learn at this conference, we are going to give preference to teammates from the same company applying for attendance. Bring another developer or tester (or two or three) and attend as a team so you can discuss what you learn and experience, hopefully increasing your chances of putting it into practice when you return to work.
We're extending the deadline for attendees until the end of July to give you a chance to round up some teammates.
Google at STAR West 2011
Tuesday, June 28, 2011 21:29 PM
By James Whittaker
STAR West will feature something unprecedented this year: back-to-back tutorials by Googlers plus a keynote and track session.
The tutorials will be Monday October 3. I have the morning session on "How Google Tests Software" and my colleague Ankit Mehta has the afternoon session on "Testing Rich Internet AJAX-based Applications." You can spend the whole day in Google Test Land.
I highly recommend Ankit's tutorial. He is one of our top test managers and has spent years minding Gmail as it grew up from a simple cloud-based email system into the mass-scale, ubiquitous rich web app that it is today. Ankit now leads all testing efforts around our social offerings (which are already starting to appear). Anyone struggling to automate the testing of rich web apps will have plenty to absorb in his session. He's not spouting conjecture and generalities; he's speaking from the position of actual accomplishment. Bring a laptop.
Jason Arbon and Sebastian Schiavone are presenting a track talk on "Google's New Methodology for Risk Driven Testing" and will be demonstrating some of the latest tools coming out of Google Test Labs. These are tools born of real need and built to serve that need. I am expecting free samples! Jason was test lead for Chrome and Chrome OS before taking over Google Test Labs, where incredibly clever code is woven into useful test tools. Sebastian is none other than my TPM (technical program manager), who is well known for taking my vague ideas about how things should be done and making them real.
Oh and the keynote, well that's me again, something about testing getting in the way of quality. I wrote this talk while I was in an especially melancholy mood about my place in the universe. It's a wake-up call to testers: the world is changing and your relevance is calling ... will you answer the call or ignore it and pretend that yesterday is still today?
Lessons in a 21st Century Tech Career: Failing Fast, 20% Time and Project Mobility
Thursday, June 23, 2011 21:05 PM
By James Whittaker
If your name is Larry Page, stop reading this now.
Let me first admit that as I write this I am sitting in a company lounge reminiscent of a gathering room in a luxury hotel with my belly full of free gourmet food waiting for a meeting with the lighthearted title "Beer and Demos" to start.
Let me secondly admit that none of this matters. It's all very nice, and I hope it continues in perpetuity, but it doesn't matter. Engineers don't need to be spoiled rotten to be happy. The spoiling of engineers has little to do with the essence of a 21st century tech career.
Now, what exactly does matter? What is the essence of a 21st century tech career that keeps employees loyal and engaged with productivity that would shame the most seasoned agile-ist? I don't yet have the complete story, but here are three important ingredients:
Failing Fast. Nothing destroys morale more than a death march. Projects going nowhere should do so with the utmost haste. The ability of a company to implode pet projects quickly correlates directly to a great place to work. Engineers working on these projects gain not only valuable engineering experience, they experience first-hand the company's perception of what is important (and, in the case of their project, what is not important). It's a built-in lesson on company priorities and it ensures good engineers don't get monopolized by purposeless projects. You gotta like a company willing to experiment. You have to love a company willing to laugh at itself when the experiments don't pan out.
20% Time. Any company worth working for has any number of projects that are worth working on. It's frustrating for many super-sharp engineers to see cool work going on down the hall or in the next building and not be part of it. A day job that takes all day is tiresome. Enter 20% time, a concept meant to send a strong message to all engineers: you always have a spare day. Use it wisely.
Project Mobility. Staying fresh by changing projects is part of mobility. Continuous cycling of fresh ideas from new project members to existing projects is another part. The downside here is obviously projects with a steep learning curve but I scoff in the general direction of this idea. Whose fault is it when a wicked smart engineer can't learn the system fast enough to be useful in some (even a small) context? Only the weakest organization with the poorest documentation can use that excuse. The only good reason for keeping people on a project is because they have no desire to leave.
These three concepts are better than all the lounges and free food any company can provide. Here's an example, a real example, of how it worked recently for an employee I'll call Paul (because that happens to be his name!).
Paul joined Google a little over a year ago and spent two months on a project that was then cancelled. He learned enough to be useful anywhere but was new enough that he really didn't have great context on what project he wanted next. Solution: I assigned him to a project that was a good skill set match.
Less than a year later, his new project ships. He played an important role in making this happen but in that time he also realized that the role was leaning toward feature development and he was more interested in a pure test development role. However, he was steeped in post-ship duties and working on the next release. A cycle that, happily, can be broken pretty easily here.
Another project had a test developer opening that suited Paul perfectly. He immediately signed up for 20% on this new project and spent his 80% ramping down in his old project. At some point these percentages will trade places and he'll spend 20% of his time training his replacement on the old project. This is a friction-less process. His manager cannot deny him his day to do as he pleases and now he can spend his time getting off the critical path of his old project and onto the critical path of his new project.
Mobility means a constant stream of openings on projects inside Google. It also creates a population of engineering talent with an array of project experiences and a breadth of expertise to fill those positions. 20% time is a mechanism for moving onto and off of projects without formal permissions, interviews and other make-work processes engineers deplore.
Let's face it, most benefits are transient. I enjoy a good meal for the time it is in front of me. I enjoy great medical when I am sick. I appreciate luxury when I have time for it. Even my paycheck comes with such monotonous regularity that it is an expectation that brings little joy apart from the brief moment my bank balance takes that joyful upward tick. But if I am unhappy the rest of the day, none of those islands of pampering mean squat. Empower me as an engineer during the much larger blocks of my time when I am doing engineering. Feed my creativity. Remove the barriers that prevent me from working on the things I want to work on.
Do these things and you have me. Do these things and you make my entire work day better. This is the essence of a 21st century tech career: make the hours I spend working better. Anything more is so dot com.
Ok, Larry you can start reading again.
Introducing DOM Snitch, our passive in-the-browser reconnaissance tool
Tuesday, June 21, 2011 16:51 PM
By Radoslav Vasilev from Google Zurich
Every day modern web applications are becoming increasingly sophisticated, and as their complexity grows so does their attack surface. Previously we introduced open source tools such as Skipfish and Ratproxy to assist developers in understanding and securing these applications.
As existing tools focus mostly on testing server-side code, today we are happy to introduce DOM Snitch — an experimental* Chrome extension that enables developers and testers to identify insecure practices commonly found in client-side code. To do this, we have adopted several approaches to intercepting JavaScript calls to key and potentially dangerous browser infrastructure such as document.write or HTMLElement.innerHTML (among others). Once a JavaScript call has been intercepted, DOM Snitch records the document URL and a complete stack trace that will help assess if the intercepted call can lead to cross-site scripting, mixed content, insecure modifications to the same-origin policy for DOM access, or other client-side issues.
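The interception idea is easier to picture with a small example. The Python sketch below is only an analogy, not DOM Snitch's code (DOM Snitch hooks JavaScript inside Chrome): it wraps a stand-in for a sensitive sink so that every call is logged with a URL and a stack trace before the real function runs. The names `snitch`, `intercepted_calls`, `document_write`, `render_comment`, and the example URL are all hypothetical.

```python
import functools
import traceback
from typing import Any, Callable, Dict, List

# Hypothetical in-memory log; each entry records what a reviewer would need
# to judge whether the intercepted call is risky.
intercepted_calls: List[Dict[str, Any]] = []


def snitch(sink_name: str, document_url: str) -> Callable:
    """Wrap a sensitive function so every call is recorded before it runs."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            intercepted_calls.append({
                "sink": sink_name,
                "url": document_url,
                "args": args,
                # Drop the last frame so the trace ends at the caller,
                # not at this wrapper.
                "stack": traceback.format_stack()[:-1],
            })
            return func(*args, **kwargs)

        return wrapper

    return decorator


@snitch("document.write", "https://example.test/page")
def document_write(markup: str) -> str:
    # Stand-in for the real browser sink; it just returns the markup.
    return markup


def render_comment(user_text: str) -> str:
    # Unescaped user input reaching the sink is what a reviewer looks for.
    return document_write("<p>" + user_text + "</p>")


render_comment("<script>alert(1)</script>")
print(intercepted_calls[0]["sink"], intercepted_calls[0]["args"])
```

The recorded stack is what makes the log actionable: it points at `render_comment` as the place where untrusted text reached the sink.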
Here are the benefits of DOM Snitch:
- Real-time: Developers can observe DOM modifications as they happen inside the browser without the need to step through JavaScript code with a debugger or pause the execution of their application.
- Easy to use: With built-in security heuristics and nested views, both advanced and less experienced developers and testers can quickly spot areas of the application being tested that need more attention.
- Easier collaboration: Enables developers to easily export and share captured DOM modifications while troubleshooting an issue with their peers.
DOM Snitch is intended for use by developers, testers, and security researchers alike. Click here to download DOM Snitch. To read the documentation, please visit this page.
*Developers and testers should be aware that DOM Snitch is currently experimental. We do not guarantee that it will work flawlessly for all web applications. More details on known issues can be found here or in the project’s issues tracker.
Test Is Dead
Friday, June 17, 2011 17:55 PM

My earthly body casts no shadows
'Tis my thoughts and words that bring umbrage
that is shade to some
and darkness to others
Testivus was but a bit of child play
to appetize
At GTAC 2011
the greater truth
shall be revealed
Test is dead
And I the executioner
Alberto Savoia
VI.XVII.MMXI
The Way of Testivus
"Floating Alberto" photograph courtesy of Patrick Copeland
GTAC 2011 Keynotes
Thursday, June 16, 2011 18:56 PM
By James Whittaker
I am pleased to confirm 3 of our keynote speakers for GTAC 2011 at the Computer History Museum in Mountain View CA.
Google's own Alberto Savoia, aka Testivus.
Steve McConnell the best selling author of Code Complete and CEO of Construx Software.
Award winning speaker ("the Jon Stewart of Software Security") Hugh Thompson.
This is the start of an incredible lineup. Stay tuned for updates concerning their talks and continue to nominate additional speakers and keynotes. We're not done yet and we're taking nominations through mid July.
In addition to the keynotes, we're going to be giving updates on How Google Tests Software from teams across the company including Android, Chrome, Gmail, YouTube and many more.
Testing at the speed and scale of Google
Tuesday, June 14, 2011 18:27 PM
By Pooja Gupta, Mark Ivey and John Penix
Continuous integration systems play a crucial role in keeping software working while it is being developed. The basic steps most continuous integration systems follow are:
This works great while the codebase is small, code flux is reasonable and tests are fast. As a codebase grows over time, the effectiveness of such a system decreases. As more code is added, each clean run takes much longer and more changes get crammed into a single run. If something breaks, finding and backing out the bad change is a tedious and error-prone task for development teams.
Software development at Google is big and fast. The code base receives 20+ code changes per minute and 50% of the files change every month! Each product is developed and released from ‘head’ relying on automated tests verifying the product behavior. Release frequency varies from multiple times per day to once every few weeks, depending on the product team.
With such a huge, fast-moving codebase, it is possible for teams to get stuck spending a lot of time just keeping their build ‘green’. A continuous integration system should help by providing the exact change at which a test started failing, instead of a range of suspect changes or doing a lengthy binary-search for the offending change. To find the exact change that broke a test, we could run every test at every change, but that would be very expensive.
To solve this problem, we built a continuous integration system that uses dependency analysis to determine all the tests a change transitively affects and then runs only those tests for every change. The system is built on top of Google’s cloud computing infrastructure enabling many builds to be executed concurrently, allowing the system to run affected tests as soon as a change is submitted.
Here is an example where our system can provide faster and more precise feedback than a traditional continuous build. In this scenario, there are two tests and three changes that affect these tests. The gmail_server_tests are broken by the second change, however a typical continuous integration system will only be able to tell that either change #2 or change #3 caused this test to fail. By using concurrent builds, we can launch tests without waiting for the current build/test cycle to finish. Dependency analysis limits the number of tests executed for each change, so that in this example, the total number of test executions is the same as before.
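As a rough illustration of why per-change runs pinpoint the culprit, here is a small Python sketch. It is a toy model under stated assumptions, not the production system: the change numbers, the mapping from changes to affected tests, the `gmail_client_tests` name, and the `simulated_run` function are invented for the example. Only `gmail_server_tests` and the fact that the second change breaks it come from the scenario above.

```python
from typing import Callable, Dict, List, Optional, Set


def first_failing_change(
    changes: List[int],
    affected_tests: Dict[int, Set[str]],
    test: str,
    passes_at: Callable[[str, int], bool],
) -> Optional[int]:
    """Run `test` at each change that affects it; return the first failure."""
    for change in changes:
        if test not in affected_tests[change]:
            # Dependency analysis says this change cannot alter the result.
            continue
        if not passes_at(test, change):
            return change
    return None


# Assumed scenario: which tests each change transitively affects.
changes = [1, 2, 3]
affected = {
    1: {"gmail_client_tests", "gmail_server_tests"},
    2: {"gmail_server_tests"},
    3: {"gmail_client_tests"},
}


def simulated_run(test: str, change: int) -> bool:
    # Pretend change 2 broke gmail_server_tests and nothing has fixed it.
    return not (test == "gmail_server_tests" and change >= 2)


print(first_failing_change(changes, affected, "gmail_server_tests", simulated_run))
```

Because the test runs at change 2 itself rather than only at the end of a batch, the failure is attributed to exactly one change instead of a range of suspects.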
Let’s look deeper into how we perform the dependency analysis.
We maintain an in-memory graph of coarse-grained dependencies between various tests and build rules across the entire codebase. This graph, several GBs in-memory, is kept up-to-date with each change that gets checked in. This allows us to transitively determine all tests that depend on the code modified in a given change and hence need to be re-run to know the current state of the build. Let’s walk through an example.
Consider two sample projects, each containing a different set of tests:
where the build dependency graph looks like this:
We will see how two isolated code changes, at different depths of the dependency tree, are analyzed to determine affected tests, that is, the minimal set of tests that needs to be run to ensure that both the Gmail and Buzz projects are “green”.
Case 1: Change in common library
As soon as this change is submitted, we start a breadth-first search to find all tests that depend on it.
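The breadth-first search can be sketched in a few lines of Python. This is a simplified illustration under assumptions, not the real system: the target names and the shape of the `DEPENDENTS` graph are hypothetical, and the real graph is far larger and is kept up to date with every check-in.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical reverse dependency graph: each key maps to the targets that
# depend on it directly. All target names are invented for illustration.
DEPENDENTS: Dict[str, List[str]] = {
    "common_collections_util": ["gmail_server", "buzz_server"],
    "gmail_server": ["gmail_server_tests"],
    "buzz_server": ["buzz_server_tests"],
    "gmail_client": ["gmail_client_tests"],
}
TESTS: Set[str] = {"gmail_server_tests", "gmail_client_tests", "buzz_server_tests"}


def affected_tests(changed_target: str) -> Set[str]:
    """Breadth-first search over dependents, collecting every test reached."""
    seen = {changed_target}
    queue = deque([changed_target])
    found: Set[str] = set()
    while queue:
        target = queue.popleft()
        if target in TESTS:
            found.add(target)
        for dependent in DEPENDENTS.get(target, []):
            if dependent not in seen:  # visit each target only once
                seen.add(dependent)
                queue.append(dependent)
    return found


# A change deep in a shared library reaches tests in both projects.
print(sorted(affected_tests("common_collections_util")))
# A change in one project's client code reaches only that project's test.
print(sorted(affected_tests("gmail_client")))
```

Walking the graph from the changed target toward its dependents, rather than from each test toward its dependencies, means the search touches only the part of the graph the change can actually influence.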
Use of smart tools and cloud computing infrastructure in the continuous integration system makes it fast and reliable. While we are constantly working on making improvements to this system, thousands of Google projects are already using it to launch and iterate quickly, and hence to make faster user-visible progress.
How Google Tests Software - Part Seven
Thursday, May 26, 2011 20:56 PM
By James Whittaker
The Life of a TE
The Test Engineer is a newer role within Google than either SWEs or SETs. As such, it is a role still in the process of being defined. The current generation of Google TEs is blazing a trail which will guide the next generation of new hires for this role. It is the process that is emerging as the best within Google that we present here.
Not all products require the services of a TE. Experimental efforts and early stage products without a well-defined mission or user story are certainly projects that won’t get a lot of TE attention. If the product stands a good chance of being cancelled (in the sense that as a proof of concept it fails to pass muster) or has yet to engage users or have a well defined set of features, testing it is largely something that should be done by the people developing it.
Even if it is clear that a product is going to get shipped, Test Engineers have little to do early in the development cycle when features are still in flux and the final feature list and scope are undetermined. Overinvesting in testing too early can mean a lot of things get thrown away. Likewise, early test planning requires fewer test engineers than later-cycle exploratory testing, when the product is close to final form and the hunt for missed bugs has a greater urgency.
The trick in staffing a project with Test Engineers has to do with risk and return on investment. Risk to the customer and to the enterprise means more testing effort and requires more TEs. But that effort needs to be in proportion to the potential return. We need the right number of TEs and we need them to engage at the right time and with the right impact.
Once engaged, TEs do not have to start from scratch. There is a great deal of test engineering and quality-oriented work performed by SWEs and SETs which is the starting point for additional TE work. The initial engagement of the TE is to decide things such as:
· Where are the weak points in the software?
· What are the security, privacy, performance and reliability concerns?
· Do all the primary user scenarios work as expected? For all international audiences?
· Does the product interoperate with other products (hardware and software)?
· In the event of a problem, how good are the diagnostics?
All of this combines to speak to the risk profile of releasing the software in question. TEs don’t necessarily do all of this work, but they ensure that it gets done and they leverage the work of others in assessing where additional work is required. Ultimately, test engineers are paid to protect users and the business from bad design, confusing UX, functional bugs, security and privacy issues and so forth. At Google, TEs are the only people on a team whose full-time job is to look at the product or service holistically for weak points. As such, the life of a Test Engineer is much less prescriptive and formalized than that of an SET. TEs are asked to help on projects in all stages of readiness: everything from the idea stage to version 8, or even watching over a deprecated or “mothballed” project. Often, a single TE will even span multiple projects, particularly a TE with specialty skills such as security.
Obviously, the work of a TE varies greatly depending on the project. Some TEs spend much of their time programming, much like an SET, but with more of a focus on end-to-end user scenarios. Other TEs take existing code and designs, determine failure modes, and look for errors that will cause those failures. In such a role a TE might modify code but not create it from scratch. TEs must be more systematic and thorough in their test planning and completeness with a focus on the actual usage and system experience. TEs excel at dealing with ambiguity in requirements and at reasoning and communicating about fuzzy problems.
Successful TEs accomplish all this while navigating the sensitivities and sometimes strong personalities of the development and product team members. When weak points are found, test engineers happily break the software, and drive to get these issues resolved with the SWEs, PMs, and SETs.
Such a job description is a frightening prospect given the mix of technical skill, leadership, and deep product understanding, and without proper guidance it is a role in which many would expect to fail. But at Google a strong community of test engineers has emerged to counter this. Of all job functions, the TE role is perhaps the best peer-supported role in the company, and the insight and leadership required to perform it successfully mean that many of the top test managers in the company come from the TE ranks.
There is a fluidity to the work of a Google Test Engineer that belies any prescriptive process for engagement. TEs can enter a project at any point and must assess the state of the project, code, design, and users quickly and decide what to focus on first. If the project is just getting started, test planning is often the first order of business. Sometimes TEs are pulled in late in the cycle to evaluate whether a project is ready to ship or if there are any major issues before an early ‘beta’ goes out. If they are brought into a newly acquired application or one in which they have little prior experience, they will often start doing some exploratory testing with little to no planning. Sometimes projects haven’t been released for quite a while and just need some touchups, security fixes, or UX updates, which calls for yet another approach. One size rarely fits all for TEs at Google.
GTAC 2011 Open for Submission
Monday, May 23, 2011 20:48 PM
By James Whittaker
I am happy to announce that GTAC 2011 is now open for nominations. We're going to try to have an executive session, depending on interest, the afternoon/evening of October 25th at the Googleplex in Mountain View. This session is intended for top testing decision makers at top web, technology and software companies worldwide. It will be a chance for frank and open discussion about our own and the industry's collective challenges. It's intended to be a meeting of key decision makers and budget owners to share information and ideas and, with a little luck, spur some collaborations that will be good for the testing industry overall. Nominate your executive here.
The general session is by invitation only and prospective attendees and speakers must register and be selected. Speaker nominees are encouraged to point us to videos of prior presentations and any other material to help make our decision easier.
Please leave comments if there is some technology, tool or product you want to hear about so we end up with the best possible agenda.
I hope to see a lot of our readers in Mountain View in October!
How Google Tests Software - A Break for Q&A
Wednesday, May 04, 2011 22:28 PM
By James Whittaker
New material for this series is coming more slowly. I am beginning to get into areas where I want to start posting screen shots of internal Google tools and describe how our infrastructure works. This is material that takes longer to develop and also requires some scrutiny before being published externally. So in the meantime, I am pausing to answer some of the questions you've posted in the comments.
I am going to start with Lilia (because she likes Neil Young mainly, but also because she can run further than me and those two things combine to impress me to no small end) who asks about SET-SWE conversion and vice-versa and which I have seen the most. There is also the broader question of whether there is a ceiling on the SET career path.
SETs and SWEs are on the same pay scale and virtually the same job ladder. Both roles are essentially 100% coding roles with the former writing test code and the latter doing feature development. From a coding perspective the skill set is a dead match. From a testing perspective we expect a lot more from SETs. But the overlap on coding makes SETs a great fit for SWE positions and vice versa. Personally I think it is a very healthy situation to have conversions. Since I have both roles reporting to me I can speak from first hand experience that many of my best coders are former SETs and some of my best testers are former SWEs. Each is excellent training ground for the other. On my specific team I am even on the conversions from one role to the other. But I suspect that Google-wide there are more SETs who become SWEs.
Why convert in the first place? Well at Google it isn't for the money. It also isn't for the prestige as we have a lot more SWEs than SETs and it is a lot harder to stand out. The scarcity of our SETs creates somewhat of a mystique about these folk. Who are these rare creatures who keep our code bases healthy and make our development process run so smoothly? Actually, most SWEs care more about making the SETs happy so they continue doing what they do. Why would any dev team force a conversion of a great developer from SET to SWE when finding a suitable SET replacement is so much harder than adding another feature developer? SWEs ain't that stupid.
Now pausing before I take another hit of the corp kool-aid, let me be honest and say that there are far more senior SWEs than SETs. Percentage wise we test folk are more outnumbered at the top of the org than at the middle and bottom. But keep in mind that developers have had a large head start on us. We have developers who have been at Google since our founding and testers ... well ... less time than that.
Where do TEs fit into this mix? TE is an even newer role than SET, but already we have a number climbing to the Staff ranks and pushing on the senior-most positions in the company. There is no ceiling, but the journey to the top takes some time.
Raghev among others has asked about the career path and whether remaining an IC (individual contributor) is an option over becoming a manager. I have mixed feelings about answering this. As a manager myself, I see the role as one with much honor and yet I hear in your collective voices a hint of why do I have to become a manager? Ok, I admit, Dilbert is funny.
For me, being a manager is a chance to impart some of my experience and old-guy judgement on less experienced but more technically gifted ICs. The combination of an experienced manager's vision and an IC's technical skill can be a fighting force of incredible power. And yet, why should someone who does not want to manage be forced to do so in order to continue their career advancement?
Well, fortunately, Google does not make us choose. Our managers are expected to have IC tasks they perform. They are expected to be engaged technically and lead as opposed to just manage. And our ICs are expected to have influence beyond their personal work area. When you get to the senior/staff positions here you are a leader, period. Some leaders lead more than they manage and some leaders manage more than they lead.
But either way, the view from the top means that a lot of people are looking to you for direction ... whether you manage them or not.









