How Indeed Uses Proctor for A/B Testing

(Editor’s Note: This post is the second in a series about Proctor, Indeed’s open source A/B testing framework.)

Proctor at Indeed

In a previous blog post, we described the features and tools provided by Proctor, our open-source A/B testing framework. In this follow-up, we share details about how we integrate Proctor into Indeed’s development process.

Customized Proctor Webapp

Our internal deployment of the Proctor Webapp integrates with Atlassian JIRA, Subversion, Git, and Jenkins. We use JIRA for issue linking, various sanity checks, and automating issue workflow. For tracking changes over time, we use Subversion (for historical reasons — Git is also an option). We use Jenkins to launch test matrix builds, and the webapp integrates with our internal operational data store to display which versions of a test are in use in which applications.

Figure 1: Screenshot of a test definition’s change history in the Proctor Webapp

Issue tracking with JIRA

At Indeed, we track everything with JIRA issues, including changes to test definitions. Requests for new tests or changes to existing tests are represented by a custom JIRA issue type we call “ProTest” (short for “Proctor Test”). We track ProTest issues in the JIRA project for the application to which the test belongs. ProTest issues also use a custom workflow that is tied into our deployment of the Proctor Webapp.

After accepting an assigned ProTest issue, the issue owner modifies the test definition using Proctor Webapp. When saving the changes, she must provide a ProTest issue key. Before committing to our Proctor test definition repository, the webapp first verifies that the ProTest issue exists and is in a valid state (for example, is not closed). The webapp then commits the change (on behalf of the logged-in user), referencing the issue key in the commit message.

After the issue owner has made all changes for a ProTest issue, the JIRA workflow is usually as follows:

  1. The issue owner resolves the issue, which moves to state QA Ready.
  2. A release manager uses Proctor Webapp to promote the new definition to QA. The webapp moves the issue state to In QA.
  3. A QA analyst verifies the expected test behavior in our QA environment and verifies the issue, which moves to state Production Ready.
  4. A release manager uses Proctor Webapp to promote the new definition to production, triggering worldwide distribution and activation of the test change within one or two minutes. The webapp moves the issue state to In Production.
  5. A QA analyst verifies the expected test behavior in production and moves the issue state to Pending Closure.
  6. The issue owner closes the issue to reflect that all work is complete and in production.

In cases where we are simply adjusting the size of an active test group, Proctor Webapp skips this process and automatically pushes the change to production.

Our QA team verifies test modifications because those modifications can result in unintended behavior or interact poorly with other tests. Rules in test definitions are a form of deployable code and need to be exercised to ensure correctness. The verification step gives our QA analysts one last chance to catch any unintended consequences before the modifications go live. Consider the case of this rule, intended to make a test available only to English-language users in the US and Canada:

    (lang=='en' && country=='US') || country=='CA'

The parentheses are in the wrong place, allowing French-language Canadians to see behavior that may not be ready for them. A developer forcing himself into the desired group might have missed this bug. When we catch bugs right away during QA, we avoid wasting the time it would take to notice that the desired behavior never made it to production.
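
With the parentheses corrected, the rule matches the stated intent:

    lang=='en' && (country=='US' || country=='CA')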

Test definition files

We store test definitions in a single shared project repository called proctor-data. The project contains one file per test definition: test-definitions/<testName>/definition.json

Modifications to tests most often are done via the Proctor Webapp, which makes changes to the JSON in the definition file and commits those changes (on behalf of the logged-in user) to the version control repository.
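
For illustration, a trimmed-down definition.json might look like the sketch below. The test name, salt, and descriptions are hypothetical, and the exact field set can vary by Proctor version; an eligibility rule like the one shown earlier can also appear as a field.

{
  "description": "New search result layout test",
  "testType": "USER",
  "salt": "newsearchtst",
  "buckets": [
    { "name": "inactive", "value": -1, "description": "Not in the test" },
    { "name": "control", "value": 0, "description": "Existing layout" },
    { "name": "test", "value": 1, "description": "New layout" }
  ],
  "allocations": [
    {
      "ranges": [
        { "length": 0.8, "bucketValue": -1 },
        { "length": 0.1, "bucketValue": 0 },
        { "length": 0.1, "bucketValue": 1 }
      ]
    }
  ]
}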

The definition files are duplicated to two branches in proctor-data: qa and production. When a test definition revision is promoted to QA, the entire test definition file is copied to the qa branch and committed (as opposed to applying or “cherry-picking” the diff associated with a single revision). Similarly, when a test definition revision is promoted to production, the entire file is copied to the production branch and committed. Since we have one file per test definition, this simple approach maintains the integrity of the JSON definition while avoiding merge conflicts and not requiring us to determine which trunk revision deltas to cherry-pick.

Building and deploying the test matrix

Proctor includes a builder that can combine a set of test definition files into a single test matrix file, while also ensuring that the definitions are internally consistent, do not refer to undefined bucket values, and have allocations that sum to 1.0. This builder can be invoked directly from Java or via an Ant task or a Maven plugin. We build a single matrix file using a Jenkins job that invokes Ant in the proctor-data project. An example of building with Maven is available on GitHub.

A continuous integration (CI) Jenkins job builds the test matrix every time a test change is committed to trunk. That matrix file is made available to applications and services in our CI environment.

When a release manager promotes a test change to QA, a QA-specific Jenkins job builds the test matrix using the qa branch. That generated matrix file is then published to all QA servers. The services and applications that consume the matrix periodically reload it. An equivalent production-specific Jenkins job handles new changes on the production branch.

Proctor in the application

Each project’s Proctor specification JSON file is stored with its source code in a standard path (for example, src/main/resources/proctor). At build time, we invoke the code generator (via a Maven plugin or Ant task) to generate code that is then built with the project’s source code.
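
As a sketch, a specification covering the hypothetical newsearchtst test from the earlier definition sketch might look like this; the fallbackValue of -1 follows the convention described below, and providedContext declares the variables available to test rules:

{
  "tests": {
    "newsearchtst": {
      "buckets": {
        "inactive": -1,
        "control": 0,
        "test": 1
      },
      "fallbackValue": -1
    }
  },
  "providedContext": {
    "country": "String",
    "lang": "String"
  }
}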

When launching a new test, we typically deploy the test matrix before the application code that depends on it. However, if the application code goes out first, Proctor will “fall back” and treat the test as inactive, provided you follow our convention of mapping your inactive bucket to value -1.

You can change the fallback behavior by setting fallbackValue to the desired bucket value in the test specification. We follow the convention of falling back on the unlogged inactive group to help ensure that test and control groups do not change size unexpectedly. Suppose that you have groups 0 (control) and 1 (test) for a test that runs Monday-Thursday with fallback to group 0. If your test matrix is broken as a result of a change from Tuesday 2pm to Tuesday 5pm, summing your metrics across the whole period from Monday to Thursday will skew the results for the control group. If your fallback was -1 (inactive), there would be no skew for your control and test groups.

When adding a new bucket to a test, we typically take this sequence of actions:

  1. Deploy the test matrix with no allocation for the new bucket.
  2. Deploy the application code that is aware of the new bucket.
  3. Redeploy the matrix with an allocation for that bucket.

If the matrix is deployed with an allocation for a new bucket of which the application is unaware, Proctor errs on the side of safety by using the fallback value for all cases. We made Proctor work that way to avoid telling the application to apply an unknown bucket in some cases for some period of time, which could skew analysis.

We take similar precautions when deleting an entire test from the matrix.

Testing group membership, not non-membership

Proctor’s code generation provides easy-to-use methods for testing group membership. We have found it best to always use these methods to test for membership rather than non-membership. If you’ve made your code conditional on non-membership, you run the risk of getting that conditional behavior in unintended circumstances.

As an example, suppose you have a [50% control, 50% test] split, and in your code you use the conditional expression !groups.isControl(), which is equivalent to groups.isTest(). Then, to reduce the footprint of your test while keeping an equal-sized control group for comparison, you change your test split to [25% control, 50% inactive, 25% test]. Now your conditional expression is equivalent to groups.isTest() || groups.isInactive(). That logic is probably not what you intended, which is to keep the same behavior for control and inactive. In this example, using groups.isTest() in the first place would have prevented you from introducing unintended behavior.
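
A minimal sketch of the membership-first pattern; the groups object mirrors the generated methods used above, and the render methods are hypothetical application logic:

    // Condition on membership in the test bucket. Control, inactive,
    // and any fallback case all keep the existing behavior by default.
    if (groups.isTest()) {
        renderNewLayout();
    } else {
        renderExistingLayout();
    }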

Evolving bucket allocations

We recognize that assigning users to test buckets may affect how the site behaves for them. Proctor on its own cannot ensure consistency of experience across successive page views or visits as a test evolves. When growing or shrinking allocations, we consider carefully how users will be affected. 

Usually, once a user is assigned to a bucket, we’d like for that user to continue to see the behavior associated with that bucket as long as that behavior is being tested. If your allocations started as [10% control, 10% test, 80% inactive], you would not want to grow to [50% control, 50% test], because users initially in the test bucket would be moved to the control bucket.

There are two strategies for stable growth of buckets. In the “split bucket” strategy (Figure 2), you add new ranges for the existing buckets, moving from 10/10 to 50/50 by taking two additional 40% chunks from the inactive range. The resulting JSON is shown in Figure 3.

Figure 2: Growing control and test by splitting buckets into multiple ranges

{
  "allocations": [
    {
      "ranges": [
        { "length": 0.1, "bucketValue": 0 },
        { "length": 0.1, "bucketValue": 1 },
        { "length": 0.4, "bucketValue": 0 },
        { "length": 0.4, "bucketValue": 1 }
      ]
    }
  ]
}

Figure 3: JSON for “split bucket” strategy; 0 is control and 1 is test

In the “room-to-grow” strategy, you leave enough inactive space between buckets so that you can adjust the size of the existing ranges, as in Figure 4.

Figure 4: Growing control and test by updating range lengths to grow into the inactive middle

We use the “room-to-grow” strategy whenever possible, as it results in more readable test definitions, both in JSON and the Proctor Webapp.
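
For example, starting from [10% control, 80% inactive, 10% test] with the inactive range in the middle, growing to a 25/25 split only requires updating the three range lengths. A sketch, using the same bucket values as Figure 3 plus -1 for inactive:

{
  "allocations": [
    {
      "ranges": [
        { "length": 0.25, "bucketValue": 0 },
        { "length": 0.5, "bucketValue": -1 },
        { "length": 0.25, "bucketValue": 1 }
      ]
    }
  ]
}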

Useful helpers

Proctor includes some utilities that make it easier to work with Proctor in web application deployments:

  • a Spring controller that provides three views: the groups for the current request, a condensed version of the current test matrix, and the JSON test matrix containing only those tests in the application’s specification;
  • a Java servlet that provides a view of the application’s specification; and
  • support for a URL parameter that lets you force yourself into a test bucket, persisted via a browser cookie (example below)

We grant access to these utilities in our production environment only to privileged IP addresses, and we recommend you do the same.
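
For example, assuming the default parameter name prforceGroups used by Proctor’s consumer utilities and a hypothetical test named newsearchtst, forcing yourself into bucket 1 looks something like this:

    http://www.example.com/search?q=nurse&prforceGroups=newsearchtst1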

It works for Indeed, it can work for you

Proctor has become a crucial part of Indeed’s data-driven approach to product development, with over 100 tests and 300 test variations currently in production. To get started with Proctor, dive into our Quick Start guide. To peruse the source code or contribute your own enhancements, visit our GitHub page.


Open-Source Interactive Data Analytics with Imhotep

We are excited to announce the open-source availability of Imhotep, the interactive data analytics platform that powers data-driven decision making at Indeed. When we test changes to our applications and services, whether to our user interface or our backend algorithms, we measure how those changes affect job seekers. We built Imhotep to allow our engineering and product organizations to focus on key metrics at scale.

Key features

The Imhotep platform and tools allow you to:

  • Perform fast, interactive, ad hoc queries and aggregate results for large datasets
  • Combine results from multiple time-series datasets
  • Build your own data tools for analysis, monitoring, reporting, and automated data processing on top of the Imhotep platform

At its core, Imhotep is a distributed inverted index on time-series data that runs across a cluster of servers. We’ve made it easy to set up an Imhotep cluster on Amazon Web Services (AWS). Once you’ve set up your cluster, you can upload your data and then interactively query that data using IQL, the Imhotep Query Language. The IQL web client enables you to answer all sorts of questions about your data, and iterate quickly on those questions to get to important insights.

For example, at Indeed, we use Imhotep to answer these and many more questions about how people around the world are using our job search engine:

  • How many unique job search queries were performed on a specific day in a specific country?
  • What are the top 50 queries in a specific country? How many times did job seekers click on a search result for each of those queries?
  • Which job titles have the highest click-through rate for the query “Architecture” in the US? Which titles have the lowest click-through rate?
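
As a sketch, the second question might be expressed in IQL roughly as follows; the index and field names here are hypothetical, and the documentation covers exact syntax:

    from jobsearch 2014-09-01 2014-09-08
    where country:us
    group by q[50]
    select count(), clicks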

Getting started with Imhotep

You can use our tools to configure your Imhotep cluster on AWS. These setup tools require that you have an AWS account, two S3 buckets for data storage, and your time-series data in TSV or CSV format for uploading into the system.
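
For instance, rows for a hypothetical job search index might look like this in TSV form; the column names are illustrative, and the documentation covers the exact upload requirements:

    time        country  q       clicks
    1409554800  us       nurse   3
    1409554801  ca       welder  0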

To learn more, read our Imhotep documentation. If you need help, you can ask questions in our Q&A forum for Imhotep.

To learn more about how we use Imhotep for analytics at Indeed, check out the video and slides of our tech talk from April 2014: Large-Scale Interactive Analytics with Imhotep. If you’re in Austin, join us for our upcoming Imhotep workshop on November 5, 2014.


UPDATE 11/18/2014: Slides and video from the workshop are now available.


Using Proctor for A/B Testing from a Non-Java Platform

We’re excited to announce the open sourcing of proctor-pipet, a tool we created that allows you to deploy Proctor as a remote service. proctor-pipet is a Java web application that exposes Proctor as a simple REST API accessible over HTTP. This means that you can do A/B testing in applications written in non-JVM languages like Python.

In addition to proctor-pipet, we have made available a Python package called django-proctor that makes it easy for your Django web app to use Proctor groups. We look forward to others implementing similar packages for their favorite web frameworks, such as Ruby on Rails or .NET MVC.

These packages are the result of some great work by one of our fantastic summer 2014 interns.

How it works

Your web application makes HTTP requests to Proctor through proctor-pipet. Proctor returns the group assignments, which your web app can then use to make decisions on the content it returns to the user’s browser.
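
A sketch of such a request, using the /groups/identify endpoint and the ctx./id. query parameter conventions from the proctor-pipet documentation (the hostname and identifier here are made up); the response is a JSON document of resolved group assignments:

    GET http://pipet.example.com/groups/identify?id.USER=1a2b3c4d&ctx.country=US&ctx.loggedIn=true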

Figure: Data flow for proctor-pipet

Deploying Proctor remotely through proctor-pipet lets you take advantage of all the features of the Proctor library:

  • Assign users to test groups
  • Use identifiers to map to different test types
  • Toggle features or implement gradual rollouts of new features
  • Make changes to test allocations independently of your code
  • Determine group membership based on rules that use arbitrary context variables (for example, to target mobile devices)

Proctor documentation is available here.

Download both tools on GitHub: proctor-pipet (https://github.com/indeedeng/proctor-pipet) and django-proctor (https://github.com/indeedeng/django-proctor). Both pages include documentation with examples to help you get started. If you have any questions, ask them in our Proctor Q&A forum.
