Using Proctor for A/B Testing from a Non-Java Platform

Wednesday, Sep 3, 2014 by Parker Seidel

We’re excited to announce the open sourcing of proctor-pipet, a tool we created that allows you to deploy Proctor as a remote service. The proctor-pipet tool is a Java web application that exposes Proctor as a simple REST API accessible over HTTP. This means that you can do A/B testing in applications written in non-JVM languages like Python.

In addition to proctor-pipet, we have made available a Python package called django-proctor that makes it easy for your Django web app to use Proctor groups. We look forward to others implementing similar packages for their favorite web frameworks, such as Ruby on Rails or .NET MVC.

These packages are the result of some great work by one of our fantastic summer 2014 interns.

How it works

Your web application makes HTTP requests to Proctor through proctor-pipet. Proctor returns the group assignments, which your web app can then use to make decisions on the content it returns to the user’s browser.
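As a rough sketch of this flow from Python, the snippet below builds a request URL and reads group assignments out of a response. The base URL, the endpoint path, the "id." and "ctx." parameter prefixes, and the response shape are all assumptions made for illustration, not the documented proctor-pipet API; the README on GitHub is the authority. A canned response stands in for the real HTTP call so the sketch runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL and endpoint path; check the proctor-pipet README for the real ones.
PIPET_URL = "http://localhost:8080/groups/identify"


def build_request_url(user_id, context):
    """Build the URL for a group-assignment request (parameter names are assumed)."""
    params = {"id.USER": user_id}
    params.update({"ctx." + key: value for key, value in sorted(context.items())})
    return PIPET_URL + "?" + urlencode(params)


def parse_groups(response_body):
    """Map test name -> bucket name, given an assumed response shape."""
    data = json.loads(response_body)
    return {name: group["name"] for name, group in data["groups"].items()}


# A canned response replaces the HTTP call so this sketch needs no running service.
SAMPLE_RESPONSE = '{"groups": {"buttontst": {"name": "altcolor", "value": 1}}}'

url = build_request_url("abc123", {"country": "US"})
groups = parse_groups(SAMPLE_RESPONSE)

# The web app then decides what to render based on the assigned group.
button_color = "#2164f3" if groups.get("buttontst") == "altcolor" else None
```

In a real application, the canned response would be replaced by an HTTP GET against the deployed service, ideally with a short timeout and a fallback to the control behavior when the service is unreachable.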

data flow for proctor-pipet

Deploying Proctor remotely through proctor-pipet lets you take advantage of all the features of the Proctor library:

  • Assign users to test groups
  • Use identifiers to map to different test types
  • Toggle features or implement gradual rollouts of new features
  • Make changes to test allocations independently of your code
  • Determine group membership based on rules that use arbitrary context variables (for example, to target mobile devices)

Proctor documentation is available here.

Download both tools on GitHub: proctor-pipet (https://github.com/indeedeng/proctor-pipet) and django-proctor (https://github.com/indeedeng/django-proctor). Both pages include documentation with examples to help you get started. If you have any questions, ask them in our Proctor Q&A forum.

Bug Bounty Program: Cash Rewards for Reported Vulnerabilities

Sunday, Aug 3, 2014 by Gregory Caswell

As part of Indeed’s focus on constantly improving how we help people get jobs, we are proud to announce the rollout of our bug bounty program. Through Bugcrowd, interested security professionals will now be able to disclose vulnerabilities and be rewarded for their efforts.

For every unique submission that leads to a code change, we will pay between $50 and $1,500, depending on the type and severity of the vulnerability reported. To view everyone who has helped us so far, or just to see how you stack up against the competition, head on over to the Hall of Fame.

Full details on how you can help us improve our services (and get paid!) can be found here. Please keep in mind that attacks against the current user base are strictly prohibited, as are automated vulnerability scanners. Responsible pen testers should always minimize system degradation and impact against users.

Ready to get started? Sign up at Bugcrowd and join over 10,000 security researchers on the largest and most diverse security testing team in the world.


Proctor: Indeed’s A/B Testing Framework

Friday, Jun 13, 2014 by Jack Humphrey

(Editor’s Note: This post is the first of a two-part series about Proctor, Indeed’s open source A/B testing framework.)

A/B Testing at Indeed

Indeed’s mission is to help people get jobs. We are always asking ourselves the question “What’s best for the jobseeker?” We answer that question by testing and measuring everything. We strive to test every new feature and every improvement in every product at Indeed, and we measure the impact of those changes to ensure they are helping us achieve our mission.

In October 2013, Tom Bergman and Matt Schemmel presented an @IndeedEng talk on Proctor, Indeed’s A/B testing framework. In that talk, we announced that we had made Proctor available open source. Since then, we have also open sourced Proctor Webapp, the web application that we use to manage Proctor test definitions.

In the October talk, Tom gave the example of a simple A/B test to determine if changing the background color of a button would improve user experience. Figure 1 shows control group A, in which we haven’t changed our Find Jobs button, and test group B, in which the button has a blue background.


Figure 1: Testing an existing button treatment (A) against a version with a blue background (B)

As discussed in our logrepo talk and blog post, we log everything at Indeed so that we can analyze, learn, and improve our products. For this simple test, we logged the group (A or B) of each user visiting Indeed and the subsequent clicks. Then we used our analysis tools to determine that the test group led to more searches and greater overall user engagement.

The example above has one test behavior, but we typically try out multiple alternate behaviors in a given test. In this test, we would likely try several different background colors.

We can also test multiple ideas at the same time, as in the example in Figure 2, in which one test is for the button text and the other is for the background color. Testing multiple variables (like text and color) for a particular area of functionality is known as “multivariate testing.”


Figure 2: Running two tests on the same button simultaneously

We’ve been doing A/B testing at Indeed for years, and many of the lessons we learned informed the development of Proctor. The October talk covers in more detail Proctor’s design decisions and ways to use it for more than just A/B testing. In this blog post, we focus on some of Proctor’s key features and concepts, and we explain the nuts and bolts of how we use Proctor at Indeed.

Proctor Features and Concepts

Standard representation

Proctor provides a standard JSON representation of test definitions and allows adjustments to those definitions to be deployed independently of code. We refer to the full set of test definitions as the test matrix. A test matrix can be distributed to multiple applications as a single file, allowing for greater agility when managing tests and for sharing of consistent test definitions across multiple applications. Figure 3 shows a very simple version of our button test, with 50% of users allocated to the control group A (bucket 0) and 50% to the test group B (bucket 1).

"buttontst": {
  "description": "backgroundcolortest",
  "salt": "buttontst",
  "buckets": [
    {
      "name": "control",
      "value": 0,
      "description": "current button treatment (A)"
    },
    {
      "name": "altcolor",
      "value": 1,
      "description": "test background color (B)"
    }
  ],
  "allocations": [
    {
      "ranges": [
        {
          "length": 0.5,
          "bucketValue": 0
        },
        {
          "length": 0.5,
          "bucketValue": 1
        }
      ]
    }
  ],
  "testType": "USER"
}

Figure 3: A simple Proctor test definition

To understand this example, here is a quick overview of some Proctor terminology:

  • Every test has a testType. The most common type is USER, meaning that we use a user identifier to map to a test variation. More on test types later.
  • Each test is made of an array of buckets and an array of allocations.
  • A bucket is a variation, or group, within a Proctor test definition. Each bucket has a short name, an integer value, and a human-friendly description.
  • An allocation specifies the size of the buckets as an array of ranges. Each range has a length between 0 and 1 and a reference to the bucketValue for the bucket. Ranges in an allocation must sum to 1. You can have more than one allocation if you use rules (more about that later).
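The constraints in this list can be checked mechanically. The following is a minimal sketch, not Proctor's own validation logic: the function name validate_test is hypothetical, and treating a bucketValue with no matching bucket as an error is an assumption made for illustration.

```python
import math


def validate_test(definition):
    """Return a list of problems found in one test definition (empty if none)."""
    problems = []
    bucket_values = {bucket["value"] for bucket in definition["buckets"]}
    for index, allocation in enumerate(definition["allocations"]):
        ranges = allocation["ranges"]
        # Range lengths within one allocation must sum to 1.
        total = sum(r["length"] for r in ranges)
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            problems.append("allocation %d: lengths sum to %r, expected 1" % (index, total))
        for r in ranges:
            # Each length must lie between 0 and 1.
            if not 0 <= r["length"] <= 1:
                problems.append("allocation %d: length %r out of range" % (index, r["length"]))
            # Assumption: every bucketValue should refer to a defined bucket.
            if r["bucketValue"] not in bucket_values:
                problems.append("allocation %d: unknown bucket %r" % (index, r["bucketValue"]))
    return problems


# The button test from Figure 3, written as a Python dict.
BUTTON_TEST = {
    "description": "backgroundcolortest",
    "salt": "buttontst",
    "buckets": [
        {"name": "control", "value": 0, "description": "current button treatment (A)"},
        {"name": "altcolor", "value": 1, "description": "test background color (B)"},
    ],
    "allocations": [
        {"ranges": [{"length": 0.5, "bucketValue": 0}, {"length": 0.5, "bucketValue": 1}]}
    ],
    "testType": "USER",
}

problems = validate_test(BUTTON_TEST)
```

For the definition in Figure 3, the returned list is empty; changing one length to 0.4 would produce a single problem for that allocation.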

Proctor Webapp

Using the Proctor Webapp, you can manage and deploy test definitions from a web browser. You can customize the application in a number of ways, allowing integration with:

  • revision control systems for maintaining history of test changes,
  • issue tracking systems for managing test modification workflow, and
  • other external tools, such as build and deployment systems.


Figure 4: Screenshot of a test definition in the Proctor Webapp

Java code generation from JSON test specifications

Test specifications in Proctor are JSON files that are independent of the test definitions and allow applications to declare the tests and buckets of which they are aware. They can be used in the build process for Java code generation and at runtime to load the relevant subset of the test matrix.

Code generation is optional but provides compile-time type-safety, so you don’t have to litter your code with string literals containing test and bucket names. The generated classes also make it easier to work with tests in Java code and in template languages (Figure 5 shows a JSP example). Furthermore, the generated Java objects enable serialization of test group membership into formats like JSON or XML.

<c:if test="${groups.buttontstAltColor}">
.searchBtn { background-color: #2164f3; color: #ffffff; }
</c:if>

Figure 5: Conditional CSS based on test group membership in a JSP template

Rule-based contextual allocation

Using Proctor’s rule definition language, your system can apply tests and test allocations by evaluating rules against runtime context. For example, you can define your entire test to only be available for a certain segment of users, or you can adjust the allocation of test groups depending on the segment. Your test could be 50% A and 50% B for users in one country, and 25% each A/B/C/D for users in all other countries. Rule-based group assignment allows for great flexibility in how you roll out and evaluate your tests.

"allocations": [
  {
    "rule": "'US' == country && 'en' == userLanguage",
    "ranges": [
      {
        "length": 0.5,
        "bucketValue": 0
      },
      {
        "length": 0.5,
        "bucketValue": 1
      }
    ]
  },
  {
    "rule": null,
    "ranges": [
      {
        "length": 1.0,
        "bucketValue": -1
      }
    ]
  }
]

Figure 6: 50/50 test for US English, test inactive (bucket -1) for everyone else
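To make the selection step concrete, here is a minimal Python sketch of choosing an allocation from runtime context. It is not Proctor's implementation: rules are modeled as plain Python callables rather than expressions in Proctor's rule language, the function name select_allocation is hypothetical, and the first-match ordering (with a null rule matching everyone) is an assumption.

```python
def select_allocation(allocations, context):
    """Return the first allocation whose rule matches the context, or None."""
    for allocation in allocations:
        rule = allocation["rule"]
        # A missing rule (None) is treated as matching every request.
        if rule is None or rule(context):
            return allocation
    return None


# The allocations from Figure 6, with the rule rewritten as a Python callable.
ALLOCATIONS = [
    {
        "rule": lambda ctx: ctx.get("country") == "US" and ctx.get("userLanguage") == "en",
        "ranges": [
            {"length": 0.5, "bucketValue": 0},
            {"length": 0.5, "bucketValue": 1},
        ],
    },
    {
        "rule": None,
        "ranges": [{"length": 1.0, "bucketValue": -1}],
    },
]

us_english = select_allocation(ALLOCATIONS, {"country": "US", "userLanguage": "en"})
everyone_else = select_allocation(ALLOCATIONS, {"country": "FR", "userLanguage": "fr"})
```

Here us_english is the 50/50 allocation, while everyone_else is the catch-all allocation that places all users in the inactive bucket (-1).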

Payloads

The ability to attach data payloads to test groups in test definitions allows you to simplify your code. In Figures 7 and 8, we demonstrate how the color being tested for the button can be specified as a payload in the test definition and accessed in the template. Although in this example the total amount of template code is not reduced, if you had multiple test variations, each with its own color, the use of payloads would result in fewer lines of code.

"buckets": [
  {
    "name": "control",
    "value": 0,
    "description": "current button treatment (A)",
    "payload": {
      "stringValue": "#dddddd"
    }
  },
  {
    "name": "altcolor",
    "value": 1,
    "description": "test background color (B)",
    "payload": {
      "stringValue": "#2164f3"
    }
  }
]

Figure 7: Attaching a data payload containing a color value to each test group

<style>
.searchBtn { background-color: ${groups.buttontstPayload}; }
</style>

Figure 8: Using the data payload in CSS in a JSP template
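For applications that consume test definitions outside of a JSP template, the same idea can be sketched in Python. The helper names payload_for and button_css are hypothetical and do not correspond to a Proctor or django-proctor API; the bucket data mirrors Figure 7.

```python
def payload_for(buckets, bucket_value, default=None):
    """Return the stringValue payload of the bucket with the given value."""
    for bucket in buckets:
        if bucket["value"] == bucket_value:
            return bucket.get("payload", {}).get("stringValue", default)
    return default


def button_css(buckets, bucket_value):
    """Render the button style for the assigned bucket, or nothing if no payload."""
    color = payload_for(buckets, bucket_value)
    if color is None:
        return ""
    return ".searchBtn { background-color: %s; }" % color


# The buckets from Figure 7 as Python data.
BUCKETS = [
    {"name": "control", "value": 0, "payload": {"stringValue": "#dddddd"}},
    {"name": "altcolor", "value": 1, "payload": {"stringValue": "#2164f3"}},
]

css_for_b = button_css(BUCKETS, 1)
```

Adding a third color variation then only requires a new bucket in the test definition; the rendering code stays the same.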

Flexible test types

Proctor has a flexible concept of test types, allowing bucket determination to be based on user (typically defined by a tracking cookie value), account ID (which can be fixed across devices), email address, or completely random across requests. You can also extend Proctor with your own test types. Custom test types are useful, for example, when you want test group determination to be based on a context- or content-based attribute such as page URL or content category.

Unbiased, Independent Tests

To assign a bucket for a test, Proctor maps the input identifiers (e.g. user ID) to an integer value using a uniformly distributed hash function. The range lengths in the allocation determine which span of integers maps to each bucket. Figure 9 shows a simple example with a 50/50 control/test distribution. Since the hash function is uniform, the distribution of bucket assignments should be unbiased.


Figure 9: 50/50 control/test buckets mapped onto an integer range for use with hash function

Furthermore, Proctor tests are independent, meaning that group membership in one test has no correlation with membership in another. This independence is accomplished by assigning a different salt to each test. The salt is used along with the identifiers as input to the hash function. Including the salt in the test definition allows for two advanced features:

  1. You can intentionally align buckets in different tests (make them dependent) by sharing a salt (shared salts must start with “&”). In practice, we have very rarely seen the need to align two tests in this way.
  2. You can “shuffle” the distribution of a test by changing its salt, resulting in completely different output from the hash function. This shuffling can be used to reassure yourself that there is no accidental bias in a test.
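The salted-hash idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Proctor's actual algorithm: MD5 is used only as a convenient uniform hash, the hash is scaled to the unit interval instead of an integer range, and the function names are hypothetical.

```python
import hashlib


def hash_to_unit_interval(salt, identifier):
    """Map (salt, identifier) to a float in [0, 1]; MD5 is a stand-in hash."""
    digest = hashlib.md5((salt + identifier).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def assign_bucket(salt, identifier, ranges):
    """Walk the ranges in order and return the bucket whose span contains the hash."""
    point = hash_to_unit_interval(salt, identifier)
    upper = 0.0
    for r in ranges:
        upper += r["length"]
        if point < upper:
            return r["bucketValue"]
    # Fallback for floating-point rounding at the top of the interval.
    return ranges[-1]["bucketValue"]


RANGES = [{"length": 0.5, "bucketValue": 0}, {"length": 0.5, "bucketValue": 1}]
USERS = ["user-%d" % n for n in range(10000)]

# Same users, two tests with different salts.
test_one = [assign_bucket("buttontst", user, RANGES) for user in USERS]
test_two = [assign_bucket("othertst", user, RANGES) for user in USERS]

share_in_b = sum(test_one) / len(USERS)
agreement = sum(a == b for a, b in zip(test_one, test_two)) / len(USERS)
```

With a uniform hash, share_in_b should land close to 0.5, and agreement should also be close to 0.5, which is what independence between two 50/50 tests looks like. Giving both tests the same salt would make the two assignment lists identical, and changing a test's salt reshuffles its assignments.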

Proctor at Indeed

Proctor has become a crucial part of Indeed’s data-driven approach to product development, with over 100 tests and 300 test variations currently in production. In our next post, we will provide more details on how we use Proctor at Indeed.