A/B Test Hypotheses are Broken, Here’s What We’re Using Instead


The long-standing advice on running A/B tests has been to base every test on a hypothesis, but as an A/B testing agency, we are no longer doing this.

On paper, basing A/B tests on hypotheses seems logical and reasonable. But over years of running these tests on ecommerce and SaaS websites, we’ve noticed that hypothesis-based website A/B tests create bias in the team running and monitoring the results, and this bias has negative consequences.

Specifically:

  1. Consequence #1: It can lead to stopping tests too early and thinking a variation won when it didn’t. Teams may then implement losing variations that hurt conversion rate.
  2. Consequence #2: Hypothesis-bias leads teams to gloss over A/B test results too quickly, thereby dramatically reducing learning, which is the most vital long term benefit of A/B testing.

To avoid those two consequences, we, as a CRO (conversion rate optimization) agency, are no longer basing our clients’ tests on hypotheses. Instead, we’re formulating all A/B tests around a series of questions.

This is a subtle but powerful tweak to hypothesis-based testing, and we’re calling it The Question Mentality for A/B testing. This name was carefully chosen to show that it involves changing a team’s entire mentality around AB testing.

[Image: the Question Mentality vs. the A/B test hypothesis method]

As we argue below, the Question Mentality avoids the outcome bias of hypothesis-based AB testing and therefore:

  1. Reduces the risk of stopping tests too early
  2. Reduces the degree to which employees or consultants tie their career reputations to the outcome of tests
  3. Dramatically increases the amount of learning and understanding that the company gets from every AB test

So far, for us, it has resulted in a wonderful mental shift in how we and our clients view each and every AB test. And we’ve enjoyed the benefits of the Question Mentality while still having the goal of increasing conversion rate with every test. We just do it in a way that gives us a far richer understanding of what’s really happening so that we learn more from each test and thereby have a higher shot of running more winning tests in the future.

In this article, we’ll discuss:

  • How hypothesis-based A/B testing leads to bias
  • The negative consequences of that bias (how it hurts your long term conversion rate goals)
  • How the Question Mentality can solve these challenges (and yield far richer learnings about customers’ desires, preferences, the friction they encounter on your site)
  • Case studies of A/B tests we’ve run for ecommerce clients and how the question mentality improves our understanding

If you’re an ecommerce company interested in working with us for A/B testing and CRO, you can learn more about our service and contact us here.

Hypothesis-Based Testing Leads to Bias

First, let us show you how A/B testing hypotheses in a marketing setting (e.g., testing a website) lead to bias, and then look at the consequences of that bias.

To quickly define it, a hypothesis is a prediction of the result of an A/B test. Something like this: “If A happens, it’ll result in B because of C.”

When teams base an A/B test on a hypothesis, a bias emerges: they take on a mentality that they’re betting on one outcome over another. This is natural: you’ve stated what you think will happen, so there is some confirmation bias in wanting it to happen.

This is true even if your hypothesis is a “good hypothesis”, which many people in A/B testing circles define as one backed by evidence or data such as customer survey results, user testing, heatmaps, or analytics. Good hypotheses cause bias just the same as bad ones, perhaps even more so, because the evidence and strategy behind them lead teams to be all the more convinced their hypothesis “just has to be true”.

In business settings, though, that bias is significantly greater than in a science lab. In a lab (where A/B testing originated), scientists don’t make more money depending on whether their hypothesis is confirmed, at least not directly; if they are doing science properly, they are simply after the truth.

But in website A/B (or “split”) testing, there is extreme bias toward one result over another because there is both (1) money and (2) career reputation at stake.

Founders or owners have bias because a “winning” A/B test makes them more money. This is a totally natural and understandable bias, which they can’t be faulted for. But it has consequences that we’ll discuss below that can hurt the business.

In-house employees have bias because an A/B test where their hypothesis wins makes them look good in front of managers and executives: their idea is seen as having “made the company more money”. It helps their career if “their” test “wins”.

Marketing agencies (like us) have bias for the same reasons: it makes us look good and it makes our clients more money. In fact, many CRO agencies’ websites directly state a promise of “making more money” for clients:

[Image: CRO agency homepage promising to make clients more money]

So, in every A/B test, the in-house team or the agency is heavily biased to wanting their variation to have a higher conversion rate than the original because it makes everyone look good. We’ve seen, and felt, this bias first hand.

I want to emphasize that this bias is natural: growing revenue is obviously the goal (including for us and our clients).

But this bias is dangerous, and can counterproductively result in hurting the company’s conversion rate and thus long term revenue potential.

Let’s see these consequences in action with an example.

Consequence #1: Hypothesis Bias Leads to Stopping Tests Too Early

The first consequence that this bias causes is a mental and emotional pull to stop tests too early.

Let’s look at a hypothetical scenario that is identical to what we’ve seen happen countless times across many clients over the years.

Say we are working on Zappos’ product detail page (PDP) and someone has a hypothesis that adding a video walkthrough of their shoes will increase conversion rate because it will let users see the shoes “in action” and be closer to the experience of really “feeling” the shoes:

[Image: Zappos product detail page with a video added to the image carousel]

Let’s assume the above screenshot is the variation, where we add a video to the image carousel, and the test is run against a control where that video doesn’t exist.

Let’s say the team is really excited about it. Maybe a few people on the team have been wanting to place videos on PDPs for years, advocating for it, and went to great lengths to record some awesome videos for top selling products. The creative team was involved. They hired a video crew. It was a big deal.

And everyone on the ecommerce team is really convinced that the videos have to increase conversion rate. They say things like:

  • “Oh, this is for sure going to win.”
  • “At the very least it’ll break even, there’s certainly no way it will hurt”
  • “Why wouldn’t this increase conversion rate? It wouldn’t make any sense!”

All of these are reasonable, understandable expectations. But we don’t run AB tests because we’re sure of the result. We run AB tests because we aren’t sure.

So the test begins.

One week of data is collected. And the result looks like this:

[Image: early test results showing the variation up 4.86% at 71% statistical significance]

4.86% increase! Yes, the statistical significance is only 71%, so it’s not a statistically significant result, but what a nice increase. Surely the team’s hypothesis is being confirmed and this test is bound to win, right? Why waste time running the test for longer?

If the company does $20 million a year in sales from the PDP and the variation produces 5% more sales from the PDP, then, all else being equal, that’s $1,000,000 per year in extra revenue from this test.

It is very tempting in this scenario to stop the test and quickly implement the variation.

It’s also the wrong move.

We’ve lived through this scenario so many times. Multiple folks from the client’s team suggest: “Ok, this is obviously going to be a winner, as we expected. Let’s stop the test and implement the variation. Why wait to get more data? We were all expecting this to win, it’s obviously on its way to winning, let’s implement it.”

In fact, one time we even saw old AB test reports that a client’s previous CRO firm had prepared: that firm had stopped a test at roughly 70% statistical significance and said something to the effect of, “Normally we’d wait for significance, but since this is so obviously winning, we’re stopping the test and we recommend that you implement it.”

This is extremely dangerous.

Stopping an AB test requires two criteria: statistical significance and enough visitors, aka sample size (we discuss this in more detail in this article). If you consistently stop tests too early because folks are “really sure” the variation is “going to be” a winner, you will inevitably end up declaring winners that aren’t actually winners.
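To make those two criteria concrete, here is a minimal sketch in Python. The numbers are purely hypothetical (roughly in the ballpark of the example above), and in practice your testing tool handles this math; the formulas are simply the standard two-proportion z-test and the usual normal-approximation sample size estimate.

```python
# Minimal sketch of the two stopping criteria: statistical significance AND a
# pre-planned sample size. All numbers below are hypothetical.
from math import sqrt, erf, ceil

def significance(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns the confidence level (0-1)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    return erf(z / sqrt(2))  # = 1 - two-sided p-value

def min_sample_per_arm(baseline_cr, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Rough visitors needed per variation to detect a relative lift (80% power)."""
    p1, p2 = baseline_cr, baseline_cr * (1 + relative_lift)
    return ceil((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

# Hypothetical week-one numbers: about a 4.9% relative lift at roughly 72% confidence.
conf = significance(conv_a=960, n_a=20_000, conv_b=1_007, n_b=20_000)
needed = min_sample_per_arm(baseline_cr=0.048, relative_lift=0.05)
print(f"confidence: {conf:.0%}")               # nowhere near 95%
print(f"visitors needed per arm: {needed:,}")  # on the order of 125,000+
# Don't stop just because the variation "looks like" a winner: require both
# confidence at your threshold (e.g. 95%) AND the planned sample size.
```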

That means, worst case, you could actively hurt conversion rate: with more data, the variation might have been shown to be worse than the control, and now you’re changing the site for the worse without knowing it. More commonly, you could implement a variation that in fact made no difference and believe it’s making a huge one.

Now in this Zappos hypothetical example, if you made this mistake, the team would spend massive resources making product videos thinking they’re helping, when the videos may in fact be doing nothing or, worse, hurting.

But implementing the wrong variation is only the start of the issues that hypothesis bias creates in AB testing. The bigger issue, in our mind, is how it reduces learning. This has consequences way beyond that one particular test. Let’s explore this next.

Consequence #2: Hypothesis Bias Prevents Getting Rich Learnings About Your Customers

Let’s continue the above example about adding a video to the PDP with the hypothesis that it will increase conversion rate by increasing product appeal.


In our example, we pretended that the variation with the video was “in the lead” (showing a higher conversion rate than the control) early on, and the management team felt their hypothesis was confirmed and wanted to stop the test.

A common scenario.

What is equally common is that when teams stop a test, they tend to draw a conclusion without asking important questions about how users interacted with the change.

Questions like:

  • How many users actually watched the video?
  • How far into the video did they get?
  • For the users who did watch the video, how did their conversion rate compare to the users who didn’t?
  • Did the video also increase add to cart clicks?
  • Do customers watch it more when the PDP is their landing page vs when they start on another page on the site?

And those are just the basic questions.

There are more advanced questions to ask, like: how do the answers to these questions change by product?

  • Do these videos affect higher priced products more than lower priced ones?
  • Do they affect certain categories, like the running shoes above, more than dress shoes (where style is more important than function or flexibility)?
  • Do they affect men’s shoes and women’s shoes the same way?

Hypothesis bias leads teams to not explore these questions and to quickly call a test a winner or loser based on one metric (e.g., checkout rate or, worse, add to carts). Not caring about these nuances means teams never learn the real reasons behind A/B test results. It means they’re not really understanding user preferences and behavior.

If you’re not learning the true preferences, desires, and fears of your online shoppers, you’re massively disadvantaged compared to your competition long term.

The Question Mentality: Base A/B Tests On a Series of Questions

Unlike a hypothesis like, “I think a video of the shoes on the PDP will increase conversion rate because…”, the Question Mentality would have us base this test on a series of tiered questions:

  1. Do users care about watching a video of someone talking about and playing with the shoes?
  2. Will they watch?
  3. Will it affect add to cart rates?
  4. Does it change how much other information they read on the PDP?
  5. Does it affect how many PDPs they look at before making a purchase? (It could give them more information and let them make a decision more quickly.)
  6. Can we sell more expensive products better with videos?
  7. And thus does it move the needle on ultimate conversion rate at all?

Look at the difference!

On one side you have a single prediction of outcome, which pulls you, psychologically and emotionally, to having tunnel vision on one metric and judging success by that one metric.

On the other side you are setting up the test on a series of important questions about your users’ psychology and preferences. This pulls you towards wanting to set up the test in a way that can help you understand so much about your customers and their desires.

What that results in is:

  1. More goals are measured at the start of the test, giving you a far richer picture of the customer’s preferences. How many times have you run an A/B test where the results were not clear cut and you ended up asking follow-up questions you don’t have the answer to: “Wait, so did anyone click on/watch/interact with this element?” No one knows. When you start a test with questions instead of a hypothesis, you set up goals to answer these questions (see the sketch after this list).
  2. The team is led towards finding the truth, not catering to a bias. Think of the difference in your mentality when you read that question list about this simple video test above. What does it make you want? It makes you want to know the answers! That’s very different from stating “I think the video will increase conversion rate,” which makes you want a single outcome to come true.
  3. You no longer think about a test being a “loser”, which is a limited and incorrect view of A/B testing. When you have a question list like the one above, how could a test be a “loser”? No matter the outcome, the answers to the questions will be so educational, and that education can lead to many subsequent winning tests.
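To make the first point concrete, here is a minimal sketch (Python with pandas) of what answering a question list from measured goals can look like at analysis time. The file name, column names, and exact metrics are hypothetical; the point is that every question maps to something you actually measured.

```python
# Sketch: start from the questions, and make sure each one has a measured answer.
# The per-session export and its columns are hypothetical:
# variation, watched_video, video_pct_watched, added_to_cart, purchased, pdp_views
import pandas as pd

sessions = pd.read_csv("video_test_sessions.csv")
variation_b = sessions[sessions["variation"] == "B"]

answers = {
    "How many users watched the video?":
        variation_b["watched_video"].mean(),
    "How far into the video did they get (median % watched)?":
        variation_b.loc[variation_b["watched_video"], "video_pct_watched"].median(),
    "Conversion rate: watchers vs. non-watchers":
        variation_b.groupby("watched_video")["purchased"].mean().to_dict(),
    "Did add-to-cart change between variations?":
        sessions.groupby("variation")["added_to_cart"].mean().to_dict(),
    "PDPs viewed per session, by variation":
        sessions.groupby("variation")["pdp_views"].mean().to_dict(),
}

for question, answer in answers.items():
    print(f"{question}: {answer}")
```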

Example Differences between Hypothesis-Based Testing and The Question Mentality

Every AB test in the world can only have 3 outcomes:

  • A wins
  • B wins
  • No difference

Hypothesis-based testing makes only the second outcome interesting. The Question Mentality makes all 3 of those outcomes interesting because you learn from each one and that learning can lead to future winning tests.

If the Video Reduces Conversion Rate

For example, if adding the video results in the surprising outcome of reducing conversion rate, that could be labeled a “loser” in the traditional approach. But the answers to the questions in the Question Mentality approach would be fascinating. Perhaps you’ll discover that a good chunk of your visitors did watch the video, but that led to them viewing fewer PDPs, seeing fewer shoe options, and thus being less likely to find the pair they love.

If you can confirm this with future tests, it tells you something extremely important about your Zappos customers: they need to see as many PDPs as possible to find the perfect product and be likely to buy it. You can then design future tests that try to increase the number of PDPs (product pages) customers view and check whether that increases conversion rate.

If the Video Results In No Difference

If the video results in no difference (in statistics speak, you fail to reject the “null hypothesis”), you would use your questions to explore in more detail what happened. Perhaps, as we have in past tests with videos on PDPs, you learn that most users don’t watch videos that are hidden inside image carousels.


If you notice that barely anyone clicked on the video, you can’t draw a conclusion about whether the video helps or not, so you can follow up by testing the video somewhere else on the page where users are more likely to see it.

We have seen that videos below the image carousel can increase views significantly and also increase conversion rate in some cases.

And there are many more examples: you could see that videos are more popular on mobile but less so on desktop. You could discover that videos on certain products seem to help whereas on others they don’t. The list goes on.

All of these learnings can be exploited via follow up tests that can increase conversion rate as a result of the learnings from this test.

That is strategic AB testing. Not just making a guess, testing it, and moving on.

If you simply used the traditional hypothesis based approach, shrugged and said “Oh well, I guess it didn’t work” and moved on to the next test, think about how much you’d miss. You’d miss insights that could set up a series of winning tests in the future.

Question Mentality Case Studies

Finally, let’s look at a couple of tests we’ve run for clients under the lens of the Question Mentality to see real examples of the value it’s created.

Testing a Customer Quiz on the Homepage

A client of ours in the food and supplements industry wanted to test having a customer quiz on their homepage to help customers determine which of their products are best for them.

The customer was asked to select their goal:

[Image: quiz variation asking the customer to select their goal]

And based on what they chose, we presented products that were best to hit that goal.

The hypothesis method would frame this test in a simple way: the quiz will increase conversion rate by helping customers find the products that best align with their goals.

Framed this way, you’re likely to measure and focus on just one conversion goal: conversion rate (or worse, clicks to the product pages).

But when analyzed via The Question Mentality, a much richer experience emerges. Think of all the questions you can ask with this test:

  1. Do customers interact with the quiz?
  2. Do customers add to cart more?
  3. Are they more likely to checkout because of this?
  4. Which goals are most popular?
  5. Does it change which products they gravitate towards?
  6. Does it change revenue or AOV?
  7. How do the results differ by source? Do visitors coming from Google interact with the quiz differently than returning customers?

And those are just the basic tier of questions. An advanced ecommerce CRO team can ask even more nuanced questions like:

Do the most popular goals differ by customer demographics (gender, age, location) and therefore does personalizing the way we present those goals change any of the above questions and outcomes?

Mindset Shift

Stop and think about the mindset shift that happens to the entire ecommerce team when this same test is viewed in the lens of the Question Mentality versus the simple hypothesis above.

When you ask all of those questions about this test, you approach the test more objectively, like a curious scientist whose goal is to understand the behaviors, preferences, and desires of their customers. Simply asking the questions changes the team’s psychology from “So did it win?” to  “How does this affect the customer?” and “What does the customer want?” and “How can our site better serve them?”

You are also more likely to measure multiple goals instead of just one, to help you answer all of those questions.

Results of the Test

This test, it turns out, did have interesting results. The final ecommerce conversion rate did not change at all by adding the quiz.

[Image: order conversion rate for all traffic, statistically identical between the control and the quiz variation]

However, we noticed something interesting: the revenue per visitor was 8% higher for the visitors who saw the quiz, with 99% statistical significance:

[Image: revenue per visitor for all traffic, 8% higher for the quiz variation at 99% statistical significance]

When conversion rate is identical, but revenue per visitor is higher, that means average order value (AOV) increased. And indeed above we see the AOV went from $95 to $101, a 6% increase.
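The relationship behind that conclusion is simple arithmetic: revenue per visitor equals conversion rate times average order value, so if conversion rate stays flat and revenue per visitor rises, AOV must have risen. A quick sketch with rounded, illustrative numbers:

```python
# Revenue per visitor (RPV) = conversion rate (CR) x average order value (AOV).
# The CR is an illustrative assumption; the AOVs are the rounded figures above.
cr = 0.030                          # assumed identical in both variations
aov_control, aov_quiz = 95.0, 101.0

rpv_control = cr * aov_control      # ~$2.85 per visitor
rpv_quiz = cr * aov_quiz            # ~$3.03 per visitor

print(f"AOV lift: {aov_quiz / aov_control - 1:.1%}")    # ~6.3%
print(f"RPV lift: {rpv_quiz / rpv_control - 1:.1%}")    # ~6.3%
# With a perfectly flat CR the two lifts match exactly; in the real test the
# conversion rates were only statistically identical, not exactly equal, which
# is why the measured RPV lift (8%) differs slightly from the AOV lift (6%).
```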

Now, as per the Question Mentality, we don’t just pat ourselves on the back and move on, even when we get a “win” like this.

We continue to ask why.

What we discovered was that, by design, for each fitness or performance goal the customer selected, the quiz surfaced a “bundle” pack as a product to help them reach that goal. The bundle pack had a much higher price than the individual products. So, more users in the quiz variation were buying the bundle, which increased AOV.

What this taught us was less about the quiz and more about how products were organized: the bundles were hard for users to find, and users were interested in the bundles regardless of whether there was a quiz. In fact, today this client no longer has the quiz on their homepage. It turns out we didn’t need it: we just needed to make the bundles easier to find and organize products by goal, and users made similar choices.

Had we run this test the hypothesis-based way (make one hypothesis, measure one goal such as conversion rate, report on just that one goal, make a decision), it’s highly likely we would have missed these valuable nuances and left millions in potential revenue on the table for this client.

Testing Quickshop (or Quick Look) on Listing Pages

A second enlightening example contrasting the hypothesis-based approach and The Question Mentality is Quick Shop (also called “Quick Look”).

Think about sites you know that have a “Quick Shop” feature on their listing or collections pages, like this:

[Image: Quick Shop button on a J.Crew listing page]

It lets you add items to cart directly from the listing page. So for J.Crew, that button opens this modal to let you add to cart:

[Image: J.Crew Quick Shop modal for adding to cart from the listing page]

The customer doesn’t have to click into the product page at all to add items to cart.

Before reading further, stop and answer this question:

Do you think quick shop on these listing pages will help or hurt conversion rate on an ecommerce site? Should ecommerce sites have them or not?

What’s interesting is that there are popular, famous ecommerce sites both with and without quick shop. So it’s not an obvious best practice.

If you answered “Quick shop will for sure help conversion rate!” then it’s interesting to note that these sites don’t have quick shop on their listing pages:

  • Homedepot.com
  • Nike.com
  • Macys.com
  • Awaytravel.com

On the other hand if you answered “No, quick shop is not a good idea!” then it’s interesting to note that these sites do have quick shop features on their listing pages:

  • Jcrew.com
  • Allbirds.com
  • Glossier.com
  • Crateandbarrel.com

Which is right? What do Home Depot and Nike know that Allbirds and Glossier don’t? Or vice versa?

Testing Quick Shop via a Hypothesis Based Method

We tested quick shop for an apparel client with about 1000 products in their store (so browsing on listing pages was an important part of the customer experience).

[Image: quick shop variation on the client’s listing page]

An A/B test hypothesis for this test could be:

We think quick shop will increase conversion rate because it reduces costly page loads for the customer: they won’t have to wait for a product page (PDP) to load!

You would then run the test using either conversion rate (successful checkouts) or, worse, add to carts, as your metric of success. If that metric is higher for the variation with quick shop, the test is called a “winner”. If not, it’s labelled a “loser.”

Testing Quick Shop via The Question Mentality

Now contrast that simplistic view of the AB test with how you’d set it up with the question mentality. You’d frame the test on a series of questions you want answered:

  • How many users use the quick shop features?
  • For those that saw quick shop as an option, how many add to cart via quick shop on the listing page vs. on the PDP?
  • Do people that use the quick shop CTA successfully finish checking out more or less often than those that don’t?
  • Is there a net change in add to carts?
  • Is there a net change in successful checkouts?
  • Do less expensive or simpler items see more quick shop usage than more expensive products where users need to read more to decide?
  • Do returning users benefit from quick shop more than new users, since they already know what they want and don’t need to read the PDP as often?
  • Who is the ideal target audience for quick shop?

I could go on, but you get the idea:

Teams using the question mentality to set up this test are:

  • More likely to measure more goals in their A/B testing tool, such as add to cart clicks, usage of different quick shop features, etc.
  • More likely to analyze the results by segment (in say, Google Analytics) such as new vs. returning users, traffic source, or those that clicked on the quick shop feature vs. those that didn’t

As a result they’ll get a much more nuanced and thorough understanding of their users than those that just report back to the team: “It won!” or “It lost.” and move on to the next test.
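As a rough illustration of that segmented read-out, here is a sketch against a hypothetical per-session export (in practice this analysis often happens in Google Analytics or in the testing tool’s own reporting):

```python
# Sketch: slice the same A/B test by segment instead of reporting one number.
# The CSV export and its columns are hypothetical:
# variation, user_type (new/returning), source, used_quickshop, added_to_cart, purchased
import pandas as pd

df = pd.read_csv("quickshop_test_sessions.csv")

# The single number a hypothesis-only report would show:
print(df.groupby("variation")["purchased"].mean())

# The same result cut by the segments the question list above calls for:
for segment in ["user_type", "source", "used_quickshop"]:
    print(df.groupby(["variation", segment])[["added_to_cart", "purchased"]].mean())
```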

Our Test Results and Questions It Answered

Our test showed almost identical conversion rate (successful checkouts) between the variations with and without quickshop:

[Image: nearly identical checkout conversion rate with and without quick shop]

Very interesting: across hundreds of thousands of visitors, there was absolutely no change in conversion rate from having quick shop.

With the hypothesis method, at this point, many teams would say, “Ok, that’s too bad, it doesn’t increase conversion rate,” and move on.

But by asking more questions, we can dig in and understand why much better. The most pressing question is:

Did it fail to increase conversion rate because no one used it, or did people use it and it still didn’t change anything?

What we found is that having quick shop on the listing page does increase total add to carts, but those extra add to carts don’t lead to orders.

What this suggests is that customers may be using quick shop as just a way to bookmark items they are thinking about. It didn’t affect intent to purchase.
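One way to check that pattern, sketched below against the same hypothetical per-session export, is to look at how often an add to cart actually turned into an order in each variation:

```python
# Sketch: did the extra add-to-carts turn into orders?
import pandas as pd

df = pd.read_csv("quickshop_test_sessions.csv")  # hypothetical export, as above

summary = df.groupby("variation").agg(
    add_to_cart_rate=("added_to_cart", "mean"),
    order_rate=("purchased", "mean"),
)
# Of the sessions that added to cart, how many purchased?
summary["orders_per_add_to_cart"] = (
    df[df["added_to_cart"]].groupby("variation")["purchased"].mean()
)
print(summary)
# The pattern described above shows up as: a higher add_to_cart_rate for the
# quick shop variation, a flat order_rate, and a lower orders_per_add_to_cart.
```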

In fact, visits to the cart page were lower for the quick shop variation (albeit without statistical significance). So the quick shop option on the listing page, although it was used, didn’t even get more people to the cart, much less drive more successful purchases. If anything, not having the quick shop option may have resulted in more visits to the cart.

[Image: visits to the cart page, slightly lower for the quick shop variation]

That’s really interesting.

And it’s just the tip of the iceberg. Like the question list above shows, we can dissect all sorts of nuances and see results by segment to understand how quick shop on listing pages affects user behavior.

How to Implement the Question Mentality With Your AB Testing Team

Changing from hypothesis-based testing to the question mentality is not hard from an operational perspective: you simply introduce your team to this concept (you could share this article with your team or discuss it at your next call) and have a place to list questions for each test. For example, we use Trello to manage clients’ AB tests, and we list questions there.

Second, have a step in your process where you review the goals of each test and make sure there are goals being measured that can answer all (or as many as possible) of the questions you listed.
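If it helps, that review step can be as simple as a per-test checklist mapping every question to the goal or report that will answer it. Here is one possible sketch; the format and names are purely illustrative, and a Trello card or spreadsheet works just as well.

```python
# Sketch of a per-test plan: every question should map to a measured goal.
test_plan = {
    "name": "PDP video test",
    "questions_to_goals": {
        "Do users watch the video?": "video play event",
        "How far into the video do they get?": "video progress event (25/50/75/100%)",
        "Does it change add to cart?": "add-to-cart goal",
        "Does it change checkout rate?": "purchase goal",
        "Does behavior differ by device or landing page?": "segment report in analytics",
    },
}

missing = [q for q, goal in test_plan["questions_to_goals"].items() if not goal]
assert not missing, f"These questions have no goal to answer them: {missing}"
```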

That’s it from a logistics standpoint.

The key is to get buy in from the team, which, if you explain the arguments carefully as we’ve done here, should not be too difficult.

Using the Question Mentality will open up the ability to understand your customers in far greater detail and get lots of learning and value from every AB test. It will also help prevent the risk of stopping a test too soon and getting dangerously misleading data.

If you want to discuss working with our team to run your CRO and AB testing program, you can learn more about our service or contact us here.

If you have questions, ask away in the comments, we should be able to answer every one.
