Why Your Brain Is Your Worst Enemy When A/B Testing

June 14, 2016

Did you know that we, humans, SUCK at statistical reasoning? We’re also irrational, flawed, and subjective. Why? Because we’re influenced by a list of cognitive biases.

You can live perfectly well (if biased) without knowing about them, but if you’re here, you’re probably A/B testing or thinking about starting. An A/B test is a scientific experiment, and by definition it must be objective to provide actionable data, so cognitive biases are a real threat. To get the definition out of the way: cognitive biases are systematic errors in thinking, often driven by personal opinions, beliefs, and preferences, that distort your ability to reason, remember, and evaluate information. Let’s go down the rabbit-brain (sorry, had to do it) and make sure we’re not subjectively influencing our tests too much by being our flawed (but lovable) selves.

This article is the last in our series on A/B Testing Mistakes.

1 Finding relations between unrelated events

Remember how we talked about external validity threats?

Well, if you didn’t know about them, you could assume that the lift you see was caused by the pink CTA you put in your variation, and not, for example, by a coming storm that scared people into buying your product. You’d have been a victim of the illusory correlation bias: you perceived a relationship between 2 unrelated events.

Penguin giving a syllogism to illustrate A/B testing bias

Why an A/B test worked or not isn’t always straightforward. Our brain jumps to conclusions like there’s no tomorrow. (A great book on the subject is “Thinking, Fast and Slow” by Daniel Kahneman.) Your results are what you’ll base business decisions on, so don’t rush your analysis.

2 Can’t handle sample size

When we talked about fixing a sample size before A/B testing, we were also partly guarding against another bias, called insensitivity to sample size. Our brain struggles to grasp sample size correctly and underestimates how much results vary in small samples. Here’s an example from D. Kahneman’s book: A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?

The larger hospital
The smaller hospital
About the same (that is, within 5% of each other)

Care to make a guess? Here’s what the answers looked like in the study: “56% of subjects chose option 3, and 22% of subjects respectively chose options 1 or 2. However, according to sampling theory the larger hospital is much more likely to report a sex ratio close to 50% on a given day than the smaller hospital which requires that the correct answer to the question is the smaller hospital.”

Sample size is crucial, and our brain tends to forget about it in problems like this one. Don’t draw conclusions from small samples: their results are too noisy to be statistically reliable. They can still spark ideas and discussions, and serve as the basis for proper tests that verify the data. Now that you’re aware of the risk, think hard before making a decision based on a small sample.
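To see why the smaller hospital is the right answer, you can compute the exact binomial probabilities instead of trusting intuition. The sketch below assumes each birth is an independent 50/50 event and reads “more than 60%” as strictly more than 60% (at least 10 boys out of 15, or at least 28 out of 45); the function name is illustrative, not taken from the study.

```python
from math import comb

def prob_more_than_60_percent_boys(births: int, p_boy: float = 0.5) -> float:
    """Exact binomial probability that strictly more than 60% of `births` are boys.

    Assumes births are independent and each is a boy with probability `p_boy`.
    """
    # 5 * k > 3 * births is "k / births > 0.6" in integer arithmetic (no float rounding).
    return sum(
        comb(births, k) * p_boy**k * (1 - p_boy) ** (births - k)
        for k in range(births + 1)
        if 5 * k > 3 * births
    )

small = prob_more_than_60_percent_boys(15)  # smaller hospital, about 15 births a day
large = prob_more_than_60_percent_boys(45)  # larger hospital, about 45 births a day

print(f"Small hospital: {small:.3f} per day, about {small * 365:.0f} days a year")
print(f"Large hospital: {large:.3f} per day, about {large * 365:.0f} days a year")
```

Under these assumptions the smaller hospital has about a 15% chance per day (4944 / 32768), roughly 55 days a year, while the larger hospital’s chance is markedly lower: fewer births per day means more day-to-day swing away from 50%.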

3 Looking for and interpreting information to confirm your own beliefs

Confirmation bias shouldn’t be ignored either. It’s the tendency to seek out, interpret, or focus on information (consciously or not) that confirms your beliefs.

Dilbert comic on the confirmation bias

Add to that the congruence bias, where you only test what YOU think is the problem instead of alternative explanations, and you’ve got yourself a nice, subjective test. For example, if you think the color red increases conversions, your brain will look for any information that confirms this belief. If a test barely goes your way (without reaching statistical significance), you’ll be far more inclined to call it a success than if your convictions weren’t on the line. Every time you feel you were right and the data seem to go your way, it’s time to pause and ask yourself:

Does it really prove your hypothesis in an objective way?
Did you push this idea with your ego on the line to begin with?
Aren’t there other factors that could have produced (or at least considerably helped) this lift?
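One way to answer the first question honestly is to check whether the lift you’re excited about clears a significance threshold you fixed before the test. The sketch below uses a standard two-sided, pooled two-proportion z-test (a normal approximation that assumes reasonably large samples); the visitor and conversion counts are made-up numbers for illustration, not data from a real test.

```python
from math import erfc, sqrt

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate if A and B were identical
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# Hypothetical test: the red variant "wins" with 2.15% vs 2.00%, a 7.5% relative lift.
p_value = two_proportion_p_value(conv_a=200, n_a=10_000, conv_b=215, n_b=10_000)
print(f"p-value: {p_value:.2f}")
```

With these numbers the p-value lands far above the conventional 0.05 threshold, so the “win” is entirely consistent with noise, even though a 7.5% relative lift looks tempting on a dashboard.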

If you’re A/B testing to prove you’re right, or you value ideas for their own sake, you’re doing it wrong. You’re testing to learn, not to build your ego. Impact is what matters.

4 Seeing patterns when there are none and thinking past events influence future probabilities

Let’s tackle two biases at once. First, the clustering illusion: the intuition that random events which occur in clusters can’t really be random. A fun story illustrating it is the one about the Texas Sharpshooter: a Texan shoots at the blank wall of his barn, then draws a target centered where his shots are most clustered, and proceeds to brag about his shooting skills.

comic illustrating the clustering illusion a/b testing

Just because you see similarities doesn’t mean there’s a pattern. Nor does making a few good guesses in the past mean you’ll keep making them. Flipping a coin 10 times and getting 7 tails doesn’t necessarily mean the coin is biased. It just means you got 7 tails out of 10.

Okay, let’s keep flipping coins. Say we flip another, fair coin 39 times and get 39 heads in a row. What is the probability of getting heads again on the 40th flip? 50%, just like any other flip. If you were a bit confused, you fell prey to the gambler’s fallacy (or its mirror image, the hot-hand fallacy): you thought that getting heads so many times in a row would somehow influence the probability of the next throw.

Don’t stop a test because you “see a pattern” or “spot a trend”. Keep those 2 biases in mind: odds are good that what you think you see is just random noise. Maybe you got some good results based on your intuition. Or so you think. You could actually be in the same position as our Texan shooter. And just because you’ve been right twice about something before doesn’t mean you will be next time. Only data, obtained through rigorous tests, will tell.
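The coin-flip claims in this section are easy to check. The sketch below assumes a fair coin simulated with Python’s standard `random` module; it computes the exact probability of getting at least 7 tails in 10 flips, then estimates how often heads follows a streak of heads. The function names and the streak length of 5 are illustrative choices, not anything from the original study of these fallacies.

```python
import random
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Exact binomial probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def prob_heads_after_streak(streak: int, flips: int, seed: int = 7) -> float:
    """Estimate P(heads | the previous `streak` flips were all heads) for a fair coin."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    heads = [rng.random() < 0.5 for _ in range(flips)]  # True means heads
    # Keep the outcome of every flip that comes right after `streak` heads in a row.
    after_streak = [heads[i] for i in range(streak, flips) if all(heads[i - streak:i])]
    return sum(after_streak) / len(after_streak)

print(prob_at_least(7, 10))                 # 0.171875, i.e. 176 / 1024
print(prob_heads_after_streak(5, 200_000))  # close to 0.5
```

Roughly 17% of runs of 10 fair flips contain at least 7 tails, so that outcome is weak evidence of a rigged coin, and the share of heads right after a streak of heads stays around 50%.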

5 Thinking that what’s in front of you is all you need to draw conclusions

This is what D. Kahneman calls “what you see is all there is” in his book: we draw conclusions based only on the information available to us, i.e. what’s in front of our eyes. Doesn’t sound too bad, huh? Let’s try this: a bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

bat and ball problem

50% of the Harvard and Yale students who were asked this simple question got it wrong, and so did 80% of the students asked at other universities. I’ll let you work out the answer on your own. (And no, it doesn’t cost $0.10.) Your brain is wired to look for patterns and draw conclusions from whatever it has. Except it sometimes jumps the gun. Just because you’ve got 2 pieces of data under your nose doesn’t mean they’re all you need to draw a sensible conclusion.

6 Basing all subsequent thinking on the first piece of information received

Called the anchoring bias, it’s our tendency to give more weight to the first piece of information we receive. Here is an example from a study by Fritz Strack and Thomas Mussweiler: 2 groups of people were asked about Gandhi’s age when he died. The first group was asked whether he died before or after the age of 9, the second whether it was before or after 140. Both answers are pretty obvious. What was very interesting, though, were the answers from both groups when they were then asked to guess Gandhi’s actual age at death: the first group’s answers averaged 50, versus 67 for the second. Why such a difference? Because they were subconsciously influenced by their respective first questions. Here’s a picture illustrating a similar study:

Comic on the anchoring effect

Depending on whether the last digits of their social security numbers were high or low, the two groups were influenced as illustrated when they were then asked to guess the price of random objects, here a bottle of French wine. Think about your last salary negotiation in a job interview. The first person to give a number basically calibrates the rest of the negotiation, because everything that follows is based on that number, for both the starting value and the scale of the negotiation. When the number given is precise, people tend to negotiate in smaller increments than with round numbers. If the interviewer went first with his highest bid (or what he said was his highest bid), I’d wager you used it as the base for your counter-offer instead of what you thought you were worth.

By now you must be getting weirdly suspicious of your own brain. Good. Being aware that we’re built to jump to conclusions, consider only what’s in front of our eyes, and ignore the big picture is the first step in the right direction. Be extra careful with your numbers and tests! When you feel sure about a result, pause and check again. Run the test a second time if needed.

7 Throwing reason out the window when ego and emotion are involved

This one can be painful. You put your heart and soul into a redesign, you spend hours on it, and you’re super proud of what you made. Then you test it. And it flops. Badly. Ouch… What do you do? “Screw these people, my design is perfect, they don’t know what they’re talking about!” Or, when you bring the news to your boss, he says: “No way, this design is clearly better, go with this one.” No! Be strong, I know your pain. It’s hard, but that’s one of the reasons you’re A/B testing: to avoid losing money on redesigns or decisions based on gut feeling and personal opinions, and to do things people actually want instead. Swallow your frustration, go back to the hypothesis that led to this redesign, aaand back to the drawing board. Being able to throw hours (days, even) of work out the window when your data say so is a sign you’re truly becoming data-driven. It’s freakishly hard, though.

8 Preventing you from thinking like your customers

Called the curse of knowledge, it’s when you’ve been so absorbed in a subject that you have a hard time thinking about problems the way someone with little to no knowledge of it would.

comic illustrating the curse of knowledge for a/b testing

When you know something is there, say a new button or a new picture, that’s all you see on the page. But your visitors may not even notice the difference. Ask someone else, such as a colleague from another team, to take a look, or run usability tests. Don’t ask someone from your own team, though: you could all be victims of the bandwagon effect. Members of a group influence each other, and the more people do something, the more others are inclined to do the same. If you don’t regularly receive external feedback, you might have built yourself a distorted reality. Regularly ask your visitors and clients, as well as other teams, for feedback.

9 Taking things for granted because you’ve always done them this way

Functional fixedness is when you’re stuck in linear thinking. You see an iron and only think of its obvious use, pressing clothes. It doesn’t occur to you to use it as a toaster when you don’t have an oven (or a real toaster). Finding that kind of unexpected use is called lateral thinking, or “thinking out of the box”.

example of lateral thinking

Easier said than done, though. You can try to trigger this type of thinking by repeatedly asking “why” every time something seems obvious. That way, you’ll dig into your assumption until you reach its root. What you find might blow your mind (sideways). Other things you can do to try to trigger lateral thinking:

Turn your problem on its head: try to solve the opposite
Think about the most obvious, stupid solution
Break your problem down into a series of small, very precise problems
Don’t settle for finding just one solution
Flip your perspective: how would you think about this problem if you were an engineer, a scientist, a complete beginner?

10 Overestimating the degree to which people agree with you

“Everyone hates popups.” Well, YOU hate them. They can annoy people a bit, but they actually convert quite nicely when used the right way (i.e., when they don’t pop up in your face the second you arrive on a website). This bias, known as the false consensus effect, can be tricky when gathering feedback. We sometimes believe strongly in something and assume we’re on the side of the majority when we’re really not: we tend to assume other people share our opinions. It’s better to get feedback from individuals than from a group, since group feedback is prone to biases like the bandwagon effect. But when asking individuals, don’t get caught up in their personal preferences either. Always take everything with a grain of salt.

Don’t fall into the same trap with your own work, either! When you think something is performing as well as it possibly could, stop in your tracks and reconsider. Test what would have the most impact, but also test what is doing fine. You can always do better.

This concludes our article on cognitive biases and our series on A/B Testing Mistakes. It was quite a ride! Riddled with self-doubt, highs, lows… as it was the first time I ever wrote this type of content (in length and topic). See you next time :-)

Oh, and don't forget, you can get the full series about A/B Testing in ebook form.

PS: Before you go, a couple of things I’d like you to do:

If this article was helpful in any way, please let me know. Either leave a comment or hit me up on Twitter @kameleoonrocks.
If you’d like me to cover a topic in particular or have a question, same thing: reach out.

It’s extremely important for me to know that the content I write is both helpful and focused on what matters to you. I know it sounds cheesy and fake, but I feel that writing purely marketing, “empty” stuff just for the sake of it is a huge waste of time for everyone: useless for you, and no fun at all for me to write. So let’s work together!

PS2: If you missed the 4 previous articles in our series, here they are:
