A/B Testing Methodology
Let's start by covering the differences between multi-armed bandit tests and traditional 50/50 split A/B tests so that we're all on the same page. All A/B tests pit two or more versions of a page against each other.
Classic 50/50 Split A/B Testing
Let's say you have a Control and one Variant. In a typical A/B test, your traffic is split evenly until you turn off the test. If the Control converts at 80% but the Variant converts at only 20%, half of your traffic is still sent to the poorly performing variant.
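As a minimal sketch (not Crazy Egg's implementation), a classic split test is just a coin flip per visitor, with assignment completely independent of how each version is converting:

```python
import random

def assign_variant(variants):
    """Classic split test: every variant gets an equal, fixed share
    of traffic, regardless of how each one is converting."""
    return random.choice(variants)

# Even with an 80% vs. 20% conversion gap, assignment stays ~50/50.
counts = {"control": 0, "variant": 0}
for _ in range(10_000):
    counts[assign_variant(["control", "variant"])] += 1
print(counts)  # roughly 5,000 visitors each
```

This is the behavior described above: the losing version keeps receiving half the traffic until you manually end the test.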
Multi-Armed Bandit Test
In a multi-armed bandit experiment, your goal is to find the best choice or outcome while minimizing your risk of failure. This is accomplished by presenting the better-performing option more frequently as the test progresses.
To put this into context, imagine facing a row of slot machines in a casino. Your aim is to discover the machine with the highest chance of winning big while wasting the least amount of money (and time) in the process. Your challenge, then, is to keep exploiting the winning machines while still trying new ones in the hope that they'll deliver an even higher payout. This is why the Crazy Egg A/B testing algorithm constantly adjusts visitor traffic to minimize losses on losing "machines," so you can quickly see positive improvements in your page performance.
With a multi-armed bandit approach, the conversion rates of your variants are constantly monitored. An algorithm uses these rates to decide how to split the traffic so as to maximize your return on investment. As a result, if the Control is performing better, more traffic is sent to the Control.
Each variant you create for an A/B test displays its weight, creation date, number of views, and number of conversions. We look at the views, conversions, and creation date to decide the weight: the percentage of visitors who see that variant. These weights are adjusted daily based on the cumulative results so far.
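Crazy Egg doesn't publish the exact formula behind these weights, but a common way to turn views and conversions into traffic weights is Thompson sampling over Beta posteriors. The sketch below is an assumption, not the actual implementation: it estimates each variant's weight as its probability of being the best performer given the data so far.

```python
import random

def bandit_weights(stats, draws=10_000):
    """Estimate traffic weights by Thompson sampling: draw each
    variant's conversion rate from a Beta(conversions + 1,
    misses + 1) posterior and count how often it wins the draw."""
    wins = {name: 0 for name in stats}
    for _ in range(draws):
        samples = {
            name: random.betavariate(c + 1, v - c + 1)
            for name, (v, c) in stats.items()
        }
        wins[max(samples, key=samples.get)] += 1
    return {name: count / draws for name, count in wins.items()}

# (views, conversions) per variant -- hypothetical numbers
stats = {"control": (1000, 120), "variant_a": (1000, 90)}
weights = bandit_weights(stats)
print(weights)  # the Control, converting better, gets most of the traffic
```

Note how the weights react to the data automatically: the better a variant's record of views and conversions, the larger its share of future visitors.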
Still have doubts about the success of the multi-armed bandit A/B testing methodology?
Ask e-commerce business Wall Monkey! They increased their conversion rate by 550% after analyzing their Snapshot Reports and testing their theories with our A/B Tester. The results don't lie.
The truth of the matter is that all A/B test methodologies have sound mathematical backing; choosing between them is a matter of preference. Our customers have told us they want a method that is sound and that quickly shows them what is working and what is not. Multi-armed bandit A/B testing does exactly that.
And here is how we do it...
Crazy Egg A/B Test Results Dashboard
Let me orient you to your A/B Test Results Dashboard. The first thing to note is that while everything is combined in one table, the Total Traffic column is not meant to be compared with the other data columns. This column shows the future division of visitor traffic for each variant, not the volume of traffic any page version has received so far.
When you add new variants, the total traffic percentages will change. To explore the new variant, the system temporarily gives it more traffic so it can be tested fairly against the variants that have already been running.
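One simple way to picture this catch-up behavior (an illustration only, not Crazy Egg's actual rule) is to weight each variant by how far its view count lags behind the most-seen variant, so a brand-new variant briefly receives the bulk of incoming traffic:

```python
def catch_up_weights(views):
    """Illustrative catch-up rule: weight each variant by its view
    deficit relative to the most-seen variant (floor of 1), so a
    newly added variant gets extra exposure until it catches up."""
    most_seen = max(views.values())
    raw = {name: max(most_seen - seen, 1) for name, seen in views.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}

# variant_b was just added and has no views yet (hypothetical numbers)
views = {"control": 5000, "variant_a": 4800, "variant_b": 0}
shares = catch_up_weights(views)
print(shares)  # variant_b receives most of the new traffic
```

Once the new variant's views catch up, its share falls back in line and the performance-based weighting takes over.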
Depending on the actions of your visitors, here is what can happen:
- If one version starts doing better, its percentage of traffic will swing upward as its weight is updated favorably.
- If, after a period of time, visitors stop interacting or converting on that page, its traffic percentage will swing again, this time unfavorably.
- If you leave your test running long enough, a clear winner will emerge. How long that takes depends on the traffic volume to your page.
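To see how those swings play out, here is a toy simulation under assumed conditions (the true conversion rates, daily traffic, and the Thompson-sampling update rule are all made up for illustration, not taken from Crazy Egg). Weights are recomputed nightly from cumulative results, and over enough days the better variant ends up with most of the traffic:

```python
import random

TRUE_RATES = {"control": 0.05, "variant": 0.08}  # hypothetical
views = {name: 0 for name in TRUE_RATES}
conversions = {name: 0 for name in TRUE_RATES}
weights = {name: 0.5 for name in TRUE_RATES}

for day in range(30):
    # Serve 1,000 visitors per day according to the current weights.
    for _ in range(1000):
        name = random.choices(list(weights), weights=list(weights.values()))[0]
        views[name] += 1
        conversions[name] += random.random() < TRUE_RATES[name]
    # Nightly update: estimate each variant's chance of being best
    # by Thompson sampling the cumulative results.
    wins = {name: 0 for name in TRUE_RATES}
    for _ in range(2000):
        samples = {
            name: random.betavariate(conversions[name] + 1,
                                     views[name] - conversions[name] + 1)
            for name in TRUE_RATES
        }
        wins[max(samples, key=samples.get)] += 1
    weights = {name: wins[name] / 2000 for name in TRUE_RATES}

print(weights)  # the better-converting variant dominates after 30 days
```

Early on the weights wobble as evidence accumulates; with more traffic per day, a clear winner emerges sooner, which matches the note above about traffic volume.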
What Happens When a Variant is Retired?
If you choose to retire a variant that is performing badly, the system recalculates the weights of the Control and the other running variants. This changes the total traffic percentages, placing a higher percentage on the variants that have performed poorly so far, so the system can determine how the remaining pages compare against each other alone.
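In its simplest form, retiring a variant can be sketched as dropping it and renormalizing the remaining weights so they sum to 100% again. This is a simplification under assumed numbers; the real recalculation also draws on each variant's cumulative views and conversions, as described above:

```python
def retire(weights, name):
    """Drop a retired variant and renormalize the remaining weights
    so they sum to 1. Simplified sketch: the actual system also
    re-weighs variants using their cumulative views and conversions."""
    remaining = {k: w for k, w in weights.items() if k != name}
    total = sum(remaining.values())
    return {k: w / total for k, w in remaining.items()}

# hypothetical pre-retirement weights
weights = {"control": 0.6, "variant_a": 0.3, "variant_b": 0.1}
new_weights = retire(weights, "variant_b")
print(new_weights)  # control and variant_a keep their 2:1 ratio
```

The retired variant's share is redistributed, so every remaining page immediately sees more traffic than before.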