When making updates to your website, it's hard to know whether a change actually caused an improvement in your KPIs. An A/B test provides a statistical framework for validating any findings.

I got the opportunity to try out an A/B testing analysis using some fabricated traffic data, along with a hypothesis to check. It's important to know that there is a ton of up-front work to do this right. These experiments need to be designed carefully, particularly with respect to the required sample size and the expected effect size.

For this example, I'm lucky enough to have an experiment that was designed and completed before I had access to the data, so this exercise focuses on the statistical analysis. Check out this account from AirBnB that shows the importance of design and context when tackling these types of problems.

The sections below cover the presented hypothesis, background information, the full statistical analysis, and next-step suggestions.

Packages

All of the data was in an Excel spreadsheet, so I used readxl to import it. For wrangling, I brought in dplyr via the tidyverse package. knitr and kableExtra help make pretty tables for reports.

knitr::opts_chunk$set(message = FALSE, warning = FALSE)

library(tidyverse)
library(readxl)
library(knitr)
library(kableExtra)

Background - Presented Hypothesis

I believe that presenting the user with large category buttons on the mobile homepage will allow users to more easily navigate to products. If I'm right, we'll see an increase in product views and in downstream metrics, including order conversion.

We have data on the control site and two variants over a 16-day period.

Version A: control, standard content slot (no category links)
Version B: smaller horizontal text-based links on a black background, with 'shop by category' displayed across the top
Version C: larger vertical text links overlaid on product images in the background

Based on the hypothesis, we can go ahead with an A/B test comparing the control to variant C, since it contains larger buttons. The tested metric will be product viewing rate, or ‘product site visits’ / ‘total home page visits’.

Some additional color on the key metrics recorded:

Visits: the number of visits to the site
Bounces: a visit that leaves the site without interacting with the page
Category_Link_Click_Visits: a visit that clicks a test category link (men's, women's, women's shoes, men's shoes, women's boots, men's boots)
Product_View_Visit: a visit that saw a product page
Cart_Visit: a visit that saw the cart page
Orders: a visit that made a purchase
Revenue: sales during the period

Formal Hypotheses

H0: The product viewing rate is the same for website variants A and C.

HA: The product viewing rate for variant C (larger buttons) is greater than for variant A.
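
In notation, with $p_A$ and $p_C$ denoting the product viewing rates for versions A and C, the one-sided test is:

$$H_0: p_C = p_A \qquad \text{vs.} \qquad H_A: p_C > p_A$$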


With this background information set, we can go ahead with our workflow.

Reading in data, extracting KPIs

Getting a look at the table, we see that most of the columns are numeric. When importing, I converted the Version variable to a factor.

ab_data <- read_excel("AB_test.xlsx", sheet = "key_metrics") %>%
  mutate(Version = factor(Version))

ab_data
## # A tibble: 48 x 9
##      Day Version Homepage_Visits Bounces Category_Link_C~ Product_View_Vi~
##    <dbl> <fct>             <dbl>   <dbl>            <dbl>            <dbl>
##  1     1 A                  1260     214                0              652
##  2     2 A                  1342     194                0              762
##  3     3 A                  1351     204                0              739
##  4     4 A                  1323     212                0              668
##  5     5 A                  1688     265                0              896
##  6     6 A                  1950     277                0             1100
##  7     7 A                  1257     222                0              649
##  8     8 A                  1239     195                0              645
##  9     9 A                   891     146                0              478
## 10    10 A                   869     134                0              467
## # ... with 38 more rows, and 3 more variables: Cart_Visits <dbl>,
## #   Orders <dbl>, Revenue <dbl>

For the calculations, the two versions were separated and summed, and the key conversion metrics were calculated: the rate of visits that entered the website and made it to a given point down the purchasing funnel.

# Calculate total homepage visits, product views, 
# and conversion rate for control and large button variant.
control <- ab_data %>%
  filter(Version == "A") %>%
  summarize(homepage_visits = sum(Homepage_Visits),
            product_view_visits = sum(Product_View_Visit),
            cart_visit = sum(Cart_Visits),
            orders = sum(Orders)) %>%
  mutate(prod_rate = product_view_visits / homepage_visits,
         cart_rate = cart_visit / homepage_visits,
         order_rate = orders / homepage_visits)

variant <- ab_data %>%
  filter(Version == "C") %>%
  summarize(homepage_visits = sum(Homepage_Visits),
            product_view_visits = sum(Product_View_Visit),
            cart_visit = sum(Cart_Visits),
            orders = sum(Orders)) %>%
  mutate(prod_rate = product_view_visits / homepage_visits,
         cart_rate = cart_visit / homepage_visits,
         order_rate = orders / homepage_visits)
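
As an aside (not used below, since the later calculations reference the separate control and variant objects), the same per-version totals and rates can be computed in a single group_by pipeline; a sketch using the same column names:

# Alternative: per-version funnel totals and rates in one pass
version_rates <- ab_data %>%
  group_by(Version) %>%
  summarize(homepage_visits = sum(Homepage_Visits),
            product_view_visits = sum(Product_View_Visit),
            cart_visit = sum(Cart_Visits),
            orders = sum(Orders)) %>%
  mutate(prod_rate = product_view_visits / homepage_visits,
         cart_rate = cart_visit / homepage_visits,
         order_rate = orders / homepage_visits)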



Statistic Calculation

The test statistic is the difference between the two conversion rates. Again, we want to see whether the change in user experience affected the chance that a visitor makes it to a product page. Note that I have the calculation assignments wrapped in parentheses: doing this assigns the variable and prints it!

(test_stat <- variant$prod_rate - control$prod_rate)
## [1] 0.03781852

Calculate the pooled proportion, standard error, z statistic, and p-value. Thanks to rcuevass for the great refresher on the statistics!
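
For reference, these are the standard pooled two-proportion quantities, with x = product view visits and n = homepage visits:

$$\hat{p} = \frac{x_A + x_C}{n_A + n_C}, \qquad SE = \sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_A} + \frac{1}{n_C}\right)}, \qquad z = \frac{\hat{p}_C - \hat{p}_A}{SE}$$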

(pooled_prop <- (control$product_view_visits + variant$product_view_visits) / 
  (control$homepage_visits + variant$homepage_visits))
## [1] 0.5560487
(std_err <- sqrt(pooled_prop*(1 - pooled_prop) *
                   (1/control$homepage_visits + 1/variant$homepage_visits)))
## [1] 0.004907679
(z_stat <- test_stat/std_err)
## [1] 7.705988
(p_value <- pnorm(z_stat, lower.tail = FALSE))
## [1] 6.491719e-15
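
As a sanity check (not part of the original write-up), base R's prop.test runs the equivalent two-proportion test; without the continuity correction, its X-squared statistic is the square of the z statistic above and the one-sided p-value should match.

# Equivalent test via prop.test (variant listed first so that
# alternative = "greater" tests prod_rate_C > prod_rate_A)
prop.test(x = c(variant$product_view_visits, control$product_view_visits),
          n = c(variant$homepage_visits, control$homepage_visits),
          alternative = "greater", correct = FALSE)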



Conclusions

The larger buttons produced roughly a 380 basis point lift in the product viewing rate. The calculated p-value satisfies even a very stringent alpha level, which is what you'd expect for a difference this large given the sample size. I would recommend adopting the design with large category buttons.

As for downstream metrics, my initial reaction is to build separate tests aimed at moving the consumer along the conversion funnel. Said differently, what can we do to get the visitor from the product view page to the Product Description Page or the cart? The effect of the category buttons on the final conversion rate is something I'm interested in: does an experience earlier in the order funnel affect the rates a few steps further down?

However, for completeness, here are similarly calculated p-values for the differences in cart viewing rate and order conversion rate.


For cart views (with test_stat, pooled_prop, std_err, and z_stat recomputed using the Cart_Visits totals):

(p_value <- pnorm(z_stat, lower.tail = FALSE))
## [1] 0.8487188

For orders (recomputed using the Orders totals):

(p_value <- pnorm(z_stat, lower.tail = FALSE))
## [1] 0.6132222

These larger p-values imply that any differences between the control and the variant in cart viewing rate and order conversion rate are most likely due to chance.
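
To avoid repeating that arithmetic by hand, the same pooled two-proportion z-test can be wrapped in a small helper; a sketch (the function name is my own), applied to the cart and order totals computed earlier:

# One-sided pooled two-proportion z-test p-value (variant vs. control)
prop_z_pvalue <- function(x_control, x_variant, n_control, n_variant) {
  diff <- x_variant / n_variant - x_control / n_control
  pooled <- (x_control + x_variant) / (n_control + n_variant)
  se <- sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
  pnorm(diff / se, lower.tail = FALSE)
}

prop_z_pvalue(control$cart_visit, variant$cart_visit,
              control$homepage_visits, variant$homepage_visits)
prop_z_pvalue(control$orders, variant$orders,
              control$homepage_visits, variant$homepage_visits)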


More Investigations, based on provided data

  • Large buttons vs small buttons, using the category link click metric (a sketch follows this list).
  • Dig into which of the buttons performed strongest: maybe men's boots had a higher click-through rate, and if so, why?
  • Create a test for more specific category links like “Camp”, “Run”, and “Trail.”
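
For the first idea, here is a minimal sketch reusing the prop_z_pvalue helper above; the Category_Link_Click_Visits column name is taken from the key metric definitions (it prints truncated in the tibble).

# Do the larger buttons (C) drive a higher category link click-through
# rate than the smaller ones (B)?
b_totals <- ab_data %>%
  filter(Version == "B") %>%
  summarize(n = sum(Homepage_Visits),
            clicks = sum(Category_Link_Click_Visits))
c_totals <- ab_data %>%
  filter(Version == "C") %>%
  summarize(n = sum(Homepage_Visits),
            clicks = sum(Category_Link_Click_Visits))

prop_z_pvalue(b_totals$clicks, c_totals$clicks, b_totals$n, c_totals$n)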