Causal Inference (A/B Testing) in Real Project Using R (Part I)

Jun 15, 2021

Background

A retail company with an online website has multiple marketing channels, including email, TV, and social media platforms. Advertising accounts for 50% of the overall budget, so optimizing the return on investment (ROI) of the advertising spent on each campaign is a priority. The company started an email campaign that sends two kinds of emails, one featuring men's merchandise and the other promoting women's merchandise. During the campaign it tracked customer behavior, such as visiting the website through the link in the email, making a purchase, and the amount spent. The dataset contains information on customers who have purchased in the last 12 months. With this dataset, we aim to evaluate the effectiveness of the email campaign.

Key Question

How does the email campaign affect visits, conversion, and spending?

Data Cleaning

Load packages

# Run once if the package is not installed yet:
# install.packages("grf")
library(dplyr)
library(grf)
library(tidyverse)
library(ggplot2)

Load dataset

data <- read.csv("email.csv")

Separate dataset into three groups

men <- data %>% filter(segment == "Mens E-Mail")
women <- data %>% filter(segment == "Womens E-Mail")
no <- data %>% filter(segment == "No E-Mail")

Data Description

Basic information about the dataset

The dataset can be downloaded from https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.htm.

It contains information about an email campaign of a company.

Data Exploration

Preview the first rows of the dataset

head(data)

Each unit of observation is a customer. The treatment variable is *segment*, while the target variables are *visit*, *conversion*, and *spend*. The dataset also contains some other features of customers:

*Recency*: Months since last purchase.

*History_Segment*: Categorization of dollars spent in the past year.

*History*: Actual dollar value spent in the past year.

*Mens*: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.

*Womens*: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.

*Zip_Code*: Classifies zip code as Urban, Suburban, or Rural.

*Newbie*: 1/0 indicator, 1 = New customer in the past twelve months.

*Channel*: Describes the channels the customer purchased from in the past year.
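For the later comparisons it can help to encode the treatment explicitly. The following is a minimal sketch, assuming a small made-up data frame `toy` that only mimics the *segment* column (its rows and the `treated` column are illustrative, not taken from the dataset). It makes the no-email group the reference level and adds a 0/1 indicator for receiving any email:

```r
# Hypothetical toy rows that mimic the segment column; not real data.
toy <- data.frame(
  segment = c("Mens E-Mail", "No E-Mail", "Womens E-Mail", "No E-Mail"),
  visit   = c(1, 0, 1, 0)
)

# Make the control group the reference level of the treatment factor.
toy$segment <- relevel(factor(toy$segment), ref = "No E-Mail")

# Illustrative helper column: 1 if the customer received any email, else 0.
toy$treated <- as.integer(toy$segment != "No E-Mail")

levels(toy$segment)
table(toy$segment, toy$treated)
```

With the control group as the reference level, coefficients from a later regression on *segment* would read directly as differences from the no-email group.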

nrow(data)

There are 64,000 observations in the dataset.

Look at the summary statistics and distributions of variables

hist(data$history)

The distribution of historical spending of customers is right-skewed.

ggplot(data, aes(x=zip_code)) + geom_bar()

We can see that most of the customers are in urban or suburban areas.

data %>% dplyr::group_by(visit) %>% dplyr::summarise(count=dplyr::n())
data %>% dplyr::group_by(conversion) %>% dplyr::summarise(count=dplyr::n())
data_spend <- data %>% filter(spend!=0)
summary(data_spend$spend)

We can see that far more customers did not visit or convert than did. As for spending, among customers who purchased, the average spending is $116.36.
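The counts above can also be expressed as rates. A minimal sketch, assuming a made-up data frame `toy` with the same three outcome columns (the values are illustrative and do not reproduce the figures reported here):

```r
# Hypothetical toy outcomes; the real values come from email.csv.
toy <- data.frame(
  visit      = c(1, 0, 0, 1, 0, 0, 0, 1),
  conversion = c(1, 0, 0, 0, 0, 0, 0, 1),
  spend      = c(50, 0, 0, 0, 0, 0, 0, 150)
)

# The mean of a 0/1 indicator is the share of customers with value 1.
visit_rate      <- mean(toy$visit)
conversion_rate <- mean(toy$conversion)

# Average spend among customers who actually purchased.
avg_spend_buyers <- mean(toy$spend[toy$spend != 0])

c(visit_rate = visit_rate,
  conversion_rate = conversion_rate,
  avg_spend_buyers = avg_spend_buyers)
```

Replacing `toy` with `data` would give the overall visit rate, conversion rate, and average spend per purchaser for the campaign data.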

Experimental Design

Randomization Check

The checks below compare each pre-treatment covariate across the three segments. As space is limited, the output is not reproduced here.

# Binary and numeric covariates: one-way ANOVA across the three segments
summary(aov(mens ~ segment, data = data))
summary(aov(womens ~ segment, data = data))
summary(aov(recency ~ segment, data = data))
summary(aov(history ~ segment, data = data))
summary(aov(newbie ~ segment, data = data))
# Categorical covariate: chi-squared test of independence
tbl <- table(data$channel, data$segment)
chisq.test(tbl)

As none of the covariates differs significantly across the groups, we find no evidence that the randomization failed.
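With samples this large, it can also help to look at the size of any imbalance and not only at p-values. The sketch below is one possible complement to the tests above, not part of the original analysis. It assumes simulated stand-in data (all values are made up) and a hypothetical helper `smd()` that computes a standardized mean difference between one email group and the control group:

```r
set.seed(1)
# Simulated stand-in for the real data: segment is assigned at random,
# independently of the covariates. All values are made up.
n <- 3000
toy <- data.frame(
  segment = sample(c("Mens E-Mail", "Womens E-Mail", "No E-Mail"), n, replace = TRUE),
  recency = sample(1:12, n, replace = TRUE),
  history = rexp(n, rate = 1 / 240),
  newbie  = rbinom(n, 1, 0.5)
)

# Covariate means per segment; under randomization they should be close.
balance <- aggregate(cbind(recency, history, newbie) ~ segment, data = toy, FUN = mean)
balance

# Hypothetical helper: difference in means divided by the pooled standard deviation.
smd <- function(x, g, treated, control) {
  x1 <- x[g == treated]
  x0 <- x[g == control]
  (mean(x1) - mean(x0)) / sqrt((var(x1) + var(x0)) / 2)
}

smd_history <- smd(toy$history, toy$segment, "Mens E-Mail", "No E-Mail")
smd_history
```

A standardized difference close to zero points the same way as a non-significant test: the groups look alike on that covariate.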

data %>% dplyr::group_by(segment) %>% dplyr::summarise(count=dplyr::n())

There are three groups of customers: one that received no email, one that received emails advertising men's products, and one that received emails advertising women's products. The smallest group has 21,306 customers. Let's look at the power of the experiment.

Power of the experiment

power.t.test(n =21306, sig.level = 0.1, power = 0.8)

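With 21,306 customers per group, a 10% significance level, and 80% power, the smallest effect we can reliably detect is about 0.02. Because power.t.test uses sd = 1 by default, this figure is in units of the outcome's standard deviation.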
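To see where this number comes from, the minimum detectable effect can be approximated by hand. The sketch below uses the usual normal approximation for a two-sided, two-sample comparison, delta ≈ (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n), and compares it with the value that power.t.test solves for. The variable names are illustrative:

```r
n     <- 21306  # customers in the smallest group
alpha <- 0.1    # significance level
power <- 0.8    # target power

# Normal approximation to the minimum detectable effect,
# in units of the outcome's standard deviation (sd = 1).
mde_approx <- (qnorm(1 - alpha / 2) + qnorm(power)) * sqrt(2 / n)

# The same quantity as solved for numerically by power.t.test().
mde_exact <- power.t.test(n = n, sig.level = alpha, power = power)$delta

round(c(approx = mde_approx, exact = mde_exact), 4)
```

To express the effect on the scale of a specific outcome, pass that outcome's standard deviation through the `sd` argument. For a 0/1 outcome with baseline rate p, that would be sqrt(p * (1 - p)).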

Threats to Causal Inference

After examining the empirical context and the sample data, we identify one threat to our conclusions, *Sampling Error*. The 64,000 customers all purchased in the last 12 months, so they may not be representative of the whole customer population, and our conclusions may not apply to the whole customer base.

Other concerns of the experiment:

Interference:

To evaluate whether the women's and men's campaigns are successful, we perform statistical tests of whether customer visits, conversions, or spending on the website are affected by the men's campaign or the women's campaign.
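The tests themselves are not shown in this part. As a rough sketch of what such a comparison could look like (not necessarily the method used later), the example below assumes simulated stand-in data for one email group and the control group. The visit rates and spend values are made up. It uses a two-sample test of proportions for the binary outcome and a Welch t-test for spend:

```r
set.seed(42)
# Simulated stand-in data; the rates and amounts are illustrative only.
n <- 5000
control <- data.frame(
  visit = rbinom(n, 1, 0.10),
  spend = rexp(n, rate = 1 / 100) * rbinom(n, 1, 0.01)
)
email <- data.frame(
  visit = rbinom(n, 1, 0.15),
  spend = rexp(n, rate = 1 / 100) * rbinom(n, 1, 0.01)
)

# Two-sample test of equal proportions for a binary outcome (visit).
visit_test <- prop.test(
  x = c(sum(email$visit), sum(control$visit)),
  n = c(nrow(email), nrow(control))
)

# Welch two-sample t-test for a continuous outcome (spend).
spend_test <- t.test(email$spend, control$spend)

visit_test$estimate  # visit rates in the email and control groups
visit_test$p.value
spend_test$p.value
```

On the real data, `email` and `control` would correspond to one of the email groups (`men` or `women`) and the `no` group created earlier.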