A/B tests are the gold standard of causal inference because they allow us to make valid causal statements under minimal assumptions, thanks to randomization. In fact, by randomly assigning a treatment (a drug, ad, product, …), we can compare the outcome of interest (a disease, firm revenue, customer satisfaction, …) across subjects (patients, users, customers, …) and attribute the average difference in outcomes to the causal effect of the treatment.
The implementation of an A/B test is usually not instantaneous, especially in online settings, where users are often treated live or in batches. In these settings, one can look at the data one or more times before data collection is complete. This phenomenon is called peeking. While peeking is not problematic in itself, using standard testing procedures when peeking can lead to misleading conclusions.
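To see why peeking with a standard test is dangerous, consider a small simulation. The sketch below (my own illustrative setup, not the article's code: batch sizes, number of peeks, and seed are all assumptions) runs A/A experiments, where there is no true effect, and compares the false positive rate of testing once at the end versus testing after every batch and stopping at the first significant result:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def run_experiment(n_batches=10, batch_size=100, alpha=0.05, peek=True):
    """Simulate one A/A test (no true treatment effect), collected in batches.
    Returns True if the null hypothesis is (wrongly) rejected."""
    a = np.array([])  # control outcomes
    b = np.array([])  # treatment outcomes (same distribution: no effect)
    for _ in range(n_batches):
        a = np.concatenate([a, rng.normal(0, 1, batch_size)])
        b = np.concatenate([b, rng.normal(0, 1, batch_size)])
        if peek:
            # Naive t-test at every intermediate look
            _, p = stats.ttest_ind(a, b)
            if p < alpha:
                return True  # stop early and declare a (false) discovery
    # Single test at the end of data collection
    _, p = stats.ttest_ind(a, b)
    return p < alpha

n_sims = 2000
fp_final = np.mean([run_experiment(peek=False) for _ in range(n_sims)])
fp_peeking = np.mean([run_experiment(peek=True) for _ in range(n_sims)])
print(f"False positive rate, testing only at the end: {fp_final:.3f}")
print(f"False positive rate, testing at every peek:   {fp_peeking:.3f}")
```

Testing only once keeps the false positive rate near the nominal 5%, while stopping at the first significant peek inflates it well above that level, even though there is no effect to detect.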
The solution to peeking is to adjust the testing procedure accordingly. The most famous and traditional approach is the so-called Sequential Probability Ratio Test (SPRT), which dates back to the Second World War. If you want to know more about the test and its fascinating history, I wrote a blog post about it.
The main advantage of the Sequential Probability Ratio Test (SPRT) is that it minimizes the expected sample size, given a target confidence level and power. However, the main problem with the SPRT is that it might continue indefinitely. This is far from irrelevant in an applied setting with deadlines and budget constraints. In this article, we will explore an alternative method that allows any number of intermediate peeks at the data, at any point during data collection: Group Sequential Testing.
Let’s start with some simulated data. To keep the code as light as possible, I will abstract away from the experimental setting…
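As one possible sketch of such simulated data (the variable names, group sizes, and effect size here are my own assumptions, not necessarily the article's), we can draw normally distributed outcomes for randomly assigned control and treatment users, collected in batches:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical setup: 10 batches of 100 users each,
# randomly assigned to treatment or control.
n_batches, batch_size = 10, 100
true_effect = 0.2  # assumed treatment effect on the outcome

df = pd.DataFrame({
    "batch": np.repeat(np.arange(n_batches), batch_size),
    "treated": rng.integers(0, 2, n_batches * batch_size).astype(bool),
})
# Outcome: standard normal noise plus the effect for treated users
df["outcome"] = rng.normal(0, 1, len(df)) + true_effect * df["treated"]

print(df.groupby("treated")["outcome"].agg(["mean", "count"]))
```

The `batch` column mimics data arriving in groups over time, which is exactly the structure Group Sequential Testing is designed for: one test per batch, with adjusted critical values.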