One of my references in the data science field is Julia Silge. In her *Tidy Tuesday* videos, she always codes along while teaching a given technique, helping other analysts upskill and add it to their repertoire.

Last Tuesday, the topic was Empirical Bayes (her blog post), which caught my attention.

But what is that?

Empirical Bayes is a statistical method used when we work with ratios like *[success]/[total tries]*. With such variables, we often face cases like 1/2 successes, which translates to a 50% success rate, or 3/4 (75%), or 0/1 (0%).

Those extreme percentages do not represent the long-term reality: with so few tries, it is very hard to tell whether there is a real trend, and most of the time these cases are simply ignored or deleted. It takes more tries to tell what the real success rate is, like 30/60, 500/1000, or whatever makes sense for the business.
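
To make the point concrete, here is a small simulation sketch. It assumes a true long-run success rate of 50% (a made-up value, for illustration only) and compares how much the observed ratio bounces around with 2 tries versus 60 tries.

```python
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.5  # assumed long-run success rate, for illustration only

# Simulate 10,000 observations with few tries and with many tries
few_tries, many_tries = 2, 60
ratio_few = rng.binomial(few_tries, true_rate, size=10_000) / few_tries
ratio_many = rng.binomial(many_tries, true_rate, size=10_000) / many_tries

# Share of observations whose observed ratio is an extreme 0% or 100%
extreme_few = np.mean((ratio_few == 0) | (ratio_few == 1))
extreme_many = np.mean((ratio_many == 0) | (ratio_many == 1))

print(f"2 tries:  std={ratio_few.std():.3f}, extreme share={extreme_few:.2%}")
print(f"60 tries: std={ratio_many.std():.3f}, extreme share={extreme_many:.2%}")
```

With only two tries, about half of the simulated observations land on 0% or 100% even though the true rate is 50%, while with 60 tries the observed ratios stay much closer to the truth.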

Using Empirical Bayes, though, we can use the distribution of the data itself to estimate each observation's ratio at earlier or later stages, as we will see next in this post.
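
As a rough sketch of the idea, assume the common Beta-Binomial formulation, where the data distribution is summarized by a Beta prior with parameters `alpha0` and `beta0`. The values below are hypothetical and picked by hand; in practice they would be estimated from the data. The helper `eb_estimate` is also just an illustrative name.

```python
# Hypothetical prior: Beta(alpha0, beta0). In practice these parameters
# would be estimated from the full dataset, not picked by hand.
alpha0, beta0 = 10.0, 10.0  # prior mean = 10 / (10 + 10) = 0.5


def eb_estimate(successes, tries, alpha0, beta0):
    """Posterior mean of a Beta-Binomial model: the shrunken success rate."""
    return (successes + alpha0) / (tries + alpha0 + beta0)


for successes, tries in [(0, 1), (3, 4), (45, 60), (750, 1000)]:
    raw = successes / tries
    shrunk = eb_estimate(successes, tries, alpha0, beta0)
    print(f"{successes}/{tries}: raw={raw:.3f}, shrunk={shrunk:.3f}")
```

Observations with few tries are pulled strongly toward the prior mean, while observations with many tries barely move, because their own evidence outweighs the prior.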

Let’s jump into the analysis. The steps to follow are:

- Load the data
- Define success and calculate the success ratio
- Determine the distribution’s parameters
- Calculate Bayes estimates
- Calculate the Credible Interval
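
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not necessarily the exact method used in this post: the data is a hypothetical toy dataset, the prior is assumed to be a Beta distribution, its parameters are fitted with a simple method of moments, and the cutoff of 40 tries is an arbitrary choice.

```python
import pandas as pd
from scipy import stats

# 1. Load the data (hypothetical toy data standing in for a real dataset)
df = pd.DataFrame({
    "successes": [0, 1, 3, 12, 30, 45, 210, 380],
    "tries":     [1, 2, 4, 40, 60, 150, 600, 1000],
})

# 2. Define success and calculate the raw success ratio
df["ratio"] = df["successes"] / df["tries"]

# 3. Determine the prior's parameters with the method of moments,
#    using only observations with enough tries to have a stable ratio
#    (the cutoff of 40 is an assumption made for this sketch)
stable = df.loc[df["tries"] >= 40, "ratio"]
mean, var = stable.mean(), stable.var()
common = mean * (1 - mean) / var - 1
alpha0, beta0 = mean * common, (1 - mean) * common

# 4. Calculate the Bayes estimates (posterior mean of each observation)
df["eb_ratio"] = (df["successes"] + alpha0) / (df["tries"] + alpha0 + beta0)

# 5. Calculate the 95% credible interval from each posterior Beta
post_a = df["successes"] + alpha0
post_b = df["tries"] - df["successes"] + beta0
df["ci_low"] = stats.beta.ppf(0.025, post_a, post_b)
df["ci_high"] = stats.beta.ppf(0.975, post_a, post_b)

print(df.round(3))
```

In the output, rows with few tries get an estimate close to the overall mean and a wide credible interval, while rows with many tries keep an estimate close to their raw ratio and a narrow interval.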

Let’s move on.

**Imports**

```python
# Imports
import pandas as pd
import numpy as np
import scipy.stats as scs
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from distfit import distfit
```