Beyond Churn Prediction and Churn Uplift | by Matteo Courthoud | Jul, 2023

In this section, we will try to understand if there is a data-informed profitable way to send the gift, by targeting specific customers. In particular, we will compare different targeting policies with the objective of increasing revenue.

Throughout this section, we will need some algorithms to either predict revenue, or churn, or the probability of receiving the gift. We use gradient-boosted tree models from the lightgbm library. We use the same models for all policies so that we cannot attribute differences in performance to prediction accuracy.

from lightgbm import LGBMClassifier, LGBMRegressor

To evaluate each policy, denoted with τ, we compare each individual's profits under the policy, Π⁽¹⁾, with their profits without it, Π⁽⁰⁾, in a separate validation dataset. This is usually not possible because, for each customer, we observe only one of the two potential outcomes: with or without the gift. However, since we are working with synthetic data, we can do oracle evaluation. If you want to know more about how to evaluate uplift models with real data, I recommend my introductory article.

First of all, let’s define profits Π as the net revenue R when the customer does not churn C.

Π = (1 − C) · R

Therefore, the overall effect on profits for treated individuals is given by the difference between the profits when treated Π⁽¹⁾ minus the profits when not treated Π⁽⁰⁾.

Π⁽¹⁾ − Π⁽⁰⁾ = (1 − C⁽¹⁾) · R⁽¹⁾ − (1 − C⁽⁰⁾) · R⁽⁰⁾

The effect for untreated individuals is zero.

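def evaluate_policy(policy):
    # Synthetic validation data that keeps both potential outcomes (_c: control, _t: treated)
    data = dgp.generate_data(seed_data=4, seed_assignment=5, keep_po=True)
    data['profits'] = (1 - data.churn) * data.revenue
    # Profits if nobody receives the gift
    baseline = (1 - data.churn_c) * data.revenue_c
    # Profits under the policy: treated customers bear the gift cost, the others keep control outcomes
    treated = policy(data)
    effect = treated * (1 - data.churn_t) * (data.revenue_t - cost) + (1 - treated) * (1 - data.churn_c) * data.revenue_c
    return np.sum(effect - baseline)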
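To make the oracle evaluation concrete, here is a minimal sketch on a made-up table of potential outcomes. The array names mirror the columns used in `evaluate_policy`, but the four customers, their values, the gift cost, and the helper `oracle_gain` are illustrative assumptions, not output of the data-generating process used in this article.

```python
import numpy as np

cost = 1.0  # assumed cost of the gift

# Hypothetical potential outcomes for four customers (illustrative numbers only).
churn_c = np.array([0.0, 1.0, 1.0, 0.0])      # churn without the gift
churn_t = np.array([0.0, 0.0, 1.0, 0.0])      # churn with the gift
revenue_c = np.array([10.0, 8.0, 5.0, 2.0])   # revenue without the gift
revenue_t = np.array([10.0, 8.0, 5.0, 4.0])   # revenue with the gift


def oracle_gain(treated):
    """Aggregate profit change of a policy relative to treating nobody."""
    treated = np.asarray(treated, dtype=float)
    profit_c = (1 - churn_c) * revenue_c
    profit_t = (1 - churn_t) * (revenue_t - cost)
    profit_policy = treated * profit_t + (1 - treated) * profit_c
    return float(np.sum(profit_policy - profit_c))


gain_nobody = oracle_gain([0, 0, 0, 0])
gain_everyone = oracle_gain([1, 1, 1, 1])
gain_targeted = oracle_gain([0, 1, 0, 1])
```

Treating nobody yields zero by construction. In this toy table, skipping the first customer, who stays and spends the same amount either way, does better than treating everyone, because the gift only costs money for them.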

1. Target Churning Customers

A first policy could be to just target churning customers. Let’s say we send the gift only to customers with above-average predicted churn.

model_churn = LGBMClassifier().fit(X=df[X], y=df['churn'])

policy_churn = lambda df : (model_churn.predict_proba(df[X])[:,1] > df.churn.mean())


The policy is not profitable and would lead to an aggregate loss of more than 5000$.

You might think that the problem is the arbitrary threshold, but this is not the case. Below I plot the aggregate effect for all possible policy thresholds.

x = np.linspace(0, 1, 100)
y = [evaluate_policy(lambda df : (model_churn.predict_proba(df[X])[:,1] > p)) for p in x]

fig, ax = plt.subplots(figsize=(10, 3))
sns.lineplot(x=x, y=y).set(xlabel='Churn Policy Threshold', title='Aggregate Effect');
ax.axhline(y=0, c='k', lw=3, ls='--');

Aggregate effect by churn threshold, image by Author

As we can see, no matter the threshold, it is basically impossible to make any profit.

The problem is that a customer being likely to churn does not imply that the gift will have any impact on their churn probability. The two measures are not completely unrelated (e.g., we cannot decrease the churn probability of customers who have a 0% probability of churning), but they are not the same thing.
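A two-customer illustration of this distinction, using made-up numbers. The churn probabilities, revenue, and gift cost below are assumptions chosen for the example, and `expected_gain` is a hypothetical helper.

```python
cost = 1.0       # assumed gift cost
revenue = 10.0   # assumed revenue, identical for both customers

# Customer A: very likely to churn, but the gift does not change that.
# Customer B: less likely to churn, but the gift helps substantially.
customers = {
    "A": {"churn_without": 0.9, "churn_with": 0.9},
    "B": {"churn_without": 0.3, "churn_with": 0.1},
}


def expected_gain(c):
    """Expected profit with the gift minus expected profit without it."""
    profit_without = (1 - c["churn_without"]) * revenue
    profit_with = (1 - c["churn_with"]) * (revenue - cost)
    return profit_with - profit_without


gains = {name: expected_gain(c) for name, c in customers.items()}
```

A policy based on predicted churn would pick customer A first, yet the expected gain is negative for A and positive for B.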

2. Target revenue customers

Let’s now try a different policy: we send the gift only to high-revenue customers. For example, we might send the gift only to the top 10% of customers by revenue. The idea is that, if the gift indeed decreases churn, these are the customers for whom decreasing churn is most profitable.

model_revenue = LGBMRegressor().fit(X=df[X], y=df['revenue'])

policy_revenue = lambda df : (model_revenue.predict(df[X]) > np.quantile(df.revenue, 0.9))


The policy is again unprofitable, leading to substantial losses. As before, this is not a problem of selecting the threshold, as we can see in the plot below. The best we can do is set a threshold so high that we do not treat anyone, and we make zero profits.

x = np.linspace(0, 100, 100)
y = [evaluate_policy(lambda df : (model_revenue.predict(df[X]) > c)) for c in x]

fig, ax = plt.subplots(figsize=(10, 3))
sns.lineplot(x=x, y=y).set(xlabel='Revenue Policy Threshold', title='Aggregate Effect');
ax.axhline(y=0, c='k', lw=3, ls='--');

Aggregate effect by revenue threshold, image by Author

The problem is that, in our setting, the churn probability of high-revenue customers does not decrease enough to make the gift profitable. This is partly because, as is often observed in reality, high-revenue customers are also the least likely to churn to begin with.

Let’s now consider a more relevant set of policies: policies based on uplift.

3. Target churn uplift customers

A more sensible approach would be to target customers whose churn probability decreases the most when receiving the 1$ gift. We estimate churn uplift using the doubly-robust estimator, one of the best-performing uplift models. If you are unfamiliar with meta-learners, I recommend starting from my introductory article.

We import the doubly-robust learner from econml, a Microsoft library.

from econml.dr import DRLearner

DR_learner_churn = DRLearner(model_regression=LGBMRegressor(), model_propensity=LGBMClassifier(), model_final=LGBMRegressor())
DR_learner_churn.fit(df['churn'], df[W], X=df[X]);
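For intuition about what a doubly-robust learner estimates, the sketch below builds the doubly-robust (AIPW) pseudo-outcome by hand on simulated data. It is a simplified illustration, not econml's implementation: the outcome models are plain group means, the propensity is assumed known (0.5, as in a randomized experiment), the final stage is a crude two-bin average, and all variable names and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated randomized experiment (all quantities are made up).
x = rng.normal(size=n)                 # one customer feature
w = rng.binomial(1, 0.5, size=n)       # random treatment assignment
tau = 0.5 * (x > 0)                    # true effect: 0.5 if x > 0, else 0
y = rng.normal(size=n) + w * tau       # observed outcome

# Nuisance estimates: outcome means by arm and the known propensity.
e = 0.5
mu1 = y[w == 1].mean()
mu0 = y[w == 0].mean()

# Doubly-robust pseudo-outcome: outcome-model difference plus an
# inverse-propensity-weighted residual correction for each arm.
pseudo = (mu1 - mu0) + w * (y - mu1) / e - (1 - w) * (y - mu0) / (1 - e)

# Averaging the pseudo-outcome estimates the average effect.
ate_hat = pseudo.mean()

# A final-stage model regresses the pseudo-outcome on features; here it is
# replaced by a two-bin average, for x > 0 and for x <= 0.
cate_pos = pseudo[x > 0].mean()
cate_neg = pseudo[x <= 0].mean()
```

Even though the outcome models ignore the feature entirely, the residual correction lets the final stage recover that the effect is concentrated among customers with `x > 0`.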

Now that we have estimated churn uplift, we might be tempted to just target customers with a large negative uplift (negative, since we want to decrease churn). For example, we might send the gift to all customers whose estimated churn reduction is larger than the average churn rate.

policy_churn_lift = lambda df : DR_learner_churn.effect(df[X]) < - np.mean(df.churn)

The policy is still unprofitable, leading to almost 4000$ in losses.

The problem is that we haven’t considered the cost of the policy. In fact, decreasing the churn probability is only profitable for high-revenue customers. Take the extreme case: avoiding churn of a customer that does not generate any revenue is not worth any intervention.

Therefore, let’s send the gift only to customers for whom the expected decrease in churn probability, weighted by revenue, exceeds the cost of the gift.

model_revenue_1 = LGBMRegressor().fit(X=df.loc[df[W] == 1, X], y=df.loc[df[W] == 1, 'revenue'])

policy_churn_lift = lambda df : - DR_learner_churn.effect(df[X]) * model_revenue_1.predict(df[X]) > cost


This policy is finally profitable!
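To see how this rule behaves, consider three hypothetical customers. The uplift and revenue values below are invented for the example; in the policy above they would come from `DR_learner_churn.effect` and `model_revenue_1.predict`.

```python
import numpy as np

cost = 1.0  # assumed gift cost

# Hypothetical estimates for three customers.
churn_uplift = np.array([-0.20, -0.20, -0.01])     # change in churn probability
revenue_if_treated = np.array([2.0, 20.0, 50.0])   # predicted revenue with the gift

# Expected revenue saved by the reduction in churn.
expected_saving = -churn_uplift * revenue_if_treated

# Treat only when the expected saving exceeds the cost of the gift.
treat = expected_saving > cost
```

Only the second customer is treated: the first responds strongly but generates too little revenue, while the third is valuable but barely responds to the gift.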

However, we still have not considered one channel: the intervention might also affect the revenue of existing customers.

4. Target net revenue uplift customers

A symmetric approach to the previous one would be to consider only the impact on revenue, ignoring the impact on churn. We could estimate the revenue uplift for non-churning customers and treat only customers whose incremental effect on revenue, net of churn, is greater than the cost of the gift.

DR_learner_netrevenue = DRLearner(model_regression=LGBMRegressor(), model_propensity=LGBMClassifier(), model_final=LGBMRegressor())
DR_learner_netrevenue.fit(df.loc[df.churn==0, 'revenue'], df.loc[df.churn==0, W], X=df.loc[df.churn==0, X]);
model_churn_1 = LGBMClassifier().fit(X=df.loc[df[W] == 1, X], y=df.loc[df[W] == 1, 'churn'])

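# Note: .predict returns hard 0/1 churn labels; .predict_proba(df[X])[:, 1] would give churn probabilities instead
policy_netrevenue_lift = lambda df : DR_learner_netrevenue.effect(df[X]) * (1-model_churn_1.predict(df[X])) > cost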


This policy is profitable as well but ignores the effect on churn. How do we combine this policy with the previous one?

5. Target total revenue uplift customers

The best way to efficiently combine both the effect on churn and the effect on net revenue is simply to estimate total revenue uplift. The implied optimal policy is to treat customers whose total revenue uplift is greater than the cost of the gift.

DR_learner_revenue = DRLearner(model_regression=LGBMRegressor(), model_propensity=LGBMClassifier(), model_final=LGBMRegressor())
DR_learner_revenue.fit(df['revenue'], df[W], X=df[X]);

policy_revenue_lift = lambda df : (DR_learner_revenue.effect(df[X]) > cost)


It looks like this is by far the best policy, generating an aggregate profit of more than 2000$!
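A single hypothetical customer shows why the combined rule can disagree with the two partial ones. The numbers are invented, and the sketch assumes that total expected revenue equals (1 − churn) × revenue, in line with the profit definition used for evaluation.

```python
cost = 1.0  # assumed gift cost

# One hypothetical customer: the gift lowers churn a little
# and raises revenue a little.
churn_0, churn_1 = 0.5, 0.4       # churn probability without / with the gift
revenue_0, revenue_1 = 6.0, 7.0   # revenue if retained, without / with the gift

# Churn uplift rule: churn reduction weighted by revenue under treatment.
churn_rule_value = -(churn_1 - churn_0) * revenue_1
# Net revenue uplift rule: revenue increase weighted by retention under treatment.
netrevenue_rule_value = (revenue_1 - revenue_0) * (1 - churn_1)
# Total revenue uplift rule: change in expected revenue.
total_rule_value = (1 - churn_1) * revenue_1 - (1 - churn_0) * revenue_0

decisions = {
    "churn_lift": churn_rule_value > cost,
    "netrevenue_lift": netrevenue_rule_value > cost,
    "revenue_lift": total_rule_value > cost,
}
```

Neither partial effect covers the cost of the gift on its own, but together they do, so only the total revenue rule treats this customer.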

The result is striking if we compare all the different policies.

policies = [policy_churn, policy_revenue, policy_churn_lift, policy_netrevenue_lift, policy_revenue_lift] 
df_results = pd.DataFrame()
df_results['policy'] = ['churn', 'revenue', 'churn_L', 'netrevenue_L', 'revenue_L']
df_results['value'] = [evaluate_policy(policy) for policy in policies]

fig, ax = plt.subplots()
sns.barplot(data=df_results, x='policy', y='value').set(title='Overall Incremental Effect')
plt.axhline(0, c='k');

Comparing policies, image by Author

Intuition and Decomposition

If we compare the different policies, it is clear that targeting high-revenue or high-churn probability customers directly were the worst choices. This is not necessarily always the case, but it happened in our simulated data because of two facts that are also common in many real scenarios:

  1. Revenue and churn probability are negatively correlated
  2. The effect of the gift on churn was not strongly negatively correlated with baseline churn, and its effect on revenue was not strongly positively correlated with baseline revenue

Either of these two facts can be enough to make targeting revenue or churn a bad strategy. What one should target instead is customers with a high incremental effect. And, whenever it is available, it is best to use the variable of interest directly as the outcome: revenue in this case.
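The second fact can be illustrated with a small simulation. The population below is entirely made up: baseline churn and the churn reduction from the gift are drawn independently, so targeting by baseline churn should do no better than targeting at random.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Made-up population: baseline churn and the gift-induced churn reduction
# are drawn independently, so they are uncorrelated by construction.
baseline_churn = rng.uniform(0.2, 1.0, size=n)
churn_reduction = rng.uniform(0.0, 0.2, size=n)

k = n // 5  # treat the top 20% of customers under each rule

top_by_churn = np.argsort(-baseline_churn)[:k]
top_by_uplift = np.argsort(-churn_reduction)[:k]

avg_reduction_by_churn = churn_reduction[top_by_churn].mean()
avg_reduction_by_uplift = churn_reduction[top_by_uplift].mean()
avg_reduction_overall = churn_reduction.mean()
```

Customers selected by baseline churn have roughly the population-average response to the gift, while customers selected by uplift respond much more strongly.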

To better understand the mechanism, we can decompose the aggregate effect of a policy on profits into three parts.

Π⁽¹⁾ − Π⁽⁰⁾ = − (C⁽¹⁾ − C⁽⁰⁾) · R⁽⁰⁾ + (1 − C⁽⁰⁾) · (R⁽¹⁾ − R⁽⁰⁾) − (C⁽¹⁾ − C⁽⁰⁾) · (R⁽¹⁾ − R⁽⁰⁾)

This implies that there are three channels that make treating a customer profitable.

  1. If it’s a high-revenue customer and the treatment decreases their churn probability
  2. If it’s a non-churning customer and the treatment increases their revenue
  3. If the treatment has a strong impact on both their revenue and churn probability

Targeting by churn uplift exploits only the first channel, targeting by net revenue uplift exploits only the second channel, and targeting by total revenue uplift exploits all three channels, making it the most effective method.
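As a sanity check, the three channels can be verified numerically. With profits defined as (1 − churn) × revenue, as in `evaluate_policy`, one way to write the channels is shown in the comments below. This split is a reconstruction from that definition, and the potential outcomes are random made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# Made-up potential outcomes: churn probabilities and net revenues
# without (0) and with (1) the treatment.
c0 = rng.uniform(0.0, 1.0, size=n)
c1 = rng.uniform(0.0, 1.0, size=n)
r0 = rng.uniform(0.0, 50.0, size=n)
r1 = rng.uniform(0.0, 50.0, size=n)

# Profit lift computed directly from the definition of profits.
lift = (1 - c1) * r1 - (1 - c0) * r0

# Channel 1: change in churn, valued at baseline revenue.
churn_channel = -(c1 - c0) * r0
# Channel 2: change in revenue, for baseline non-churners.
revenue_channel = (1 - c0) * (r1 - r0)
# Channel 3: joint change in churn and revenue.
interaction_channel = -(c1 - c0) * (r1 - r0)

decomposition_holds = np.allclose(
    lift, churn_channel + revenue_channel + interaction_channel
)
```

The identity holds for any values of the potential outcomes, so `decomposition_holds` does not depend on the particular random draw.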

Bonus: weighting

As highlighted by Lemmens and Gupta (2020), it can sometimes be worth weighting observations when estimating the uplift model. In particular, it might be worth giving more weight to observations close to the treatment policy threshold.

Weighting generally decreases the efficiency of the estimator. However, we are not interested in accurate estimates for every observation; we are interested in getting the policy threshold right. Whether you estimate a net profit of 1$ or 1000$ does not matter, since the implied policy is the same: send the gift. Estimating a net profit of 1$ rather than -1$, however, reverses the policy implication. Therefore, a large loss in accuracy away from the threshold is sometimes worth a small gain in accuracy at the threshold.

Let’s try using negative exponential weights, decreasing in distance from the threshold.

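DR_learner_revenue_w = DRLearner(model_regression=LGBMRegressor(), model_propensity=LGBMClassifier(), model_final=LGBMRegressor())
# Note: as written, these weights grow with distance from the threshold; np.exp(-np.abs(...)) would decay with distance instead
w = np.exp(1 + np.abs(DR_learner_revenue.effect(df[X]) - cost))
DR_learner_revenue_w.fit(df['revenue'], df[W], X=df[X], sample_weight=w);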

policy_revenue_lift_w = lambda df : (DR_learner_revenue_w.effect(df[X]) > cost)
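As an illustration of weights that decrease with distance from the threshold, the sketch below computes exp(−|uplift − cost|) on a few invented uplift estimates. The functional form and the numbers are assumptions for the example and are not taken from the code above.

```python
import numpy as np

cost = 1.0  # assumed gift cost, which is also the policy threshold

# Hypothetical first-stage estimates of total revenue uplift.
uplift_hat = np.array([-5.0, 0.5, 1.0, 1.5, 10.0])

# Negative exponential weights: 1 at the threshold, decaying with distance.
distance = np.abs(uplift_hat - cost)
weights = np.exp(-distance)

# Share of the total weight carried by each observation.
weight_share = weights / weights.sum()
```

Observations whose estimated uplift is close to the cost carry most of the weight, so a model fitted with such an array as `sample_weight` concentrates on the customers for whom the treat-or-not decision is closest.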


In our case, weighting is not worth it: the implied policy is still profitable, but less so than the one obtained with the unweighted model (2028$).
