Retain your customers with AI

What is churn?

Customer churn is a well-known phenomenon in many businesses. In fact, it appears in virtually every business you can imagine, since every business has customers and sells something to them. What is sold does not change the fact that there is always a customer; even in B2B there are customers (they just happen to be other businesses). The customer is therefore the cornerstone of every business, and losing customers hurts no matter what we sell.

Luckily for us, losing customers does not happen very often. Depending on the business, it may happen at a ratio of 1 in 4 (a high customer-loss ratio) or perhaps 1 in 20 for businesses built on loyal customers; it always depends on the kind of business. In any case, reducing this ratio will be our goal. However, defining an acceptable ratio for a business is neither easy nor reusable across business cases. What ratio would be acceptable for a telecommunications company? For a gym? For a supermarket? In the latter case, is it even possible to measure churn?

Depending on the context, the very first question we must answer when dealing with churn is: what is churn IN OUR BUSINESS SCENARIO? This can be viewed as an extension of the classic problem-definition phase of any Machine Learning project, but with a twist. And, as in any Machine Learning project, monetizing and valuing our model is key to understanding how good it really is beyond statistical measures (which, let's be honest, are not useful for the real goal of the model, which is to be applied to a business). Translating models into money is never easy, and churn is no exception, but churn is a widely studied case across industries, and from those studies we can start to get an idea of how important it is (taken from here):

  • Increasing customer retention rates by 5% increases profits by 25% to 95%
  • It costs five times as much to attract a new customer than to keep an existing one

Of course, feel free to adjust these numbers and/or apply them to your own business case, but they give us a mind-blowing perception of the business-critical role of a model like this. They also help us quickly grasp the profitability that derives from changes in our model (a 1% increase in one of the model's performance measures translates into X money for our business).
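
As a rough, back-of-the-envelope illustration of that translation into money, consider the sketch below. Every figure in it is a hypothetical placeholder, not data from any real business; plug in your own numbers.

```python
# Illustrative, hypothetical figures only; replace them with your own business data.
customers = 100_000          # active customer base
monthly_churn_rate = 0.05    # 5% of customers leave each month
avg_monthly_revenue = 30.0   # average revenue per customer

churned = customers * monthly_churn_rate
revenue_lost_per_month = churned * avg_monthly_revenue

# Suppose a better model plus a retention campaign saves 10% of would-be churners
saved = churned * 0.10
revenue_retained = saved * avg_monthly_revenue

print(f"Customers lost per month: {churned:.0f}")
print(f"Revenue lost per month: {revenue_lost_per_month:,.0f}")
print(f"Revenue retained by saving 10% of churners: {revenue_retained:,.0f}")
```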

Once we have established what churn means for our particular business case, we will always want to know not just what, but, at least, these key points:

  • Who?
    • Knowing which customers are more likely to leave, we can design retention campaigns and prevent their abandonment.
  • Why?
    • These retention campaigns have to be focused on key areas of our business. Knowing the reasons shared by the customers who are likely to leave will help us target customer profiles and design actions.

In balance there is virtue

As the philosopher said, in balance there is virtue. This is true in life, and it is no different in Machine Learning, especially in a case like churn. We have already mentioned that churn ratios can be higher or lower depending on the business case, but the data will always be unbalanced in favor of the 'not churned' cases. This is a well-known situation in Machine Learning, and it makes it harder for our models (in this case, a binary classifier, since we have two classes: 'churned' and 'not churned') to learn from the data, since fewer cases of the 'churned' class are shown. There are diverse techniques to balance our data and make the learning task easier for the algorithms.

We could apply the following (see the sketch after this list):

  • Oversampling: increasing the minority class (our 'churned' cases) by creating synthetic cases that are similar enough to the real ones.
  • Undersampling: decreasing the majority class. Balancing the dataset can be achieved by removing cases from the majority class that are not significant because other extremely similar or identical cases already exist.
  • Hybrid methods: combining oversampling and undersampling methods.
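
As a minimal sketch of these three approaches with the imbalanced-learn library (see References); the dataset below is synthetic and purely illustrative:

```python
from collections import Counter

from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Synthetic, imbalanced "churn" dataset: roughly 10% positive ('churned') class
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.9, 0.1], random_state=42)
print("Original balance:", Counter(y))

# Oversampling: create synthetic minority cases
X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
print("After SMOTE:", Counter(y_over))

# Undersampling: drop majority cases
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("After undersampling:", Counter(y_under))

# Hybrid: oversample, then clean with Edited Nearest Neighbours
X_comb, y_comb = SMOTEENN(random_state=42).fit_resample(X, y)
print("After SMOTEENN:", Counter(y_comb))
```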

There are other techniques tied to the specific algorithm we build our model on (such as class weights), but as in any other area of Machine Learning there is no free lunch: test combinations, investigate which one fits your data and algorithm best, and apply it.

Trust Occam’s razor

Machine Learning models can become very complex, with dozens or hundreds of features to compute and to understand. This happens in churn analysis too, especially when computed business metrics come into the recipe. Although this can be beneficial for our algorithms (the richer the information, the better the model), we can fall into the curse of dimensionality. Keeping too many features, when we can build a model that is good enough (remember, we will never have a perfect model) with fewer of them, is a waste of time and compute power.

This is especially critical if we think about one of the key questions in our approach to churn: "why are my customers abandoning me?". If the answer relies on hundreds of factors, it will be extremely hard to make it understandable to our colleagues or clients when designing retention campaigns or, sometimes even more importantly, when we are selling our results!

On top of that, simpler models are usually more robust and less prone to overfitting (a.k.a. becoming overspecialized). Data will always surprise us once we put our model in production: customers will behave differently and new customers will arrive with new cases. Simpler models (as long as they perform well enough) will usually beat overcomplicated ones, since they generalize better and will perform better against unseen data.
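
One possible way to keep the feature set lean is automatic feature selection. Below is a minimal sketch using scikit-learn's feature-selection utilities (see References); the synthetic data and the choice of Recursive Feature Elimination are illustrative assumptions, not a prescription:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic dataset with many features but only a few informative ones
X, y = make_classification(n_samples=2000, n_features=30, n_informative=5, random_state=0)

# Recursive Feature Elimination: keep the 5 features the model finds most useful
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)

kept = [i for i, keep in enumerate(selector.support_) if keep]
print("Selected feature indices:", kept)
```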

The why of everything together

Having a good churn model is great, but not enough. Being able to detect churn and predict which customers are more likely to abandon our business is of little use if we cannot interpret why: which aspects of their behavior matter the most for the customers who churn. This is achieved through model interpretation, since the model is a summary of the whole dataset. At the same time, the dataset is (or should be) a summary of our customers' behavior. Thus, by interpreting the model we are interpreting our customers' activity, not in a general scope but focused on their churn-related actions or profile.

Model interpretation is a hot topic in Machine Learning nowadays. It is a fundamental part of ML for many reasons. Some of them are:

  • It allows us to understand our models better, beyond metrics
  • It is useful to debug and improve models
  • It helps detect and avoid bias in our datasets and models
  • It takes us towards prescriptive analytics (such as the aforementioned retention campaigns)

Feature importances can be extracted from our models to check which features have contributed the most to the model's decision on whether a customer will churn or not. Different algorithms store this information in different ways. For example, tree-based algorithms store it as metadata reflecting how much each feature helped them, during the training phase, to learn our customers' behavior from the dataset. Other algorithms, like logistic regression, store these importances in the form of coefficients or weights for each feature. For every algorithm that stores its feature-importance information, we can extract and plot it, as sketched below.
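
A minimal sketch of extracting importances from the two model families mentioned above, using scikit-learn on synthetic data with illustrative feature names:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Tree-based model: importances are stored after training
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(feature_names, forest.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")

# Logistic regression: importances expressed as coefficients (the sign gives the direction of the effect)
logreg = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(feature_names, logreg.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```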

Other algorithms, like Support Vector Machines or Neural Networks, have historically been considered "black boxes". That is no longer true thanks to techniques like Permutation Feature Importance or Shapley Additive Explanations (SHAP). These techniques aim to be model-agnostic: they do not rely on the inner characteristics of the model to report the importances; instead, they challenge the model by feeding it modified data and then analyzing its performance. If the modifications have a deep effect on the model's performance, they assign a higher score to the modified feature. Because they require constantly challenging and interacting with the model, these techniques can be time-consuming, but they allow us to explain virtually any model, including the complex ensemble models frequently used in production systems.
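
For instance, here is a minimal sketch of Permutation Feature Importance using scikit-learn; the SVC model and the synthetic data are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A model often treated as a "black box"
model = SVC().fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the score drops
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```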

References

https://imbalanced-learn.readthedocs.io/en/stable/index.html

https://scikit-learn.org/stable/modules/feature_selection.html

Debugging applications against production data: obfuscation and GDPR

Have you taken a few minutes to think about the way you work with live databases? What about the databases in development environments? Organizations handle an enormous volume of personal data on their data platforms and in the digitized physical and electronic documents they hold. About 90% of the documents that companies store contain some kind of personal information.

Are you acting appropriately to protect sensitive information, as required by law? Data obfuscation can help you comply with the GDPR. In this article, we tell you how.

Not all virtual machines are the same

It is not uncommon to find a wide range of situations among our customers in terms of virtual machine performance with SQL Server. In many cases, performance levels are far from ideal but, in general terms, the virtual machines themselves are not to blame. What usually happens is that when we move SQL Server to a virtual machine, we become constrained by a maximum or limited amount of resources (CPU/memory/IO) that is significantly different from that of the physical machine.

Network latencies over 1ms. Are they enough for a good SQL server performance?

Gradually, as storage gets faster and local SSD storage becomes more popular, disk access times are decreasing significantly. In this regard, perhaps the best example is Optane SSD systems, notable for much lower read/write latencies than traditional SSDs, in addition to being directly connected through the PCIe bus.

Latency, the worst enemy for any Hybrid Cloud environment

In the last few years, we have increasingly found hybrid environments where some SQL Servers are being migrated to the Cloud. In these cases, other applications, services, ERPs or even SQL Server instances remain OnPremise in the original data center. This means that any connection between both environments will be restricted by bandwidth and higher latencies, as opposed to connections that do not cross both environments.

Azure Files Premium + SQL Server Failover Cluster instance = simplified OnPremise to Cloud

One of the issues that many of our customers face when attempting to migrate OnPremise instances to the Cloud is the lack of simple "shared storage". Although there are some alternatives supported by third-party software or SDS solutions that allow us to configure a Failover Cluster instance in Azure, they are highly complex and therefore add significant further costs to the solution's TCO.

Azure Database integrated authentication with SSIS

In many scenarios, we face the need to use integrated authentication in order to gain access to the data sources that feed our analytical system. Given Azure's increasingly widespread use for at least part of our infrastructure, some of these sources are hosted in Azure databases. In this case, we will discuss an actual error that we came across when configuring and using integrated authentication against Azure databases with SSIS.

Power BI Bookmarks! What are they for? How can I use them?

In this entry, we will show you how to create bookmarks and a few different scenarios where they might be useful. Power BI Bookmarks store the state of a specific report page, including the filter selection and the visibility of the different objects, allowing the user to return to that same state by simply selecting the saved bookmark.
