Jan 24, 2015

Improving Traditional Models Of Churn Prediction


There is little doubt that customer churn is a significant issue in the telecom industry, particularly in mature markets where product penetration is very high and there is a declining pool of available customers who are new to the technology.

Over the past decade or so, companies experiencing the pain of churn have begun to deploy systems and processes that identify and communicate proactively with at-risk customers. Most of these processes are driven by sophisticated data mining and analytics, and it must be said that a great deal of progress has been made in customer behaviour modeling and predictive analytics since the early 2000s. However, plenty of opportunity for improvement remains as brands begin to explore retention strategies arising from the analysis of unstructured external data.

Generally speaking, most existing data-based retention systems for communications service providers follow a very simple process: they apply sophisticated mathematical models to structured data from their own internal network and customer data systems, and look for patterns and attributes that correlate with probable customer outcomes. Then they rank or score the customers who fall within the churn risk groups they have identified. The final step is to reach out to the customers in those groups and offer some sort of retention incentive (discount, upgrade, personalized communications, etc.) with the goal of keeping them in the fold.
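To make the score-and-rank step concrete, here is a minimal sketch using a logistic model. The attribute names, weights, intercept and threshold are all hypothetical values chosen for illustration; in a real system the coefficients would be fitted to historical customer data rather than typed in by hand.

```python
import math

# Hypothetical coefficients for illustration only; a real model would
# learn these from historical churn outcomes.
WEIGHTS = {"dropped_calls": 0.30, "support_tickets": 0.45, "tenure_years": -0.25}
INTERCEPT = -2.0


def churn_probability(customer):
    """Logistic model: map a weighted sum of attributes to a value in (0, 1)."""
    z = INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_at_risk(customers, threshold=0.5):
    """Return (customer_id, score) pairs at or above threshold, riskiest first."""
    scored = [(c["id"], churn_probability(c)) for c in customers]
    at_risk = [pair for pair in scored if pair[1] >= threshold]
    return sorted(at_risk, key=lambda pair: pair[1], reverse=True)


# Made-up customers, used only to exercise the functions above.
customers = [
    {"id": "A", "dropped_calls": 8, "support_tickets": 4, "tenure_years": 1},
    {"id": "B", "dropped_calls": 1, "support_tickets": 0, "tenure_years": 6},
    {"id": "C", "dropped_calls": 5, "support_tickets": 3, "tenure_years": 2},
]

for customer_id, score in rank_at_risk(customers):
    print(customer_id, round(score, 2))
```

The customers returned by `rank_at_risk` are the ones who would receive the retention offer; everyone below the threshold is left alone.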

There is a fairly long list of statistical and mathematical constructs to choose from when deciding which predictive model to deploy. These include:

  • Logistic regression
  • Stochastic gradient boosting
  • Various Bayesian techniques
  • Decision trees
  • Neural networks
  • Random forest
  • Discriminant analysis

Each of the models above has its advocates and detractors, and apparent strengths and weaknesses, depending on the types of data available and the assumptions used to build the models. A detailed evaluation of each model is far beyond the scope of this article, but suffice it to say that choosing one model over another probably offers only small incremental performance gains. And although each has its own nuances and subtle differences, all of these techniques share one characteristic: they rely on internally generated input data to separate churners from non-churners in the customer dataset as accurately as possible.

A more recent and interesting development in the realm of predictive analytics for telecom is the use of models that measure the influence of a subscriber’s social network on their propensity to churn. The premise is fairly simple: when a family member, friend or some other person close to the subscriber cancels a service, the influence of that churn event rubs off on the subscriber and makes them more likely to churn themselves. So, the more advanced predictive analytic systems map out a subscriber’s social networks in an effort to factor the potential influence of friends, family and colleagues into churn prediction models. This strategy is most directly applicable in the case of phone service providers, who have access to historical call data and can readily work out who the three to five most influential people are in a subscriber’s personal network. If you’d like to bite into a little more detail on how the analysis of social networks or social nodes can help predict customer churn, please have a look at this T-Mobile presentation or this short slide deck from Dataspora.
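As a rough illustration of the idea, the sketch below picks a subscriber's most frequent contacts from a call log and measures what share of them have already churned. It assumes that call frequency is a reasonable stand-in for influence, which is our simplification and not necessarily how the providers mentioned above do it; the function names and sample data are hypothetical.

```python
from collections import Counter


def top_contacts(call_log, subscriber, k=5):
    """Return up to k contacts the subscriber exchanged the most calls with.

    call_log is an iterable of (caller, callee) pairs.
    """
    counts = Counter()
    for caller, callee in call_log:
        if caller == subscriber:
            counts[callee] += 1
        elif callee == subscriber:
            counts[caller] += 1
    return [contact for contact, _ in counts.most_common(k)]


def churned_contact_share(call_log, subscriber, churned, k=5):
    """Fraction of the subscriber's top-k contacts who have already churned."""
    contacts = top_contacts(call_log, subscriber, k)
    if not contacts:
        return 0.0
    return sum(1 for c in contacts if c in churned) / len(contacts)


# Made-up call records for illustration.
call_log = [
    ("sub", "mom"), ("mom", "sub"), ("sub", "friend"),
    ("sub", "boss"), ("friend", "sub"), ("sub", "mom"),
]
churned = {"mom"}

print(top_contacts(call_log, "sub", k=3))
print(churned_contact_share(call_log, "sub", churned, k=3))
```

A share like this could then be fed into a churn model as one more input attribute alongside the usual internal data.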

However, the fact remains that a company’s internal data drives almost all current predictive analytic processes. To us, that indicates that there is a lot of room for improvement, because a wealth of analyzable data lives outside the organization: data that is rich enough to stand on its own, or that can be streamed and blended with internally derived data. The external data we’re referring to is, of course, social media, where subscribers publicly express their individual experiences and intentions in real time.

There are some compelling reasons to look at social media as an important source of data for a retention solution:

  • Social posts can reveal actual experiences and intentions on an individual customer-by-customer basis. There’s no need to infer behaviours based on a customer’s inclusion in a segment, cohort or decile. As we all know, customers do not feel and act the same, even when the most sophisticated statistical model puts them in the same category.
  • Receiving potential churn warnings from social posts, in real time, sets the stage perfectly for intelligent, in-the-moment retention follow-up. Solving a customer’s issue when it is in the acute stage will win you the greatest gratitude and loyalty.
  • Potential churners can easily be cross-checked against their historical revenue contributions, enabling far superior retain/do not retain decisions.
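The cross-check in the last point can be sketched in a few lines. This is only an illustration under stated assumptions: the customer IDs, the revenue figures, the `min_revenue` cut-off and the function name are hypothetical, and it presumes the social profiles behind the churn signals have already been matched to customer records.

```python
def retention_queue(flagged_ids, revenue_by_customer, min_revenue=500.0):
    """Keep flagged customers whose historical revenue meets the cut-off.

    Returns (customer_id, revenue) pairs, highest revenue first. Flagged
    IDs with no revenue record are skipped rather than guessed at.
    """
    matched = [
        (cid, revenue_by_customer[cid])
        for cid in flagged_ids
        if cid in revenue_by_customer
    ]
    worth_retaining = [pair for pair in matched if pair[1] >= min_revenue]
    return sorted(worth_retaining, key=lambda pair: pair[1], reverse=True)


# Made-up data: customers flagged from social posts, and their revenue to date.
flagged_ids = ["c1", "c2", "c3", "c4"]
revenue_by_customer = {"c1": 1200.0, "c2": 150.0, "c3": 640.0}

for customer_id, revenue in retention_queue(flagged_ids, revenue_by_customer):
    print(customer_id, revenue)
```

In practice the cut-off would depend on the cost of the retention offer and on the margin each customer brings in, not on a single fixed number.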

Mining social media data for churn signals holds great promise for communications service providers and other service industries that are susceptible to high customer churn, and here at ThinkCX we’re building a business around the detection and delivery of churn signals for companies looking for a customer retention edge. It’s true that not all of your customers have social profiles, that not all of those who do can be found, and that not all of them post frequently enough to provide useful insights. But many do, and the insights they provide are direct, real, individual, and actionable. Social media just might be the best thing to happen to your churn modeling!