Foresight into whether insurance customers are likely to renew their existing policy, upgrade to a better policy, or churn altogether gives the insurance company leverage to make business decisions that will help it increase its ROI. It has also been shown that a particular group of customers responds positively to intervention from a sales agent, such as a friendly phone call. This same intervention, however, can deter other customers from renewing their policy. It is therefore important to understand the characteristics that distinguish customers who welcome the sales intervention from those who find it annoying.
Classification models are a category of predictive algorithms that can be used here to classify insurance customers into four groups: a) those likely to renew without intervention, b) those likely to renew only with intervention, c) those likely to churn only if there is intervention but who would renew otherwise, and d) those likely to churn regardless of intervention. When evaluating the model, we try to minimize false positives in group c and false negatives in group b. The true distribution of customers across these groups is unknown, but the insurance agency's historical records show which proportion of customers is larger: those who churn or those who renew. Assuming the historical trend persists and more customers renew their policies than churn, group c is projected to be larger than group b. Therefore, a high false positive rate is the more damaging error overall.
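The four groups and the per-group error counts discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `segment` and `per_group_errors` are hypothetical, group a is read as "renews either way", and the sketch assumes a labeled set in which both hypothetical outcomes (with and without intervention) are known for each customer, which is rarely the case in practice.

```python
from collections import Counter

GROUPS = ("a", "b", "c", "d")

def segment(renews_without: bool, renews_with: bool) -> str:
    """Map two hypothetical outcomes onto the four groups (assumed reading)."""
    if renews_without and renews_with:
        return "a"  # renews whether or not there is intervention
    if renews_with:
        return "b"  # renews only with intervention
    if renews_without:
        return "c"  # churns only if there is intervention
    return "d"      # churns regardless of intervention

def per_group_errors(y_true, y_pred):
    """Count false positives and false negatives for each group.

    A misclassified customer is a false positive for the predicted group
    and a false negative for the true group.
    """
    fp, fn = Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth != pred:
            fp[pred] += 1
            fn[truth] += 1
    return {g: {"fp": fp[g], "fn": fn[g]} for g in GROUPS}

# Made-up labels, for illustration only.
y_true = ["a", "b", "c", "c", "d", "b"]
y_pred = ["a", "a", "b", "c", "d", "b"]
errors = per_group_errors(y_true, y_pred)
```

Reading `errors["b"]` and `errors["c"]` side by side gives the two quantities the evaluation focuses on.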
Many different classification algorithms exist for solving this problem. The final churn prediction system is an ensemble of these methods. Ensembling several algorithms typically improves accuracy, which reduces the likelihood of incorrectly classifying a customer from group c into group a or b and thereby causing that customer to churn.
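One common way to combine classifiers is majority voting, sketched below. The text does not say which combination rule the final system used, so this is an assumed example rather than the actual method; `majority_vote` and the three toy model outputs are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    `predictions` holds one list of labels per model, all of equal length.
    When votes are tied, the label that appears first among the votes wins.
    """
    combined = []
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Made-up outputs from three hypothetical models for three customers.
model_1 = ["a", "b", "c"]
model_2 = ["a", "c", "c"]
model_3 = ["b", "b", "c"]
ensemble = majority_vote([model_1, model_2, model_3])
```

A single model's mistake is outvoted whenever the other models agree, which is the intuition behind the accuracy gain.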
Preprocessing and cleaning of the data were done with OpenRefine, a standalone application that works like a database. The ensemble classification model was built in KNIME, an open-source data analytics, reporting, and integration platform written in Java, which integrates machine learning algorithms from WEKA and statistical packages from R.
Tableau Rapid Fire BI, a dashboard enabling rapid visualization of data from disparate sources, was used to understand the relation between process times at different stages of the insurance cycle, from renewal to claim. Rapid Fire BI also categorized the data by the various products offered by the insurance company and by how satisfied the average customer was with each product. Possible reasons for customer dissatisfaction were also explored. A separate analysis categorized branches by region and evaluated performance at various levels within each region. Lastly, Tableau was used to assess claims and analyze how settlement times vary across branches and regions.
Data cleansing, including the handling of missing values, imputation, and dimensionality reduction, was performed using R. R was also used to classify insurance customers by satisfaction and to identify the top reasons for churn.
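As a rough illustration of the imputation step, the sketch below fills missing numeric values with the column mean. The original work was done in R and the text does not state which imputation method was used, so both the Python rendering and the mean-fill rule are assumptions; `impute_mean` and the `premium` field are hypothetical names.

```python
from statistics import mean

def impute_mean(rows, column):
    """Return copies of `rows` with missing values in `column` mean-filled.

    Missing values are represented as None. If no value is observed at all,
    the rows are returned unchanged (as copies).
    """
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        return [dict(row) for row in rows]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

# Made-up records, for illustration only.
customers = [
    {"id": 1, "premium": 100},
    {"id": 2, "premium": None},
    {"id": 3, "premium": 300},
]
cleaned = impute_mean(customers, "premium")
```

The input list is left untouched, so the raw and cleaned records can be compared afterwards.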