Investigation: Data Science in Insurance

Jan 23, 2016    

At Metis, investigations are a chance for students to present a topic in machine learning or data science more broadly. I looked into Natural Language Processing and offered a glimpse of my experience with data science in insurance.

For my first investigation, I presented some of my work experience as a data science intern at an insurance company, along with projects related to text mining and news analytics. I also introduced the class to the basics of Natural Language Processing and to Python’s powerful NLTK (Natural Language Toolkit) package with a live demonstration.

You can find the investigation files in my GitHub repo. In this first blog post, I will discuss some of the uses of data science that I’ve seen in insurance.

Data Science in Insurance

Insurance has always been about data and information. An insurer can take on your risk because it has the data and tools to assess and spread risks and to charge an appropriate premium. There’s competition in this industry because there’s information asymmetry: if your company can assess risk better, you have a chance to attract less risky customers with better rates. In fact, there’s an entire established field, actuarial science, devoted to applying statistics to insurance and risk management.

But how are insurers using data differently today? Here are some of my observations:

  • Automobile Insurance: Cars today already record a lot of data: speed, acceleration, collision information and, with cellular-connected cars, even precise location. Insurance companies like Progressive can now tap into that data with an OBD-II dongle to track how much and how far you drive, presumably to lower your premium.

  • Health Insurance: There’s been tremendous growth in Health Informatics and its data, but I am particularly interested in how wearable activity trackers like Fitbits are providing additional data points to health insurers about our daily habits.

    I’ve heard anecdotally of an employer that lowered all employee health premiums but required everyone to wear an activity monitor. If your daily step count fell below a certain threshold, your premium would rise. This resulted in employees shaking their Fitbits while sitting at their desks to get their required daily steps in.

  • Life Insurance: Probably the original commercial users of data, life insurers now go beyond traditional sources and life/death tables. New sources of information such as social media, web analytics, and marketing research can all contribute to building a better picture of the insured.

I used to think that insurance was a slow-moving industry in terms of technology, so I was pleasantly surprised to receive a call last spring about an internship opportunity in a fast-growing data science department. The company dealt mainly with commercial liability, and here are some of the areas where I saw data being used:

  • Flood & Property Insurance: I’ve seen analysis done here that goes beyond FEMA flood maps. Insurers are analyzing geographical surveys of individual properties to see where they lie in relation to the surrounding area. In other words, if your building’s in a “hole”, you may be more at risk of being flooded.

  • Workers Compensation: Fraud detection and claims correction are big here. Claim reports, doctor notes, and even social media (photos of jetskiing when your arm was reportedly broken) are now digitized and analyzed for discrepancies.

  • Cybersecurity: This is a rather new insurance product, but a lot of work is being done to try to measure the cost of a cyber breach. All of this is modeled from whatever data is made available.

  • Director and Officer Insurance: This pays if a company’s officers are sued, and claims can be split into two categories:

    • Securities Class Action (SCA): When the officers’ actions cause a drop in the stock price, unhappy stockholders and investors may sue. Predictive analytics can track large drops in stock prices, unusually high volumes of transactions, or short interest, so that an insurer will not be surprised by a claim.
    • Non-SCA Claims: These lawsuits can come from employees for workplace misconduct, customers for injury, regulators for wrongdoings, lenders for unpaid debts, or even competitors for unfair trade practices.
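
To make the class action signals above a bit more concrete, here is a minimal sketch of how large price drops and unusual trading volume could be flagged. This is not the method used at the company; the function name, the 10% drop threshold, the 3x volume multiple, the 5-day window, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean


def flag_sca_signals(closes, volumes, drop_threshold=0.10,
                     volume_multiple=3.0, window=5):
    """Return (day_index, reasons) pairs for days that look unusual.

    All thresholds are illustrative assumptions, not industry standards.
    """
    flags = []
    for i in range(1, len(closes)):
        reasons = []

        # Large one-day drop in the closing price.
        daily_return = closes[i] / closes[i - 1] - 1.0
        if daily_return <= -drop_threshold:
            reasons.append("price_drop")

        # Volume well above the average of the preceding days.
        history = volumes[max(0, i - window):i]
        if volumes[i] > volume_multiple * mean(history):
            reasons.append("volume_spike")

        if reasons:
            flags.append((i, reasons))
    return flags


# Invented sample data: a sharp drop on heavy volume on day 4.
closes = [50.0, 50.5, 49.8, 50.2, 42.0, 41.5]
volumes = [1000, 1100, 900, 1000, 5200, 1500]
print(flag_sca_signals(closes, volumes))
```

A real monitoring system would likely also account for overall market moves and short interest, which this sketch leaves out.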

Predictive Analytics

Starting with Non-SCA claims, can we use news data to predict an upcoming claim? Makes sense, right? If news about a certain company hints at some shady business, there could be a suit against the officers soon. We can search news about the company for keywords like: regulators, investigation, complaints, lawsuit, fraud, recall, or injury…
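
As a rough illustration of the keyword search idea, the sketch below counts how often those keywords appear in each article and ranks the articles by the number of hits. It uses only the Python standard library. The function names and the sample articles (about a fictional "Acme") are hypothetical, and the matching is exact, so a word like "investigations" would be missed unless stemming (for example with NLTK) were added.

```python
import re
from collections import Counter

# Keywords taken from the list above.
RISK_KEYWORDS = {"regulators", "investigation", "complaints",
                 "lawsuit", "fraud", "recall", "injury"}


def keyword_hits(article_text, keywords=RISK_KEYWORDS):
    """Count occurrences of each risk keyword in one article."""
    tokens = re.findall(r"[a-z]+", article_text.lower())
    return Counter(t for t in tokens if t in keywords)


def rank_articles(articles, keywords=RISK_KEYWORDS):
    """Return titles of articles with at least one hit, most hits first."""
    scored = [(sum(keyword_hits(text, keywords).values()), title)
              for title, text in articles.items()]
    return [title for score, title in sorted(scored, reverse=True) if score > 0]


# Invented sample articles for illustration only.
articles = {
    "Acme recall": "Regulators opened an investigation after a product recall at Acme.",
    "Acme earnings": "Acme reported quarterly earnings in line with expectations.",
    "Acme lawsuit": "A customer filed a lawsuit alleging fraud.",
}
print(rank_articles(articles))
```

Simple counts like these are only a first filter; they say nothing about whether the article is actually about the company’s officers.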

Ultimately, if underwriters have a reliable source of news articles and sentiment about a potential client, it gives them a better picture of the client’s risk profile.

This is complicated for small and medium-sized companies, since they’re not likely to be in the news much. But for large, well-known companies it’s also tough, because the relevant articles have to be picked out from all the others!