The Land of Oz Ozzie Liu


The sum of the square roots of any two
sides of an isosceles triangle is equal to
the square root of the remaining side.  Oh
joy, rapture!  I've got a brain!
    -SCARECROW from "The Wizard of Oz"

I’ve worked on some pretty cool data science projects and I want to show them off here, so please be patient while I organize and document them.

What Can Crossword Puzzles Tell Us About American Culture and Language?

(Project 4 at Metis Data Science)

As an avid crossword puzzle solver, I've noticed that crosswords reflect the culture and language of their time. With this project, I wanted to build a tool that tells the story of how our focus has changed.

Using natural language processing and unsupervised machine learning, I analyzed 40 years of New York Times crossword puzzle clues and answers, looking for trends with clustering and for topics with topic modeling algorithms.

Then I created a minimal but functional web app with D3 visualizations to demo the interesting results.

Kaggle BNP Paribas Cardif Classification Challenge

(Project 3 at Metis Data Science)

I participated in a Kaggle challenge sponsored by BNP Paribas Cardif to develop a classification algorithm to expedite the personal payment insurance claim process. Given a large dataset with anonymous features, I performed a myriad of operations to arrive at the best prediction, including:

  • exploratory data analyses
  • feature engineering to understand most of the anonymous features
  • feature selection and scaling
  • dimensionality reduction
  • model evaluation

I ended up with an ensemble model of random forest and gradient boosting.
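One simple way to combine a random forest with gradient boosting is to average their predicted probabilities. The sketch below does this with scikit-learn on synthetic data; the dataset, the hyperparameters, the equal blending weights, and the use of log loss as the metric are assumptions for illustration, not the actual competition setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the anonymized competition data (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the two base models independently on the same training split.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
gb = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Blend by averaging positive-class probabilities with equal weights (assumed).
rf_proba = rf.predict_proba(X_test)[:, 1]
gb_proba = gb.predict_proba(X_test)[:, 1]
blend_proba = (rf_proba + gb_proba) / 2

# Compare each model and the blend on held-out log loss (lower is better).
for name, proba in [
    ("random forest", rf_proba),
    ("gradient boosting", gb_proba),
    ("blend", blend_proba),
]:
    print(f"{name}: log loss = {log_loss(y_test, proba):.3f}")
```

Whether the blend beats either base model depends on the data, so the comparison on a held-out split is the part worth keeping.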

“The Roger-Ebertron”: Predicting Movie Ratings with Regression

(Project 2 at Metis Data Science)

A machine learning project that predicts how the great movie critic Roger Ebert might rate movies today, using linear regression to benchmark his reviews against those of other critics and users.

  • Skills: Web-scraping, exploratory data analysis, linear regression, cross-validation
  • Tools: Python, Jupyter Notebook, BeautifulSoup, Pandas, Numpy, Statsmodels, Scikit-learn, and Seaborn
  • Blog post
  • Code on GitHub
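For a flavor of the regression and cross-validation skills listed above, here is a minimal sketch with scikit-learn. The data is synthetic: the three feature columns, the 0 to 4 rating scale, and the weights used to generate the target are all made up for illustration and do not come from the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row is a movie, each column a hypothetical
# rating from another critic or user group on a 0-4 scale.
n_movies = 200
other_scores = rng.uniform(0, 4, size=(n_movies, 3))

# Assumed target: a noisy weighted mix of the other ratings.
target = other_scores @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.3, n_movies)

# 5-fold cross-validation: fit on four folds, score R^2 on the held-out fold.
model = LinearRegression()
r2_scores = cross_val_score(model, other_scores, target, cv=5, scoring="r2")

print("R^2 per fold:", np.round(r2_scores, 3))
print("Mean R^2:", round(r2_scores.mean(), 3))
```

Scoring on held-out folds gives a more honest estimate of how the model would do on movies it has not seen than the fit on the training data alone.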

MTA Turnstile Analysis for WomenTechWomenYes

(Project 1 at Metis Data Science)

Analysis and presentation of 3 months of MTA turnstile time-series data to help Women Tech Women Yes optimize the placement of street teams to spread awareness of women in technology and increase donors at its annual summer gala.
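A common first step with turnstile data is turning counter readings into per-interval traffic and ranking stations by volume. The pandas sketch below assumes each reading is a cumulative entry counter per turnstile; the column names, station labels, and numbers are hypothetical and not taken from the project.

```python
import pandas as pd

# Hypothetical readings: one cumulative entry counter per turnstile.
readings = pd.DataFrame(
    {
        "station": ["A", "A", "A", "B", "B", "B"],
        "turnstile": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "timestamp": pd.to_datetime(
            [
                "2016-01-01 00:00", "2016-01-01 04:00", "2016-01-01 08:00",
                "2016-01-01 00:00", "2016-01-01 04:00", "2016-01-01 08:00",
            ]
        ),
        "entries": [100, 150, 230, 5000, 5010, 5100],
    }
)

# Order readings in time within each turnstile before differencing.
readings = readings.sort_values(["station", "turnstile", "timestamp"])

# Because the counters are cumulative, traffic in an interval is the
# difference between consecutive readings of the same turnstile.
readings["interval_entries"] = (
    readings.groupby(["station", "turnstile"])["entries"].diff()
)

# Total entries per station, busiest first (the first reading of each
# turnstile has no previous value and is skipped by sum()).
station_totals = (
    readings.groupby("station")["interval_entries"]
    .sum()
    .sort_values(ascending=False)
)
print(station_totals)
```

Real feeds would also need cleaning for counter resets and irregular reading times, which this sketch leaves out.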

News Analytics for Insurance Underwriters

Details coming soon

Season Ticket Holder Churn Analysis

Details coming soon