Weeks 8 and 9 at Metis Data Science

Mar 15, 2016    

Another two weeks have passed swiftly at Metis. We worked on and presented our fourth project, on natural language processing and unsupervised machine learning, and we learned the basics of distributed computing with Hadoop, MapReduce, Hive, and Spark. Here’s a summary of what I worked on and some of my thoughts:

We have less than 3 weeks left at Metis and I’ll be working hard on my final project, so I won’t be posting as frequently on my blog. However, I do have a lot of cool projects and writeups that I’m eager to post here. So stay tuned.

Project 4 - Unsupervised Machine Learning with Natural Language Processing

Our fourth data science project at Metis involved text data and natural language processing techniques. Again, we were free to pick a topic that interested us. I decided to explore a specific interest of mine: crossword puzzles. Because crossword puzzles reflect current events, their solvers’ pop culture knowledge, and changes in the English language, I wanted to create a tool to explore how the clues used to describe a given word have evolved.

I’ll write up a more detailed post on the project, but here’s a sneak peek at one of the visualizations, which looks at how the word Euro has been used over time:

Did you know that a euro is a large red kangaroo? Well, before the mid-90s, that’s how the word Euro was sometimes clued. Euro has also been used consistently as a prefix, forming words such as eurodollar or eurobond. The Euro currency started gaining traction in the 90s, and we see a peak in its usage coinciding with the currency’s official adoption in 1999. It’s also interesting to see how the Euro currency gradually stops being clued as new and starts being clued as a replacement currency, perhaps out of nostalgia for the franc or the mark.

All these clusters and “topics” were produced through unsupervised machine learning; my only intervention was editing the wording of the topics. Cool, huh?
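To give a flavor of what clustering clues can look like, here is a minimal sketch in plain Python. It is not the code behind the visualization: it swaps in a simple k-means on bag-of-words vectors with cosine similarity, the clues are made up for illustration, and the stopword list and function names are my own assumptions.

```python
import math
from collections import Counter

# Hypothetical stopword list, just enough for the toy clues below.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "to", "or", "with"}

def vectorize(clue):
    """Turn a clue into a unit-length bag-of-words vector (word -> weight)."""
    counts = Counter(w for w in clue.lower().split() if w not in STOPWORDS)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}

def cosine(u, v):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())

def centroid(vectors):
    """Sum the member vectors and rescale the result to unit length."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
    return {w: x / norm for w, x in total.items()}

def kmeans(vectors, k, iterations=10):
    """K-means with cosine similarity and a deterministic farthest-first start."""
    centers = [vectors[0]]
    while len(centers) < k:
        # Pick the vector that is least similar to every center chosen so far.
        centers.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers

# Made-up clues for the answer EURO, not taken from the real dataset.
clues = [
    "Large kangaroo",
    "Australian kangaroo",
    "Continental currency",
    "New European currency",
    "Currency replacing the franc",
    "Prefix with dollar",
    "Prefix with bond",
]

vectors = [vectorize(c) for c in clues]
labels, centers = kmeans(vectors, k=3)
for j, center in enumerate(centers):
    top_word = max(center, key=center.get)  # heaviest word serves as a crude topic name
    print(top_word, [c for c, lab in zip(clues, labels) if lab == j])
```

On this toy input the three groups come out as the kangaroo clues, the currency clues, and the prefix clues. The heaviest word in each centroid is only a rough topic name, which is why hand-editing the wording is still useful.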

Final Metis Project

Currently, I plan to extend my crossword puzzle project into my final project and attempt to generate a themed crossword puzzle based on a chosen word or theme. For my final presentation, I’d like to present a functional full stack web app. Behind the scenes, there will be some serious machine learning and algorithmic programming. I’m pretty excited!

Other tidbits:

  • Setting up a web app on Heroku
  • Code review with instructors
  • Mock non-technical interviews, recorded, great feedback
  • Resume clinics
  • Pair programming
  • I’m learning jQuery and JavaScript
  • and some web development: Node.js, Express.js. Looked at Ember.js
  • Advanced 2 challenges in foobar
  • But also reviewing a lot of programming concepts
  • NYPD: Alex Chohlas
  • Investigation into Google PageRank
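
For the PageRank item, here is a minimal sketch of the basic idea as power iteration in plain Python. The tiny link graph is made up, the damping factor of 0.85 is the commonly cited default, and the function name is my own. This is the textbook version, not whatever Google runs in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web, for illustration only.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page c ends up on top because three other pages link to it, while d, which nobody links to, keeps only the random jump share.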