Week 6 and 7 at Metis Data Science

Feb 27, 2016    

Wow, we’re more than 50% done at Metis. Time really does go by fast. We’ve done some pretty cool stuff lately, so if you’re interested, here’s what we worked on.

Week 6 - JavaScript & D3

Week 6 was a little easier than learning supervised learning in 1 week. We started looking at JavaScript and D3.js. Boy, is JavaScript weird compared to Python. D3 is incredibly powerful, but we have to think about constructing our graphs piece by piece, defining every element. I’ll have to take some time to learn both of them from scratch.

Project 3 Presentation

It’s the 3rd week of our 3rd project, and on Friday we presented our results. I started competing in a Kaggle competition and have gotten decent results so far. I’ll be posting a blog writeup of my progress here soon.

Guest Speakers

We had the pleasure of hosting Elise Runde Voss & Dan Elbaz from UpScored on Tuesday. UpScored is a career discovery platform that uses analytics to match candidates with the perfect job for their skill level. It was cool to hear how they’re using data science to take some of the headache out of job searching and hiring.

On Wednesday, Siddharth Motwani from Priceline came to speak about Priceline’s work on improving their travel pricing. I think we were expecting a data scientist, and Sidd was a last-minute change, but he spoke enthusiastically about his role and Priceline’s software, conducting the session more like a focus group.

Week 7 - Unsupervised Learning

In week 7, we started our next project, which focuses on unsupervised learning, specifically clustering applied to text and NLP. This area is always interesting to me, so I’m pretty excited.

Flask, MongoDB, Twitter API

We got a quick intro to Flask, a simple Python web framework that connects a web app to our Python code, and a guided tutorial on using the Twitter API to collect tweets.
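To give a flavor of what hooking Flask up to Python code looks like, here is a minimal sketch. It is not the app from class: the route `/word_count/<text>` and the helper `count_words` are made up for illustration, and only `Flask`, `route`, and `jsonify` come from Flask itself.

```python
from flask import Flask, jsonify

app = Flask(__name__)


def count_words(text):
    """Hypothetical stand-in for 'our Python code' (a model, an analysis, etc.)."""
    return len(text.split())


@app.route("/word_count/<text>")
def word_count(text):
    # Flask passes the matching URL segment in as `text`;
    # jsonify turns the dict into a JSON response for the browser.
    return jsonify({"text": text, "n_words": count_words(text)})

# Calling app.run() would start Flask's built-in development server.
```

With the development server running, visiting `/word_count/hello%20big%20world` should return a small JSON document containing the word count.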

I have experience with MongoDB, but it’s great to get help setting it up on AWS. That leaves Flask and JavaScript as the things I’ll have to focus on learning for a web app.

Unsupervised Learning

We started unsupervised learning with an intro to Natural Language Processing and K-means. Then on Friday we discussed dimensionality reduction with PCA.
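For anyone who hasn’t seen K-means before, here is a rough pure-Python sketch of the idea: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. This is my own toy version on made-up 2-D points, not the code from class. For text, you would first turn documents into numeric vectors, and in practice you would reach for a library implementation.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, n_iter=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of its assigned points.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Made-up toy data: two obvious blobs.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points should land in one cluster and the last three in the other. Which cluster gets label 0 depends on the random starting centroids.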

I’ve decided that SVD and CUR decomposition are currently my favorite algorithms in machine learning, as they’re not complicated and are quite elegant. I’m a little sad that we didn’t go in depth, but I’ll just write a blog post about their beauty. Soon.
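As a small taste of why I find SVD elegant, here is a sketch of PCA done directly with an SVD in NumPy. It assumes NumPy is installed, the function name `pca_via_svd` is my own, and the data matrix is made up. Center the data, take the SVD, and the rows of `Vt` are the principal directions.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Project X (rows = samples) onto its top principal components."""
    X_centered = X - X.mean(axis=0)           # PCA needs mean-centered columns
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]            # principal directions, one per row
    projected = X_centered @ components.T     # coordinates in the reduced space
    # Squared singular values, scaled by (n - 1), are the component variances.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return projected, components, explained_variance[:n_components]


# Made-up data: the second column is roughly twice the first.
X = np.array([[2.0, 4.1],
              [4.0, 7.9],
              [6.0, 12.2],
              [8.0, 15.8]])
projected, components, explained_variance = pca_via_svd(X, n_components=1)
```

Because the two columns are almost perfectly correlated, a single component keeps nearly all of the variance, which is the whole point of dimensionality reduction.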

Noteworthy Packages for NLP

  • spaCy for some great text processing and NLP functions. Much faster than NLTK.
  • Gensim for some advanced NLP like topic modeling and Word2Vec.
  • And of course the tried and true NLTK. Lots of functionality here.

Speakers and Meetups

I have several blog posts about great speakers and meetups I went to this week. Check them out:

Project 4 - NLP + Unsupervised Machine Learning + NoSQL

I’m a big crossword puzzle solver, so for this project, I plan to use data science to gather some insight into crossword puzzles. Keep looking here for an update.