Weeks 6 and 7 at Metis Data Science
Wow, we’re more than 50% done at Metis. Time really does fly. We’ve done some pretty cool stuff lately, so if you’re interested, here’s what we worked on.
Project 3 Presentation
It’s the 3rd week of our 3rd project, and on Friday we presented our results. I started competing in a Kaggle competition and have gotten decent results so far. I’ll post a writeup of my progress here soon.
On Tuesday, we had the pleasure of hosting Elise Runde Voss and Dan Elbaz from UpScored. UpScored is a career discovery platform that uses analytics to match candidates with jobs suited to their skill level. It was cool to hear how they’re using data science to take some of the headache out of job searching and hiring.
On Wednesday, Siddharth Motwani from Priceline came to speak about Priceline’s work on improving its travel pricing. I think we were expecting a data scientist, and Sidd was a last-minute change, but he spoke enthusiastically about his role and Priceline’s software, running the session more like a focus group.
Week 7 - Unsupervised Learning
In week 7, we started our next project, which focuses on unsupervised learning and clustering applied to text and NLP. This area always interests me, so I’m pretty excited.
Flask, MongoDB, Twitter API
We got a quick intro to Flask, a simple Python web framework for connecting a web app to our Python code, plus a guided tutorial on using the Twitter API to collect tweets.
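For anyone who hasn’t seen Flask, here’s a minimal sketch of the idea: one route that hands a query-string parameter to a Python function and returns the result as JSON. The `/predict` route and the `predict_topic` function are hypothetical placeholders for illustration, not anything from the class material.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_topic(text):
    # Hypothetical stand-in for real model code (e.g. a trained classifier).
    return "crosswords" if "crossword" in text.lower() else "other"


@app.route("/predict")
def predict():
    # Read ?text=... from the query string; default to an empty string.
    text = request.args.get("text", "")
    # Return the input and the "prediction" as a JSON response.
    return jsonify({"text": text, "topic": predict_topic(text)})
```

Save it as `app.py` and, with Flask 2.2 or newer, start the development server with `flask --app app run`; visiting `/predict?text=...` then returns the JSON response.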
We started unsupervised learning with an intro to Natural Language Processing and K-means clustering. Then on Friday we discussed dimensionality reduction with PCA.
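To make K-means concrete, here’s a minimal NumPy sketch of Lloyd’s algorithm, the standard iterative version: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The farthest-first initialization and the two-blob toy data are my own assumptions for illustration; libraries such as scikit-learn default to k-means++ initialization instead.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization (an assumption for this sketch): start from a
    # random point, then repeatedly add the point farthest from the chosen centroids.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Toy data: two well-separated 2-D blobs centered at (0, 0) and (5, 5).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

Because the result depends on initialization, K-means only finds a local optimum in general; on well-separated blobs like these it recovers the two groups.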
I’ve decided that SVD and CUR decomposition are currently my favorite algorithms in machine learning: they’re not complicated, and they’re quite elegant. I’m a little sad that we didn’t go into depth, but I’ll write a blog post about their beauty soon.
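Part of what makes these methods elegant is how little code they take. Below is a rough NumPy sketch, assuming a small dense matrix: a truncated SVD (the best rank-r approximation), PCA computed as the SVD of mean-centered data, and a randomized CUR decomposition. The CUR variant here (sampling columns and rows without replacement, with probability proportional to their squared norms, then setting U = C⁺AR⁺) is one common formulation, not the only one; textbook versions often sample with replacement and rescale the sampled columns and rows.

```python
import numpy as np


def truncated_svd(A, r):
    """Best rank-r approximation of A, built from the top r singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]


def pca(X, r):
    """PCA via SVD: center each column, then project onto the top r right singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:r]
    return Xc @ components.T, components


def cur(A, n_cols, n_rows, seed=0):
    """Randomized CUR sketch: C and R are actual columns/rows of A, and
    U = pinv(C) @ A @ pinv(R) minimizes ||A - C U R|| (Frobenius) for that C and R."""
    rng = np.random.default_rng(seed)
    total = (A ** 2).sum()
    col_p = (A ** 2).sum(axis=0) / total  # sampling probability for each column
    row_p = (A ** 2).sum(axis=1) / total  # sampling probability for each row
    cols = rng.choice(A.shape[1], size=n_cols, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=n_rows, replace=False, p=row_p)
    C, R = A[:, cols], A[rows, :]
    # A relative cutoff of 1e-10 treats round-off-sized singular values as zero.
    U = np.linalg.pinv(C, rcond=1e-10) @ A @ np.linalg.pinv(R, rcond=1e-10)
    return C, U, R


# Toy example: an exactly rank-2 matrix of shape (8, 6).
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))
C, U, R = cur(A, n_cols=3, n_rows=3)
scores, components = pca(A, 2)
```

Unlike SVD, CUR keeps actual columns and rows of the original matrix in C and R, which can make the factors easier to interpret, for example as real documents or terms in a text matrix.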
Noteworthy Packages for NLP
- spaCy for some great text processing and NLP functions. Much faster than NLTK.
- Gensim for some advanced NLP like topic modeling and Word2Vec.
- And of course the tried-and-true NLTK. Lots of functionality here.
Speakers and Meetups
I have several blog posts about great speakers and meetups I went to this week. Check them out:
- Google NYC Tech Talk: Sidewalk Labs
- Gilad Barash, data scientist from Dstillery
- James Faghmous, Founder/CTO of the Arnhold Institute for Global Health at Mount Sinai
- Dr. Kirk Borne at NYC Open Data Meetup
Project 4 - NLP + Unsupervised Machine Learning + NoSQL
I’m a big crossword puzzle solver, so for this project, I plan to use data science to gather some insight into crossword puzzles. Check back here for updates.