Just wrapped up Week 4 at Metis, and that means we’re 1/3 of the way through already. This week has been fun and challenging as we’re finally diving into supervised learning. It has also been more relaxed, since it is the first week of our 3rd project. Here’s what we’ve been doing.
- Supervised Machine Learning
- K Nearest Neighbor
- Logistic Regression
- Support Vector Machines (SVM)
- Decision Trees, Ensemble, Bagging, and Bootstrap methods
- SQL and NoSQL
- Setting up an EC2 server on free tier AWS to host a SQL server in the cloud
I am very impressed with how Metis is covering these machine learning algorithms. Not only do the instructors present the basic idea and application, they also attempt to explain the math behind them. That is hard to do even with a full week in a college or graduate course, and they are doing a pretty awesome job.
I’m still learning about them too. One of the best ways I learn is by trying to teach, so I’ll be writing a few tutorials to try to explain what’s going on. Keep your eyes open for upcoming blog posts on linear regression and logistic regression.
Project 3 - Supervised Learning and Classification
We started on our 3rd project this week. This time we’re using supervised machine learning, specifically classification, to explore a dataset of our choice.
A recommended starting point was the excellent UCI Machine Learning Repository, which hosts prepared data sets. There are some very interesting classification problems there. I was initially very interested in the Higgs Boson data, but thought that while I might be able to build a good classification model, I lack the domain expertise to interpret my findings well enough in just 3 weeks.
I also wanted to explore education in New York City using the vast amount of data available on NYC Open Data. But before I could come up with an interesting and compelling question, I came across a cool and current challenge on Kaggle: using classification to expedite credit card applications for BNP Paribas. So I’ll be working on this for Project 3 and for the duration of the Kaggle competition.
Guest Speaker: David Robinson
David Robinson of Stack Overflow came to speak to us on Thursday about some of his current projects and gave some great advice. I blog about his visit here.
His excellent blog on statistics, R, and data science is Variance Explained.
Other Events This Week:
LinkedIn Workshop: One of our career advisors, Jennifer, led a workshop on improving our LinkedIn profiles. It’s great to see the hands-on emphasis that Metis places on having a professional profile that showcases our skills and data science experience. Here’s my LinkedIn profile. Feel free to connect with me.
2-on-1 meeting with Jason and Debbie: I had a chance to sit down with Jason, Metis co-founder, and Debbie, Metis chief data scientist, to talk about my experience at Metis so far and share feedback. I am having a very rewarding and fun time. I expressed that having real projects to work on at Metis was and continues to be a huge draw for me. I also suggested 2 things:
- Linear algebra is quite important for understanding how most machine learning algorithms work behind the scenes. I recommended that some linear algebra be included in the pre-work for new students.
- Business application of data science projects. You hear everyone talking about how cool and sexy data science is. While every company is rushing into this field, at the end of the day it is still a cost center to a business. Whether or not students want to work at a for-profit organization, I think there needs to be some discussion of how to formulate and present a business case around our projects. I think a great way to do this without adding more lecture material is to bring in a speaker who can speak from a business or product management perspective on a data science team.
Google foobar programming challenge: I had the bandwidth this week to work on the next programming challenge. This one took me the longest to brainstorm and implement so far, but I was able to complete level 3 this week! I talk about my experiences so far here.
NYC Data Science Study Group Meetup: I went to a Meetup on Monday at Dstillery, where Susan Sun presented on Feature Engineering. It was a great talk, and I met some people from General Assembly’s part-time data science program during the networking time. Lots of fun, and I’ll be going to more Meetups.