David Robinson Speaks to Metis
David Robinson of Stack Overflow came to speak at Metis on Thursday night. He was candid and shared extensively about his experiences. I know most programmers find Stack Overflow essential when they hit a problem, so it was cool to hear his insights about the company. Here is some of what he shared.
Favorite data science package? ggplot2 in R.
Advice for budding data scientists? Make public artifacts.
David did a PhD in Computational Biology at Princeton and was actually recruited by Stack Overflow because he was actively participating and answering questions on the site. The question that got Stack Overflow’s attention was a stats question: “what is the intuition behind beta distribution”. David was able to explain it clearly with a baseball example.
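To make the baseball idea concrete for myself, here is a minimal Python sketch of how a beta distribution can represent a belief about a batting average and get updated as at-bats come in. The prior numbers (81 and 219) and the season results are my own illustrative assumptions, not figures from the talk, and the function names are made up for this example.

```python
# Sketch: a Beta(alpha, beta) distribution as a belief about a batting average.
# alpha acts like a count of prior "hits", beta like a count of prior "misses".
# All numbers below are illustrative assumptions, not data from the talk.

def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def update_beta(alpha, beta, hits, misses):
    """Updating on new at-bats just adds the observed hits and misses."""
    return alpha + hits, beta + misses

# Assumed prior: we expect a batting average of around 0.27.
prior_alpha, prior_beta = 81, 219
prior_mean = beta_mean(prior_alpha, prior_beta)

# Hypothetical season: 100 hits in 300 at-bats.
post_alpha, post_beta = update_beta(prior_alpha, prior_beta, hits=100, misses=200)
post_mean = beta_mean(post_alpha, post_beta)

print(f"prior mean:     {prior_mean:.3f}")
print(f"posterior mean: {post_mean:.3f}")
```

The posterior mean lands between the prior expectation and the observed average for the season, which is the intuition: the prior behaves like a head start of earlier at-bats that new evidence gradually outweighs.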
He shared some of his work on the jobs and careers section of Stack Overflow. By tracking the technologies and programming languages listed by applicants and searched for by employers, he was able to generate a graph of programming languages and their popularity over time. It’s cool to see the rise and decline of Java, while Python and JavaScript seem to be at their highest points.
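I don't know how David actually built that graph, but the basic idea of tracking a language's share over time can be sketched in a few lines of Python. The records below are made-up toy data, and `tag_share_by_year` is a hypothetical helper of my own, not anything from Stack Overflow.

```python
from collections import Counter, defaultdict

# Made-up toy records for illustration only: (year, technologies listed).
listings = [
    (2014, ["java", "sql"]),
    (2014, ["java", "python"]),
    (2015, ["python", "javascript"]),
    (2015, ["java", "javascript"]),
    (2015, ["python"]),
]

def tag_share_by_year(records):
    """For each year, return the fraction of records that mention each tag."""
    totals = Counter()             # number of records per year
    counts = defaultdict(Counter)  # per-year count of records mentioning a tag
    for year, tags in records:
        totals[year] += 1
        for tag in set(tags):      # count each tag at most once per record
            counts[year][tag] += 1
    return {
        year: {tag: n / totals[year] for tag, n in tag_counts.items()}
        for year, tag_counts in counts.items()
    }

shares = tag_share_by_year(listings)
for year in sorted(shares):
    print(year, sorted(shares[year].items()))
```

Using a share of records per year instead of raw counts keeps the trend comparable even when the total number of listings changes from year to year.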
His blog varianceexplained.org is really awesome. He has clear explanations of statistics, tutorials in R and ggplot2, and a bunch of fun examples, like analyzing the network in “Love Actually”.
Some other blogs that he recommended:
- FlowingData for their visualizations
- Machine learning subreddit
- Data Is Beautiful subreddit
And some data scientists on Twitter:
- Jeff Leek, Roger Peng and the Simply Statistics Blog
- Hadley Wickham, creator of ggplot2 in R
- Wes McKinney, creator of pandas in Python
- Hilary Parker of Etsy
- And David’s own handle @drob
His advice for steps to take on a data science project?
- Start with a specific question
- Find appropriate data
- Look at a few examples
- Then scale
And here are some of my personal takeaways from this excellent talk:
- Set up a profile on Stack Overflow’s career/job site
- Start contributing on Stack Overflow (starting from smaller tags)
- Keep writing this blog
- Fix up my GitHub and annotate my repeatable code clearly
- Make as many projects and as much code public as I can