For our meeting on the 28th of August, one of our founding members, Varun Saravanan, shared his Kaggle experience with us. I'll summarize his presentation here.
What is Kaggle?
Kaggle is a website that hosts data science competitions. There are three major categories of competitions that they host:
- Learning
- Research
- Industry funded (Featured)
Kaggle does not teach total beginners how to program, but if you already have some idea of how to write code, then you can teach yourself about machine learning algorithms by writing code on Kaggle.
Varun's Experience on Kaggle
Varun worked with the following algorithms and datasets:
- Principal Components Analysis (a dimensionality reduction technique) and Random Forests on the MNIST dataset: https://www.kaggle.com/zbpvarun/pca-random-forests
- XGBoost + LightGBM on Zillow housing prices
- Neural Networks on Cdiscount Image Classification
- Humpback Whale Image Identification using bounding boxes and neural networks
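To give a feel for the first item on that list, here is a minimal sketch of PCA followed by a Random Forest. It is not Varun's actual kernel (see his link above for that). It assumes scikit-learn is installed, and it uses scikit-learn's small bundled 8x8 digits dataset as a stand-in for MNIST so that it runs without downloading anything. The choice of 20 components and 100 trees is purely illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in data (assumption): scikit-learn's bundled 8x8 digit images,
# 64 pixel features each. This is NOT the Kaggle MNIST competition data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative settings (assumption): compress 64 pixels down to 20
# principal components, then classify with a 100-tree random forest.
model = make_pipeline(
    PCA(n_components=20, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# Score on the held-out images and check the reduced feature count.
accuracy = model.score(X_test, y_test)
reduced = model.named_steps["pca"].transform(X_test)
print(f"features: {X.shape[1]} -> {reduced.shape[1]}")
print(f"held-out accuracy: {accuracy:.3f}")
```

The pipeline fits PCA on the training images only, so the held-out score is not inflated by information from the test set. On the full MNIST data the same idea applies, just with 784 pixel features instead of 64.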
Advantages of Kaggle
Kaggle provides immense computational power through its servers. By using the site, you get access to state-of-the-art machine learning and data analysis packages, preinstalled and free of compatibility issues. The company also creates incentives for collaboration, and has fostered an open and welcoming community whose members help each other produce better algorithms and results. In addition, Kaggle is becoming a repository of vast, clean, and easy-to-work-with datasets.
Disadvantages of Kaggle
The first disadvantage is the flip side of an advantage: Kaggle's immense computational power and seamlessly preinstalled packages. If you have never dealt with installation issues or limited computational power, your real-world knowledge will have gaps. Kaggle also overemphasizes the machine learning part of data science, which is only a small part of the job. Lastly, as welcoming as the community is, the competition culture has resulted in a lot of crowding.
Other views on Kaggle
Hearing from Varun was very helpful for other group members. It's also interesting to hear other people's points of view. So, last but not least, I'll put links to some articles I shared in our Slack team that I thought provided different perspectives:
How Kaggle is changing the way companies figure out solutions to their data analysis problems: https://www.theatlantic.com/technology/archive/2013/04/how-kaggle-is-changing-how-we-work/274908/
Recent piece from Kaggle Grandmaster Martin Henze: http://blog.kaggle.com/2018/06/19/tales-from-my-first-year-inside-the-head-of-a-recent-kaggle-addict/
A different take on Kaggle from Julia Evans, who is super awesome: https://jvns.ca/blog/2014/06/19/machine-learning-isnt-kaggle-competitions/ (Check out her great zines too, like How to be a Wizard: https://jvns.ca/wizard-zine.pdf).