How to get into Machine Learning, my networking session at SAPTechEd Barcelona
At SAPTechEd Barcelona 2016 I was fortunate enough to be a community speaker. I did a lecture on machine learning, something I’ve shared here before.
Meanwhile I’ve published the slides on Slideshare, in case you’re interested.
I also did a follow-up networking session that focused more on how to get into machine learning yourself. Because the slides of this short session were rather boring (being just collections of links to books, MOOCs, etc.), I thought it would be a better idea to write another blog post here to cover the details.
So how do you get into machine learning? Of course, there are lots of options, ranging from just dabbling with code libraries to enrolling in a PhD program. I wouldn’t really recommend either of those two extremes, though.
In my opinion it’s important, especially with machine learning, to understand the basics before you really dive in. I know that many developers like to get to know a topic by just starting to write software and by learning from there. The ‘problem’ with machine learning is that the code is often fairly easy to write. Technically it’s not really a challenge, since there are numerous libraries providing easy access to different algorithms. However, using those libraries will teach you very little about machine learning itself. You will essentially be using the library as a black box. So, even if you understand what comes out of an algorithm, or an evaluation metric, you will still not have learned much about machine learning.
At the other extreme you could start with a formal education program and first learn a lot of mathematics (a prerequisite for the more theoretical part of machine learning), which you could then follow up with a deep dive into the different algorithms and their variations. This could easily take you two, maybe three years. And at the end you would still hardly have learned how to do machine learning in practice.
So, what is the middle road here? My suggestion is to read some introductory materials first, to give you an appreciation for what machine learning is. You’ll get familiar with some central topics that will always come back, no matter what algorithms you use, or what kind of problems you want to solve. My primary recommendation for this phase: the Data Science for Business book.
Then, you might want to do a Massive Open Online Course (aka MOOC) on machine learning, and there are many nowadays. You could start with the very popular Coursera Machine Learning MOOC by Andrew Ng. Another MOOC that contains more mathematics but also goes more in depth is the Stanford one on Statistical Learning, by Trevor Hastie and Rob Tibshirani.
And there are lots of others. One site that I’ve found useful to get a feeling for what works and what doesn’t is Bill Kymler’s blog. He has been making the transition to data scientist over the past year, and has rated everything he’s done. Certainly worth a read!
Once you have gained an understanding of what machine learning is, how it works, and what kind of problems it solves, it’s important to get your hands dirty.
Here I don’t yet have any recommendations as it’s exactly where I am at the moment ;-).
In the past, to get at least some hands-on experience, I have taken more MOOCs, especially a couple that covered machine learning and big data. Using Databricks Community Edition (it’s free!) you get to work on Spark and do some machine learning at the same time. Interesting MOOCs are:
Big Data Analysis with Apache Spark (edX, Berkeley CS110x)
Distributed Machine Learning with Apache Spark (edX, Berkeley CS120x)
I also recommend Jason Brownlee’s blog, with lots of practical advice. I’ve bought a few of his very hands-on books, and those are going to be my next steps into machine learning. But no official recommendations yet…
To finish off, here are some more links that I’ve found interesting.
Neural Networks and Deep Learning: an online book written by Michael Nielsen, who has a real gift for explaining things clearly. Recommended, even though I’ve not yet finished reading it.
Although there are tons of other resources, these should be enough to keep you busy for the next couple of years ;-).
Happy Machine Learning!