An Effective Approach to Learn Data Science
I have been working in the analytics industry for some time, and I keep getting questions, both from people I know personally and on professional platforms like LinkedIn, about how to make a career transition into this amazing field. I have given advice to people here and there, but considering the depth and breadth of this knowledge-intensive field, I think I can do a better job by writing down a strategy that I can share with others. Hence this post.
Step 1: Learn the Fundamentals of Data Science
Understand the basics of data science first. It is very important to get your fundamentals strong because, as I mentioned earlier, this is a very knowledge-intensive field. What I have seen in most of the courses available online is that they jump straight into model building, which is why it is crucial to select the right course to start your data science journey. For this I would strongly recommend the “Getting Started with Data Science” course available at openSAP, because it uses the CRISP-DM methodology to teach data science. CRISP-DM stands for Cross-Industry Standard Process for Data Mining and is one of the best frameworks for approaching data science projects. It has 6 phases: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment. Most online data science courses start teaching from the 4th phase, Modelling. However, the first 2 phases are the foundation of a data science project, and with an incorrect understanding of the business problem or the data, your project is bound to fail. Going through this course will strengthen that foundation.
Step 2: Practice, Practice, Practice
Once you have a clear understanding of the concepts, the terminology and the CRISP-DM process, it's time to build some models hands-on, and the recipe for success here is “Practice, Practice and Practice”. The more different kinds of data science problems you work on, the better your understanding will be, and your horizon will expand day by day. Practice various kinds of problems: cleaning a messy data set, doing exploratory data analysis, and building machine learning models for tasks like classification, regression, clustering and computer vision. Below are some sources for these problems:
- Data Cleaning Challenge on Kaggle: Here you won’t learn any modelling technique, but you will learn to prepare your data so that you can run machine learning algorithms on it. And trust me, this is a very important phase of the model building pipeline.
- Exploratory Data Analysis: This is the stage where you learn about your data. Here you will develop the ability to form relevant hypotheses and investigate them. You can find a lot of open data sets on communities like Kaggle or the IBM Analytics community and start practicing.
- Data Science Projects: The 3rd stage is to start working on complete projects, and Kaggle should be the go-to choice here. You can practice problems like Titanic Machine Learning, Loan Default Prediction etc. Don’t worry about not having enough experience to solve these big problems. One of the best things about data science is its very large and helpful community of practitioners, who share solutions built with many different techniques on a daily basis. For example, you can easily find solutions for the Titanic and loan prediction problems in articles like Titanic – A Machine Learning Case Study and Loan Prediction Problem – Beating the Benchmark. So I would recommend getting as much hands-on practice as you can: the deeper you dive, the more interest you will develop and the more you will learn.
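To give a feel for what the first two kinds of practice look like, here is a minimal sketch in Python using only the standard library. It is an illustration, not a recommended workflow. The records are invented for this example (they are not taken from Kaggle or any real data set), and the function names `parse_age`, `clean` and `survival_rate_by` are hypothetical helpers. The cleaning step tidies up inconsistent labels and fills missing ages with the median, and the exploratory step checks a simple hypothesis: does the survival rate differ between groups?

```python
from collections import defaultdict
from statistics import median

# Hypothetical, hand-made records in the spirit of a passenger data set.
# These values are invented for illustration; they are not real data.
RAW_ROWS = [
    {"sex": " Female", "age": "29", "survived": "1"},
    {"sex": "male", "age": "", "survived": "0"},
    {"sex": "MALE", "age": "40", "survived": "0"},
    {"sex": "female ", "age": "35", "survived": "1"},
    {"sex": "male", "age": "22", "survived": "1"},
    {"sex": "Female", "age": "n/a", "survived": "0"},
]


def parse_age(text):
    """Return the age as a float, or None when the field cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def clean(rows):
    """Data preparation: normalise category labels and impute missing ages."""
    parsed = [
        {
            "sex": row["sex"].strip().lower(),
            "age": parse_age(row["age"]),
            "survived": int(row["survived"]),
        }
        for row in rows
    ]
    # Assumption: the median of the known ages is an acceptable fill value.
    known_ages = [r["age"] for r in parsed if r["age"] is not None]
    fill_value = median(known_ages)
    for r in parsed:
        if r["age"] is None:
            r["age"] = fill_value
    return parsed


def survival_rate_by(rows, key):
    """Exploratory analysis: share of survivors within each group of `key`."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for r in rows:
        totals[r[key]] += 1
        survivors[r[key]] += r["survived"]
    return {group: survivors[group] / totals[group] for group in totals}


cleaned = clean(RAW_ROWS)
rates = survival_rate_by(cleaned, "sex")
print(rates)
```

On a real data set you would more likely reach for a library such as pandas, but the questions stay the same: which values are missing or inconsistent, how should they be handled, and what do the group-level summaries suggest before any model is built?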
Step 3: Network with Data Science Practitioners
This is my biggest learning from my 5 years of experience in this industry. Data science is a very knowledge-intensive domain and new techniques are introduced almost every month, so it is impossible to learn everything on your own. But when you interact with other practitioners, join online communities and attend meetups, it becomes fairly easy to stay up to date with new developments.
I hope I was able to chart out a clear learning path. So begin your data science journey today with the amazing “Getting Started with Data Science” course available at openSAP. All the best!