Frequently Asked Questions
Sometimes when I get asked a question I send an extremely well-thought-out response, which may or may not be appreciated by that person, but I feel like there might be people who would – maybe…
Regardless of my delusions of grandeur, I do want to start documenting these responses both to feed my massive ego and because I am hopelessly lazy. Since the one I get asked the most is “How do I get started in Data Science?” we might as well start there. So here’s my Fall 2017 go-to answer on how to get started quickly in Data Science.
No matter what you want to do in Data Science, if you’ve never actually done it professionally, you’ll need a few things.
- A LinkedIn page
- Your own website and associated email
- A GitHub account
- A willingness to spend time adding content to all the above
The cycle is simple. Make a basic website, think of some specific problems related to your industry of choice, write some articles about how to solve problems in said industry, include some ever-so-slightly modified code examples which you host on GitHub, and put links to all that stuff on your LinkedIn page and resume.
In fact, right now just calling yourself a “Data Scientist” on LinkedIn and having your own email address is usually enough to get an interview. And if you’re looking to be a communication-focused Data Scientist (i.e., you use something like KNIME to do all the analysis and leave data cleaning to engineers) you can probably stop right here.
Don’t get me wrong, that last paragraph wasn’t intended to be me crapping on business-centric data scientists. If someone is able to identify a problem in their industry, use publicly available resources to make even a passingly believable solution, and write an article about it, that person is probably head-and-shoulders above a lot of the folks I’ve seen calling themselves “Data Scientists.”
But we don’t want to stop there. Even the folks working on the business side are going to get bored quickly, so where do we go from here?
Entry Level: Maths and Code
Yes, the basics of data science: understanding statistics and then getting code to do the statistics for you.
What will you need to know to be a “real” entry-level data science engineer?
- Fundamentals of Linear Algebra
- Core statistical concepts such as distribution and regression
- Basics of at least one matrix-operation-focused programming language (R or Octave) and one more “traditional” programming language for application development and automated data cleaning (probably Python or C#)
- Fundamentals of Machine Learning
That sounds like a lot, and it is. There’s a reason we can’t find a lot of qualified resources after all. I mean, what other profession writes articles like this? We are so desperate for talent that we Data Scientists give away all our knowledge so that maybe we can hire more people!
But it’s also not that much. The above will take most people about a year to complete part time, at around 8-12 hours a week. That’s about the same amount of time most people spend watching football on Sunday or going to the gym – and who needs that stuff?
Since some of you have different levels of experience, here are my current recommendations on how to obtain all the knowledge you need:
- Linear Algebra
  - edX gives a very digestible applied look into linear algebra. This is the best option for those not looking to become a statistics-focused data scientist: https://www.edx.org/course/linear-algebra-foundations-frontiers-utaustinx-ut-5-05x-0#!
  - There’s also the more academic option in Dr. Strang’s MIT course. Use this if you want to keep learning more math and potentially publish research yourself: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm
- Core Statistics
  - Coursera Statistics track with R: https://www.coursera.org/specializations/statistics?siteID=SAyYsTvLiGQ-Wtvxdqskxhvrtdf7n7S9DA&utm_content=10&utm_medium=partners&utm_source=linkshare&utm_campaign=SAyYsTvLiGQ
  - If you know very little statistics, please take these courses. I cannot imagine how much easier my life would have been if I had been introduced to Bayesian statistics first instead of learning them later. Plus, you learn R.
  - You will probably want to take this course as well, since the frequentist folks will use this language: https://www.udacity.com/course/intro-to-inferential-statistics--ud201
- Programming
  - Easy, Khan Academy: https://www.khanacademy.org/computing/computer-programming
- Machine Learning
  - Go through all of Andrew Ng’s MOOCs on Coursera
  - Start with Machine Learning
  - Make sure you do the optional exercises. They are extremely hard, but completing them will basically give you everything you need to know to be a Data Scientist
Again: this should take about one year at a reasonable pace.
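If “getting code to do the statistics for you” still sounds abstract, here is a minimal sketch of what it looks like for the simplest kind of regression: fitting a straight line by ordinary least squares. It uses only the Python standard library. The function name `fit_line` and the toy numbers are made up for illustration and are not taken from any of the courses above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # The slope is the covariance of x and y divided by the variance of x
    # (the shared 1/n factors cancel, so plain sums are enough).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data that lies exactly on y = 1 + 2x.
hours = [1.0, 2.0, 3.0, 4.0]
score = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(hours, score)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

In practice you would probably reach for R or a Python library rather than writing this by hand, but working through a small version once makes the statistics and linear algebra courses feel a lot less theoretical.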
The Hunt Begins
While you are taking these courses, you can engage in certain activities to make your job search easier:
- Update your LinkedIn with all the courses you’ve taken
- Get a website, link to it on LinkedIn
- Change your email domain to that website
- While you are doing the courses above, take any really interesting concept, write an article about it, and put that on your website
- Share the articles on LinkedIn
- Once you find something you really like, make a project about it on GitHub
- Use R or Python instead of MATLAB
- Consider refactoring the code you used above
- You may even do something from a Kaggle.com competition; it doesn’t matter
- Make sure you create Markdown pages to show off your stuff, e.g.: https://github.com/thedanindanger/xbrl_automation_R/blob/master/XBRL_functions_overview.md
- Call those ‘projects’ on LinkedIn
- Optional: Use your website and LinkedIn as a resume to do some independent consulting on upwork.com part time. Charge a low rate initially. Think of it like an internship.
- Call yourself a Data Science consultant or something as a second job on LinkedIn, if you are allowed to by your current job
- By this time at least a hundred recruiters will have reached out to you if they haven’t already – congrats, you’re a data scientist!
The whole system is just a way of creating experience so you can get a job quickly. You’ll figure out what kind of career you want from there, but you just need to get started first.
Personally, my next step after doing the above would be to go through this book: https://www.amazon.com/Statistical-Rethinking-Bayesian-Examples-Chapman/dp/1482253445 but that’s also because it’s my favorite book on stats.
You may also want to get good at PaaS architectures like AWS and Azure, since that’s where we’re doing the bulk of our development now. But that’s another article!