People say data science is difficult, which it is, but even harder is explaining it to other people!
Data Science itself is to blame for this, mostly because we don’t have a concrete definition of it, and that has created a few problems. There are companies promoting ‘Data Science’ tools as ways to enable all your analysts to become “Data Scientists”. The job market is full of people who took a course on Python and now call themselves “Data Scientists”. And there are businesses so focused on reporting that they think all Analytics, Data Science included, is just getting data faster and prettier.
But the tools we use are just that: tools. The code we use requires specialized knowledge to apply effectively. The data pipelines we create exist to monitor the success and failure of our models; helping with reporting is an added bonus. To mitigate these challenges we have to come up with clever phrases such as:
- “Buying a hammer does not make you a carpenter”
- “Knowing how to drive does not make you a mechanic”
- “Following the recipe on the back of cake mix does not make you a baker”
- “Owning Quickbooks does not make you an accountant”
- “Wearing a FitBit doesn’t make you a Doctor”
Those seem so clever! So why then do we still have trouble explaining the value of data science to an organization? We know how valuable Data Science can be. There are all kinds of examples out there:
- Kroger, with 770 million customers, used buying-pattern analysis to increase revenue by over $12 billion since 2005
- UPS created drive-time algorithms to optimize traffic and idle-time patterns, as well as predictive preventative maintenance, saving 39 million gallons of fuel and avoiding 364 million miles driven
- Dell states organizations using data grow 50% faster: http://www.cbinet.com/sites/default/files/files/Hill_Tom_pres.pdf
- Wal-mart improved online conversion rates by 10-15% through NLP: http://searchcio.techtarget.com/opinion/Ten-big-data-case-studies-in-a-nutshell
- Even something as simple as streamlining the planning process for manufacturing resources: http://cdn2.hubspot.net/hub/64283/docs/production-planning-case-study.pdf
None of the examples above are groundbreaking or novel. They use the same predictive analytics techniques that have been around for decades, but they apply those techniques on a larger scale and in new and innovative ways.
That is ultimately what Data Science is: combining all the knowledge we have about organizational data, the needs of the organization, and all the maths and computer science in our heads, and producing something wonderful.
So maybe our metaphors need to go a little deeper. After all, I suspect the ones I listed initially seem clever to a Data Scientist, but not to someone actually trying to understand what a Data Scientist does.
Let’s expand on a few of our previous metaphors.
“Following the recipe on the back of cake mix does not make you a baker”
What is this actually attempting to communicate? Answer: “Looking up something on StackOverflow does not make you a programmer.” That is probably why I hear Data Scientists using it, or variations of it, so often.
But the point is valid: just because someone knows how to repeat something someone else did does not necessarily mean they can make something new.
This metaphor is trying to emphasize the learned ability of a Data Scientist to synthesize knowledge, that is, to put two seemingly unrelated concepts together to create something novel. Perhaps a more effective metaphor would be:
“Of course you can make a cake from a box, but would you make your own wedding cake?”
Most people would say no, and that’s because a baker knows how to put together ingredients in much larger quantities, and can work with you to find a special combination of flavors that best suits you.
The same goes for a Data Scientist. Most of the stuff we do can be found online with enough patience, but only those who have spent their academic and professional lives studying Computer Science, Statistics, Linear Algebra, Calculus, and Distributed Architecture will know what to use when, and how to execute the solution at scale.
“Owning Quickbooks does not make you an Accountant”
While our last metaphor addressed concepts, this little jab pokes fun at the previously mentioned products and tools, and at how Data Science is the only profession people think they can buy their way into. In fact, while the purpose of this article is to make our typically aggressive metaphors more palatable, we could get worse here. For example:
“What software do you use most for your job?”
“Hmm… probably Excel.”
“Okay, I know Excel, does that mean I can do your job? No? Then why do you think everyone who knows Python can do mine!?”
(Note: in case it wasn’t obvious, do NOT have that conversation with anyone.)
In all seriousness, the comparison to Quickbooks not making you an accountant is valid; it simply needs refinement. So when someone says they don’t need data science because of their tools, a more effective conversation would be:
“Well, would you say most people use an accountant to do their taxes?”
“I know I do!”
“Hey, me too! Taxes are the worst… And if your accountant is like mine, you don’t have to deal with understanding all the IRS and state regulations, deductions, exemptions, etc. I bet they charge a pretty high hourly rate, am I right?”
“You know it…”
“But you know, most accountants are using Quickbooks or TurboTax. Why not just buy that software and do it yourself?”
“Well… my situation is complicated. I have adopted children and a charity, there’s complex deductions and I want to ensure my income is well protected. Plus, they have a CPA…”
“So you could do the basics, but you wouldn’t feel comfortable with any uncertainty. Data Science is the same way. There are plenty of tools out there to help accelerate basic data science, and we use them all the time, but understanding if the tool is right, if the model is right, that is the value of a Data Scientist.”
You can see where I’m going with this. Many Data Scientists hold a PhD along with other certifications. We also study constantly to keep ahead of emerging technology and trends.
People do not (often) mean to diminish Data Science when they say tools can do the job. They just do not understand the complexity involved.
“Wearing a FitBit doesn’t make you a Doctor”
Our last metaphor is a big one: the distinction between reporting and data science, that is, descriptive versus predictive and prescriptive analytics.
People can do a lot with descriptive analytics. They are useful for checking the performance of business decisions and tracking long-term progress via KPIs. But just like a fitness tracker, reporting can only tell you where you’ve been. It cannot identify the cause of a problem while it’s happening (Diagnostic). It cannot predict a future state (Predictive). It cannot recommend a course of action based on the data (Prescriptive).
Yet again, our original metaphor is great, but not impactful to those outside of data science. We need to explain ourselves a little better. So here is a response to someone asking why they need data science when they just want to see the data.
“When you go to the doctor’s office, do you tell the lady at the desk ‘I need my heart rate, blood pressure, blood glucose, and hepatic function tests’ then just take the results back home to figure out what to do?”
“Of course not, the Doctor has spent their life learning which test you need, how to interpret the results to diagnose a current or potential issue, and how to prescribe medication and lifestyle changes to get your health back on track.”
Data Scientists do exactly the same thing. We create data pipelines and KPI frameworks to measure performance (Descriptive). We evaluate the business and use data to determine potential areas of improvement (Diagnostic). We determine what the future state of KPIs will be given the current data and what changes will have the most impact to improve the KPI (Predictive). Finally, we recommend changes to fix current issues and prevent future ones (Prescriptive).
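For readers who think better in code, here is a minimal sketch of the gap between looking back and looking forward, staying with the fitness tracker theme. Everything in it is an assumption made for illustration: the heart rate readings are made up, the function names (`describe`, `fit_trend`, `predict`, `recommend`) are hypothetical, and a straight-line trend stands in for the far richer models a real project would need. The diagnostic step is left out because it depends on domain knowledge that a toy example cannot fake.

```python
from statistics import mean

# Hypothetical weekly resting heart rates from a fitness tracker.
# These numbers are made up purely for illustration.
weekly_resting_hr = [62, 63, 65, 66, 68, 69]


def describe(values):
    """Descriptive: summarize where we have been."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(values):
    """Fit values = intercept + slope * week by ordinary least squares.

    Needs at least two readings, otherwise the denominator is zero.
    """
    weeks = list(range(len(values)))
    x_bar, y_bar = mean(weeks), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(weeks, values)) / sum(
        (x - x_bar) ** 2 for x in weeks
    )
    return slope, y_bar - slope * x_bar


def predict(values, weeks_ahead):
    """Predictive: extend the fitted straight line into the future."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + weeks_ahead)


def recommend(values, weeks_ahead, threshold):
    """Prescriptive (toy version): suggest an action if the forecast crosses a threshold."""
    forecast = predict(values, weeks_ahead)
    return "see a doctor" if forecast > threshold else "keep going"


print(describe(weekly_resting_hr))
print(predict(weekly_resting_hr, weeks_ahead=4))
print(recommend(weekly_resting_hr, weeks_ahead=4, threshold=72))
```

The report stops at `describe`. Deciding whether a straight line is even a sensible model, and what threshold should trigger action, is where the Data Scientist earns their keep.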
Extra Bonus: Setting a Baseline for Rates
Comparisons between Data Scientists and other certified professionals go beyond just tools and concepts. They also include compensation.
No one thinks twice about paying a Lawyer, CPA, Doctor, or even Private Chef $300+ an hour. Certain types of Data Scientists actually have more education than even a lawyer (a PhD is a much longer process than a JD) and require more continuing education than a CPA. (Most Senior Data Scientists would laugh at only 120 hours of study a year; they do that in a month.)
Here is a great article breaking down the different types of Data Scientists and their industry rates: https://www.linkedin.com/pulse/how-much-does-data-scientist-make-ill-show-you-vin-vashishta
Keep in mind, when I say “Data Scientist” in this article, I am typically referring to the “Senior Data Scientist” and “Strategic Data Consultant” roles.