There’s hope for me. I got a bachelor’s in economics ages ago and built some crude statistical models with the tools that were available back in the 80s. But I don't think I will be the greatest data scientist in the modern world. I think people who are 20 years younger than me are probably more adept.
I blogged recently about some of the key components of a data science curriculum. One of those was paradigms and practices. While you don't necessarily need a bachelor’s degree, you want to know what the heck is going on. What does paradigms and practices mean? What does that entail?
It means first and foremost that you understand the core function of a data scientist. You’re not doing business intelligence, because business intelligence is more about routine performance tracking. A data scientist is looking for the patterns in the data set that are non-obvious.
Take your monthly sales and your finances and so forth. Quite often the models, the data schema, and the report structures are all crisply laid out and in fact mandated by the business practices, by government regulations and whatnot. You understand that the data scientist comes in when it’s not obvious what's going on inside the data, when you need advanced statistical tools to do regression modeling, segmentation, clustering, and principal components analysis.
That is when you need all the heavy-hitting machinery of statistical analysis to be able to pull out those patterns, those regularities, and to look for cause and effect where it is not obvious. So first and foremost you understand what the role of a data scientist is vis-à-vis traditional analytics and BI.
But it’s also important that you understand how a data scientist fits into an overall data analytics program. You’re not the only player on the team. You’re the player who does the exploration of a data set, but you depend on some other critical roles. You depend on the data integration specialist to do the data extraction, ETL, cleansing and all that, ideally preparing the data for you to build your models from.
You also need to work hand in hand with a subject matter expert. You, yourself, may have a PhD, but if you’re going to build a customer experience optimization model, it might be a good idea to hook up with some people who’ve got advanced degrees in behavioral science, for example.
So you have to understand where your expertise lies and where your specialized expertise needs to interface with other people. You need to interface with the people who manage the underlying platform. The data and analytics platform could be a data warehouse like Netezza, or it could be a Hadoop platform like IBM BigInsights or whatnot. So you have got to understand that you’re just one of several players sitting around the table here, but you work together in an operational environment to build statistical and predictive models.
Models are the key aspect of this job. I worked at SPSS many, many years ago, before it was even a part of the IBM family, and I know that some of the statisticians there were just working on algorithms day in and day out. So modeling and algorithms are another key part of the curriculum.
That core tool set might be SPSS, the fine tools from SAS, tools from the open source community, or tools from the many other vendors that provide really strong modeling software. The important thing does not change no matter what you’re using inside your company to build these models, score them, and validate them against fresh data.
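To make "build, score, and validate against fresh data" concrete, here is a minimal sketch in plain Python rather than in any of the vendor tools mentioned above. It fits a one-variable least-squares line on training data, scores held-out data the model never saw, and compares the error on both. The function names (`fit_line`, `score`, `rmse`) and all of the numbers are made up for illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Build: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def score(model, xs):
    """Score: apply the fitted model to produce predictions."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(actual, predicted):
    """Validate: root mean squared error between actuals and predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical data: the model is built on the training set only.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# "Fresh" observations that were not used to fit the model.
fresh_x = [6.0, 7.0]
fresh_y = [12.0, 13.8]

model = fit_line(train_x, train_y)
train_error = rmse(train_y, score(model, train_x))
fresh_error = rmse(fresh_y, score(model, fresh_x))

print("intercept, slope:", model)
print("training RMSE:", train_error)
print("fresh-data RMSE:", fresh_error)
```

If the error on fresh data is much larger than the error on the training data, that is a sign the model has memorized its training set rather than found a real pattern.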
It's important that you understand the core algorithms and approaches. Linear algebra is a basic step. Linear algebra, regression, data mining, predictive modeling, cluster analysis, association rules, market basket analysis and so on are areas you have to know. Some of the disciplines can get pretty arcane for those who don’t do this for a living. Recognize that you have got to master those techniques and the statistical tools to be an effective data scientist in the real world.
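As one small illustration of what understanding the core algorithm means, here is a rough sketch of cluster analysis using k-means, written in plain Python so that every step is visible. It is a teaching sketch, not how any particular product implements clustering. The customer data (visits per month, average spend) and the choice of starting centroids are invented for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points, and repeat."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)

        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1.5, 11), (9, 80), (10, 85), (9.5, 82)]

# Starting centroids chosen by hand for the example; real tools usually
# pick them randomly or with a smarter seeding scheme.
start = [customers[0], customers[3]]

centroids, clusters = kmeans(customers, start)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The result depends on the starting centroids and on the scale of each variable, which is exactly the kind of detail you only catch if you know what the algorithm is doing under the hood.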
Yes, there is a bit of a learning curve, and it can be forbidding for some people, especially people who are scared of math. But you know, anybody with gray matter in their head can learn this technology, and not just the technology but the underlying concepts.