I have heard a lot lately about data scientists. It seems like these guys are almost as big as Justin Bieber right now. I even saw the Harvard Business Review recently called data scientist the sexiest job of the 21st century. Is that cool or what?
When I saw that article, I had to tweet, “That's true if they disqualified actors, singers, dancers and beauty queens from the contest; then yes, it’s probably in the top 10 sexiest jobs.” But you know my criterion for sexiest job is that the people have got to look good naked. I doubt that data scientists fill that bill, but you never know. But no, we don’t want them tweeting those pictures to us right now.
But let’s get serious for a second. What is a data scientist? I keep getting asked that question, for the good reason that I keep blogging on the topic and talking about it constantly. The whole notion of a data scientist is that it is an analytics professional whose core job is to build statistical models of large, complex data sets in order to find statistical patterns within that data that are not apparent to the naked eye, or that may not be apparent in the structured reports you might pull up in your business intelligence application.
So we can say statistical modeling is, fundamentally, what a data scientist does. And they build statistical models for a number of purposes. Applications and businesses have been using them for a long time.
Data mining is another task you associate with data scientists. First and foremost, you’re looking for statistical dependencies, what are often called non-obvious patterns, in data sets. You are trying to mine the data. It could be customer buying data. You are looking for customer buying patterns going back any number of years. You’re trying to look at those patterns across diverse independent variables that, individually or in combination, explain why a customer bought a given product on a given day in a given store at a given price and so forth.
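To make the "individually or in combination" point concrete, here is a minimal sketch in Python. The records, the two variables (weekend and discount), and the helper name `purchase_rate` are all made up for illustration; real buying data would have far more rows and variables.

```python
from collections import defaultdict

# Hypothetical toy records: (is_weekend, has_discount, bought)
records = [
    (True, True, 1), (True, True, 1), (True, True, 1), (True, True, 0),
    (True, False, 0), (True, False, 1), (True, False, 0), (True, False, 0),
    (False, True, 0), (False, True, 1), (False, True, 0), (False, True, 0),
    (False, False, 0), (False, False, 0), (False, False, 1), (False, False, 0),
]

def purchase_rate(rows, key):
    """Return {group: share of rows in that group that ended in a purchase}."""
    totals = defaultdict(int)
    buys = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        buys[group] += row[2]
    return {group: buys[group] / totals[group] for group in totals}

# Each variable on its own, then the two in combination
by_weekend = purchase_rate(records, key=lambda r: r[0])
by_discount = purchase_rate(records, key=lambda r: r[1])
by_both = purchase_rate(records, key=lambda r: (r[0], r[1]))

print("weekend only: ", by_weekend)
print("discount only:", by_discount)
print("combined:     ", by_both)
```

In this toy data each variable looks modestly helpful on its own, but the combined view shows that nearly all of the lift comes from one combination, a weekend visit with a discount. That is the kind of pattern a single-variable report would not show you.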
So one part is data mining, which is looking for patterns in historical data sets. The forward-looking aspect to that is predictive modeling. You look at historical trends based on statistical patterns found in the data. And then you project or forecast what will or might happen if various variables come to pass in the future. When I say the future, quite often predictive modeling in a business context means: what is the customer likely to do one minute from now if we make them the following offer with the following terms and so forth?
So if you look at data mining and predictive modeling as being core functions of data scientists, you also look at things like natural language processing for content analytics, like social sentiment analysis. That’s another core data scientist function. So really the whole range of advanced analytics functions focused on predictive and content analytics, that’s all data science.
With all the modeling and predictive analytics going on, it sounds like it’s pretty intense. What kind of training is required? Are we looking at something like eight years of college and a doctorate?
You’re more than welcome to get a doctorate, not only in statistical and mathematical subjects but also in business areas. Whether it’s economics or marketing or psychology or whatnot, any number of degrees are really, really good background for data science. You don’t necessarily need an advanced degree to do this science. You don’t even necessarily need to finish all four years of college if you have learned the skills of data science in your schooling or on the job, or even taught them to yourself.
The important thing is whether you can do the work. Can you build statistical models? Can you score them against fresh data to look at the fitness of those models against actual data that’s observed in the field? Can you use the tools, whether it be SPSS or other modeling tools that allow you to build your models? Can you prepare the data? Can you extract it from various sources, combine it, and transform it into a form that you can build a model around?
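As a rough illustration of building a model and then scoring it against fresh data, here is a small Python sketch. It uses ordinary least squares on a single variable as a stand-in for whatever model you would really build, and the numbers (discount percent versus units sold) are invented for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line explains."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical historical data used to build the model
train_x = [0, 5, 10, 15, 20]    # discount percent
train_y = [10, 14, 21, 24, 31]  # units sold

# Hypothetical fresh observations from the field, never seen during fitting
fresh_x = [2, 8, 12, 18]
fresh_y = [12, 17, 22, 29]

slope, intercept = fit_line(train_x, train_y)
train_r2 = r_squared(train_x, train_y, slope, intercept)
fresh_r2 = r_squared(fresh_x, fresh_y, slope, intercept)

print(f"model: units = {slope:.2f} * discount + {intercept:.2f}")
print(f"fit on historical data: {train_r2:.3f}")
print(f"fit on fresh data:      {fresh_r2:.3f}")
```

The number to watch is the fit on fresh data. If it falls well below the fit on the historical data, the model has learned quirks of the past rather than a pattern that holds up in the field.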
So in other words, if you can do the work and you don't have any schooling, that’s great. But usually in the business world we prefer that you have at least a background and a BA or a BS, and hopefully you’ve got a focused major that makes you valuable in the business world. Let’s say you’re building a marketing campaign optimization model. It’s often a good idea to have either a degree in marketing or an understanding of marketing best practices. In that case you’re competent to build your statistical model. You’re a subject domain expert. That makes you quite valuable.