Thank you all for joining us today. I'm Abhishek Gupta, I am the Chief Data Scientist at InetSoft. We have four major points that we want to discuss today. The first one being the importance of data science and data scientists and bringing machine learning into organizations. The second one being we've all heard of the V's of big data, and we know that one is velocity, and we know that there's a lot of streaming data out there now.
I feel like that's going to be a big part of organizational strategies moving forward. Point three, how an organization can keep creativity with machine learning. We have all of these different tools to choose from today, all of this different data, but we deal with regulation. We deal with documentation. We deal with productionizing machine learning code. How do we keep infusing creativity into the machine learning workflow within an organization?
Then, we've also heard a lot about the citizen data scientist recently, and just in general more and more people in organizations wanting to get involved with analytics and machine learning. So that's point four.
Okay, so we're going to start our discussion here: is any of this really new? Is machine learning new? Is data science new? To me this is a resounding no. In fact, machine learning has been studied at least since the 1950s, maybe before.
Data science, you could say, goes back to John Tukey's 1962 paper "The Future of Data Analysis." There's a great recent paper by Donoho out of Stanford that talks about 50 years of history of data science, and I urge you to read that. We're seeing machine learning in organizations now. This isn't coming out of the blue. This has a long history, and so we wanted to spend a little bit of time here.
One good thing to do at first is, of course, to define machine learning, and that's really tricky. I think, for better or for worse, machine learning has taken on a sort of pop-culture meaning: it's just the rebranding of analytics or data mining. Then there is this other, academic definition, because machine learning has been studied for so long within computer science departments at universities.
We are going to have to straddle those definitions today because at SAS and in other places we use machine learning in both of these ways: as a rebranding of analytics, but to me it is also a true branch of computer science.
To define machine learning I'm going to contrast it with statistics, and I'm not saying that machine learning is better than statistics. I'm just saying that machine learning is different from statistics. I think this is one of the easiest ways to define it.
Machine learning techniques tend to make fewer assumptions about data. In statistics, we typically look for normality of the data, or for the data to obey certain distributions. With machine learning we can often relax those expectations on the data, which is really nice. Machine learning methods also tend to sacrifice interpretability to promote greater accuracy.
Most statistical methods are designed to be highly interpretable and parsimonious, whereas machine learning methods are designed to squeeze the most possible signal out of data. I like to put something in the definition of machine learning about systems that are making decisions automatically. Automation is a big part of machine learning. I would also say that is inherent: it is the loop in which the algorithms can learn without manual intervention. We'll talk a little bit more about that when we get into more automation topics later on. I think we had a similar Tower of Babel 10 to 15 years ago when data mining was a hyped topic.
I think machine learning is experiencing the same kind of hype, and we need to be cognizant of what people are talking about when they're talking about different types of machine learning. Is it a specific algorithm set? Is it creating that feedback loop? Is it allowing algorithms to act independently of human interaction? I think all of these can be different definitions for machine learning. I think we have something to say about most of those definitions, and we'll try to address them as best we can during the WebEx today.
Okay, so we want to use this definition to lead into our discussion about machine learning.