Mark Flaherty (MF): And I think some of the analytic software vendors have improved interfaces to facilitate this trend towards less specialized users doing data mining. The number of people who are statistical analysts has not increased in recent years, but the demand to apply predictive analytics to business or government domains has increased dramatically. And as this new group of people has become involved, they have required simpler user interfaces.
They have worked collaboratively to achieve better outcomes. Previously, they may have been tapped for business knowledge, but they didn’t really understand how predictive analytics works, how you get a lot of value out of the data, or what the process is. Now that they are more hands-on and simpler interfaces are available, things have really changed. I think that is, in a sense, a best practice if you want to get more transformative results.
Moderator: Let's focus on the other side of best practices, if you will: worst practices. What are some of the biggest mistakes that you have seen people make when launching a data mining program?
Flaherty: I think one of the big ones comes up again and again, and I really try to advise clients early on when I see it. They get very excited about the possibilities of the technology. They read case studies. They talk to reference customers. They see the kinds of things that they could really change in their organization. But they don’t pick one or two projects as starting points. They don’t first try to get up that learning curve and bring a project to a positive conclusion. If they did, they could use it as a kind of template to improve on and spread around the organization into new areas.
Instead, they try to boil the ocean with too many projects initially. Number one, this spreads their talent too thin, so they can't get good results on any of the analytic projects. Number two, it makes it extremely hard to project manage and bring any of these projects to a conclusion where the results are actually deployed into day-to-day activity. So it hurts from both the talent perspective and the management perspective, and it can lead to a kind of frustration with the technology. So my advice always is, if you are thinking about starting 12 predictive analytics projects, think again.
I sometimes take them outside and I say, look, if you've got 12 right off the bat, you really have zero, because you are not going to get the kind of results you need. My advice is to cut the 12 to 3 and get some good results that you can use to build ownership and success within the organization. Then if you want to build and plan for 12, you could probably do it.
Moderator: Yes, that’s a very, very good point. And it’s very similar to lots of the other types of initiatives that we deal with in managing information and managing data. I hear that all the time in things like MDM, but also in data warehousing: you really need to start simple. Start with a few targets, for example, a few target systems in the case of something like MDM, or a few target dimensions if you are talking about data mining. Let's dig into the models and best practices for building models, because usually what you are doing is essentially mapping these models by looking at key dimensions and trying to see what kinds of patterns come out, right?
Flaherty: Yes, I think you do that, and I think we had an earlier discussion that was excellent about text mining, which is an increasing focus in the market. One of the things I always focus on about applying the results of a data mining effort is a question I ask early on, if I can get the right people in the room: what system or what process are these results going to be injected into to improve the results of your organization? Then you work backwards from that answer.
You are exactly right. You do tend to play dimensions off one another as you build predictive models in various ways. What you will see, of course, is different columns being selected as major inputs to the results and then being played off one another. It can be a tedious and labor-intensive process. Use visual analysis software to facilitate that process, and focus on where those answers go in the end game.
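
As a rough illustration of what "playing columns off one another" can look like, here is a minimal sketch that ranks candidate input columns by the strength of their linear relationship with an outcome. It is not the method described in the discussion: Pearson correlation is a deliberately simple stand-in for the input selection a real modeling tool performs, and the column names, the `churned` outcome, and all data values are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_columns(columns, target):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = [(name, pearson(values, target)) for name, values in columns.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical toy data, invented purely for illustration.
columns = {
    "tenure_months": [1, 3, 6, 12, 24, 36],
    "support_calls": [5, 4, 4, 2, 1, 0],
    "region_code": [2, 1, 3, 1, 2, 3],
}
churned = [1, 1, 1, 0, 0, 0]  # hypothetical outcome column

ranking = rank_columns(columns, churned)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

A ranking like this only captures one-at-a-time linear relationships. It says nothing about how columns interact, which is part of why the comparison tends to be iterative and why visual tools help.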