Mark Flaherty: With today's tools, it's not so hard to simply go through that trial-and-error process when mixing and matching data sets in Big Data. It works best in business scenarios such as business analytics.
The best way to get started with these kinds of tools is by taking the different fields that you have available and starting to plot them. This lets you see groupings, which reveal clusters or outliers.
The clusters start an iterative process of defining relationships between these different data sets. Then you can really drill into what you see visually.
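Plotting is the approach described here. A quick numeric check can back up what stands out visually. The sketch below is not from the interview. It flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The field, sample values, and the 2.0 cutoff are illustrative assumptions.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=2.0):
    """Return indices of values whose |z-score| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical field: daily order counts with one suspicious spike.
orders = [102, 98, 105, 99, 101, 97, 430, 103]
print(flag_outliers(orders))  # [6]
```

A z-score test only catches points far from a single center. Real clusters still need the visual, iterative drill-down described above.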
Eric Kavanagh: Yeah. I am guessing a lot of it is just manual iteration, which means it's probably still difficult even today, although Jim was talking about templates that some of the vendors are rolling out.
I am guessing it's pretty difficult for a tool to dynamically work out which kind of visualization to use: bar charts, scatter plots, and so on. There is a whole variety of different visualizations that you can use.
Are we getting to the point where some of the tools can figure out that you should show this with a bar chart, for example, instead of a pie chart? Is that happening yet?
Mark Flaherty: At a basic level, yes, although it will never really catch up with the way people think. For example, trend data and time series data will almost always be displayed as line charts. And for multi-dimensional analysis, bubble charts are almost always the best method. Beyond that, however, human thought needs to be applied.
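The two rules of thumb Mark gives can be written down directly. This is a minimal sketch, not how any particular product decides. The function name, its parameters, and the reading of "multi-dimensional" as three or more numeric measures (x, y, and bubble size) are assumptions. Anything outside the two rules is handed back to the analyst, in line with his point that human judgment takes over.

```python
from typing import Optional

def suggest_chart(is_time_series: bool, measure_count: int) -> Optional[str]:
    """Hypothetical rule of thumb based on the two examples in the interview."""
    if is_time_series:
        return "line"    # trend and time-series data
    if measure_count >= 3:
        return "bubble"  # assumed: x, y, and bubble size carry three measures
    return None          # no rule applies: leave the choice to a person

print(suggest_chart(True, 1))   # line
print(suggest_chart(False, 3))  # bubble
print(suggest_chart(False, 1))  # None
```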
Eric Kavanagh: That’s good, so we won’t lose our jobs anytime soon.
Jim Ericson: Less and less likely, I am thinking.
Eric Kavanagh: Yeah, that's it. What are some of the biggest mistakes you've seen people make with data visualization, such as drawing false conclusions or focusing on false positives? What are some of the errors to avoid?
Mark Flaherty: Definitely mistaking correlation for causation. Just because we see a visual trend of all the dots falling along one line doesn't mean that whatever we plotted them against is actually driving the Y axis.
So, you do have to understand the logic and see if it's really possible. You need to apply rational analysis, where you bring in multiple variables and actually test them: did they really drive that Y axis, statistically?
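One way to "bring in multiple variables and actually test" is to check whether a correlation survives once a third variable is held fixed. The sketch below uses partial correlation. That technique is chosen here for illustration and is not named in the interview; a fuller analysis would typically fit a multiple regression. The data is synthetic: `z` drives both `x` and `y`, which have no link of their own.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for a third variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic data: z drives both x and y; the noise terms are unrelated.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [1, -1, -1, 1, 1, -1, -1, 1]
noise_y = [1, 1, -1, -1, -1, -1, 1, 1]
x = [2 * zi + e for zi, e in zip(z, noise_x)]
y = [2 * zi + e for zi, e in zip(z, noise_y)]

raw = pearson(x, y)            # strong: the dots line up
adjusted = partial_corr(x, y, z)  # roughly zero once z is accounted for
print(f"raw r = {raw:.3f}, partial r = {abs(adjusted):.3f}")
```

The scatter plot of `x` against `y` would look convincing. The partial correlation shows that the apparent relationship comes entirely through `z`.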
Eric Kavanagh: Yeah, obviously we have to watch out for data that isn't clean or has some other problem. You're going to get some wacky visualizations, so you always have to apply a reality check, right?
Mark Flaherty: Correct.
Eric Kavanagh: And then, I am sure, you also look at the raw data to see some patterns. The next step is probably, "Okay, let me get underneath this and take a look at the data sets." That's when you can figure out that some field was askew, or that something else happened along the way to render nonsense.
I was working just yesterday and something happened with the formula, because the little graph couldn't rasterize itself. It just kept dancing around on the screen. I was like, "Mm-hmm, I think there is a problem with data quality in there somewhere."
So you do have to watch out for just basic mistakes and basic errors to make sure that what you’re seeing is really what you’re getting, right?
Mark Flaherty: Yes, certainly for any new data set that you bring in. You are always going to go through that first step of trying to make sure it’s clean.
Where the software works best is in helping you profile the data so you can see it. Just take some data and plot the points: if some values are negative when the rest are positive, that probably shows you that there is a data input error. It's a common problem.
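The negative-among-positives case Mark describes is easy to check in code as well. This is a minimal sketch that assumes a plain Python list with `None` for missing entries. The "mostly positive" majority rule and the `order_amount` field are assumptions for illustration.

```python
def find_sign_anomalies(values):
    """Return indices of negative values in a column that is otherwise mostly positive."""
    numeric = [v for v in values if v is not None]  # ignore missing entries
    positives = sum(1 for v in numeric if v > 0)
    if positives <= len(numeric) / 2:
        return []  # negatives are common here, so they may be legitimate
    return [i for i, v in enumerate(values) if v is not None and v < 0]

# Hypothetical "order_amount" column with one likely data-entry error.
order_amount = [120.0, 85.5, -99.0, 240.0, None, 60.25]
print(find_sign_anomalies(order_amount))  # [2]
```

The majority check keeps the rule from firing on columns where negative values are normal, such as profit or temperature.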