How Much Data Can You Visualize In Memory?

This is the continuation of the transcript of a webinar hosted by InetSoft on the topic of "How Data Discovery Software Uncovers Stories in the Data." The speaker is Abhishek Gupta, Product Manager at InetSoft.

That concludes the formal part of the webinar. We committed to stay over a little bit to answer questions, so we are going to pick up questions now. Let's take a look at what we have. And, yeah, feel free to email additional questions to info@inetsoft.com. Here is a question that comes up every time: how much data can you visualize in memory?

In-memory technology has expanded tremendously, and our compression algorithms are pretty good. We are fitting tens of millions of rows on a normal Windows-class machine, whether it's a client or a server, and we're often pulling in 60, 80, or 100 tables from something like an Oracle database on a nightly basis, then linking and joining them, doing roll-ups, and whatever else we have to do.

We are not going to put all of Wal-Mart's transactions in memory. But if you're doing analysis, you probably don't want to look at that level anyway. If you're looking at three years of data, you would probably roll up from the transaction level to the day level. And if you're looking at products, you could probably roll up to the class or subclass instead of the SKU.
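To make the roll-up idea concrete, here is a minimal Python sketch, not InetSoft's implementation, that collapses transaction-level rows into (day, product class) totals. The SKU-to-class lookup, the field names, and the sample rows are all hypothetical; in a real pipeline the lookup would typically come from one of the tables joined during the nightly load.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical SKU-to-class lookup (stands in for a joined product table).
SKU_CLASS = {
    "SKU-1001": "Beverages",
    "SKU-1002": "Beverages",
    "SKU-2001": "Snacks",
}

def roll_up(transactions, sku_class):
    """Sum amounts by (day, product class) instead of (timestamp, SKU)."""
    totals = defaultdict(float)
    for txn in transactions:
        day = datetime.fromisoformat(txn["timestamp"]).date()  # drop time of day
        product_class = sku_class.get(txn["sku"], "Unknown")  # SKU -> class
        totals[(day, product_class)] += txn["amount"]
    return dict(totals)

# Hypothetical transaction-level rows.
transactions = [
    {"timestamp": "2024-03-01T09:15:00", "sku": "SKU-1001", "amount": 2.50},
    {"timestamp": "2024-03-01T17:40:00", "sku": "SKU-1002", "amount": 3.00},
    {"timestamp": "2024-03-01T18:05:00", "sku": "SKU-2001", "amount": 1.25},
    {"timestamp": "2024-03-02T08:30:00", "sku": "SKU-1001", "amount": 2.50},
]

for (day, product_class), total in sorted(roll_up(transactions, SKU_CLASS).items()):
    print(day, product_class, total)
```

The four sample rows collapse to three (day, class) totals; applied to years of real transactions, the same grouping can shrink the row count dramatically, which is what makes holding the analysis in memory practical.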

Ways Of Handling Large Data Sets

Even for very large data sets, when you get down to solving business problems and doing analysis, there are ways of handling them that are appropriate to the problem. We know the amount of data you can put in memory is constrained, but the examples I'm running here have a couple of million claims. I am using a laptop, and performance is sub-second per interaction, which is what we design for. Let's look at another question: what kinds of problems are you involved in?

We are working in all kinds of industries, both directly and via partners. We tried to share examples in this webinar from four different industries. The common theme is people who need to drill into and explore the data to answer questions about anything from fundraising to sales. For example, a frontline business development manager who is going on a trip to Los Angeles wants to know whom to visit.

You want to take a list of 100,000 and cut it first geographically, down to Los Angeles. Then maybe filter for people who gave last year but haven't given this year. There is a common set of cuts for this kind of list reduction, which is why it's a fairly common application that spans a lot of industries.
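The list-reduction cuts described above can be sketched as two successive filters. This is an illustrative Python example under simple assumptions: the `Prospect` fields and the sample records are hypothetical, and a real prospect list would usually carry giving history as dated gift records rather than two booleans.

```python
from dataclasses import dataclass

@dataclass
class Prospect:
    prospect_id: str
    city: str
    gave_last_year: bool
    gave_this_year: bool

def reduce_list(prospects, city):
    """Apply the cuts in order: geography first, then the lapsed-giver test."""
    in_city = [p for p in prospects if p.city == city]
    return [p for p in in_city if p.gave_last_year and not p.gave_this_year]

# Hypothetical records standing in for a list of 100,000.
prospects = [
    Prospect("P-001", "Los Angeles", True, False),
    Prospect("P-002", "Los Angeles", True, True),
    Prospect("P-003", "San Diego", True, False),
    Prospect("P-004", "Los Angeles", False, False),
]

for p in reduce_list(prospects, "Los Angeles"):
    print(p.prospect_id)
```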

Another one is the more metrics-based kind of problem, like the call center, where you have got a lot of calls, a lot of callers, and different segments of customers. In the alumni-calling case, there were different segments of customers. In that kind of setting, you're trying to see who is doing well and who is not, and it is not as simple as it looks.

Maybe I am under on pledges, but I have got a tougher group to call. Maybe I have been calling at a bad time of day for these customers, and my shift should be moved. So, to summarize, it's those kinds of problems. List reduction is where you take a massive amount of data and cut it down to a list to act on. It's not just one metric; you eventually have to get all the way down to the details to take action. And I think the airline example is a good one for visual display, because you can't find the problems by looking at a list. You need to see the patterns, and the patterns come from a complex set of metrics that jump out when shown with color and shape, so you can say, "I've got a problem in the Northeast."
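One way to handle the "tougher group to call" problem is to compare each caller's pledges with what their segment mix would predict, rather than ranking callers on raw pledge counts. The sketch below is an illustration of that idea (an observed-to-expected ratio, similar in spirit to indirect standardization), not the method used in the webinar; the callers, segments, and outcomes are hypothetical.

```python
from collections import defaultdict

def segment_rates(calls):
    """Pledge rate per segment across all callers: pledges / calls."""
    made, pledged = defaultdict(int), defaultdict(int)
    for _caller, segment, got_pledge in calls:
        made[segment] += 1
        pledged[segment] += got_pledge  # True counts as 1, False as 0
    return {seg: pledged[seg] / made[seg] for seg in made}

def observed_vs_expected(calls):
    """Each caller's actual pledges divided by what their segment mix predicts.

    Assumes every caller's expected count is non-zero, i.e. no caller works
    only segments with a 0% pledge rate.
    """
    rates = segment_rates(calls)
    observed, expected = defaultdict(int), defaultdict(float)
    for caller, segment, got_pledge in calls:
        observed[caller] += got_pledge
        expected[caller] += rates[segment]
    return {c: observed[c] / expected[c] for c in observed}

# Hypothetical call log: (caller, segment, pledged?)
calls = (
    [("caller_a", "lapsed", i < 1) for i in range(4)]
    + [("caller_b", "major_donor", i < 3) for i in range(4)]
    + [("caller_c", "lapsed", False) for _ in range(4)]
    + [("caller_c", "major_donor", i < 3) for i in range(4)]
)

for caller, ratio in sorted(observed_vs_expected(calls).items()):
    print(caller, round(ratio, 2))
```

In this made-up data, `caller_a` has the fewest raw pledges but the highest ratio (2.0), because every call went to the hard-to-convert lapsed segment. A ratio near 1.0 means a caller is doing about as well as their call mix predicts.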

How many levels can you observe in the hierarchical visualizations you show?

We work from the raw data rather than from fixed hierarchies. Remember the bar chart on giving (I don't think I still have it open), where we had gift brackets such as 10,000 to 50,000? The gift buckets sit at one level of the hierarchy, and you can right-click and adjust the gift amounts so the buckets explode into full detail.
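Adjusting the gift buckets amounts to re-binning the same detail rows with different boundaries. A minimal Python sketch, assuming half-open buckets; the gift amounts and both edge lists are hypothetical:

```python
from bisect import bisect_right
from collections import Counter

def bucket_label(amount, edges):
    """Label the half-open bucket [edges[i-1], edges[i]) that contains amount."""
    i = bisect_right(edges, amount)
    if i == 0:
        return f"<{edges[0]:,}"
    if i == len(edges):
        return f"{edges[-1]:,}+"
    return f"{edges[i - 1]:,}-{edges[i]:,}"

def bucket_counts(gifts, edges):
    """Count gifts per bucket for a list of ascending bucket edges."""
    return Counter(bucket_label(g, edges) for g in gifts)

# Hypothetical gift amounts and two choices of bucket edges.
gifts = [250, 4_800, 12_000, 35_000, 49_999, 75_000]
coarse = [1_000, 10_000, 50_000]
fine = [1_000, 10_000, 20_000, 30_000, 40_000, 50_000]

print(bucket_counts(gifts, coarse))
print(bucket_counts(gifts, fine))
```

Because the buckets are recomputed from the raw gift amounts, changing the edge list is all it takes to split the 10,000 to 50,000 bracket into finer ones; nothing has to be pre-aggregated.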

The cross-hierarchy selection also works because we are doing it off the detail data in the raw table. We are not pre-summarizing it, and we're not working off a cube. The beauty of in-memory analysis is that you're working on the raw data, so you have this incredible ability to select subsets of one part of one hierarchy and mash them up with another.
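The cross-hierarchy point can be illustrated with a small Python sketch over detail rows. The field names, the two hierarchies (region > state and category > subcategory), and the sample values are hypothetical; the idea is only that, with row-level data in memory, any combination of members from different hierarchies can be filtered and re-aggregated on the fly.

```python
from collections import defaultdict

# Hypothetical detail rows; each row carries its place in two hierarchies.
rows = [
    {"region": "West", "state": "CA", "category": "Equity", "subcategory": "International", "amount": 120.0},
    {"region": "West", "state": "CA", "category": "Equity", "subcategory": "Domestic", "amount": 50.0},
    {"region": "West", "state": "WA", "category": "Equity", "subcategory": "Domestic", "amount": 80.0},
    {"region": "East", "state": "NY", "category": "Equity", "subcategory": "International", "amount": 60.0},
    {"region": "East", "state": "NY", "category": "Bond", "subcategory": "Municipal", "amount": 40.0},
    {"region": "East", "state": "MA", "category": "Equity", "subcategory": "Domestic", "amount": 30.0},
]

def select_and_group(rows, states, categories, group_by):
    """Keep rows matching both selections, then total the amount by any column."""
    totals = defaultdict(float)
    for row in rows:
        if row["state"] in states and row["category"] in categories:
            totals[row[group_by]] += row["amount"]
    return dict(totals)

print(select_and_group(rows, {"CA", "NY"}, {"Equity"}, "subcategory"))
print(select_and_group(rows, {"CA", "NY"}, {"Equity"}, "region"))
```

Here CA and NY sit under different regions, yet they can be selected together and crossed with the Equity category without any pre-built aggregate for that combination.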

Some of the charts will also show hierarchy. A bubble chart can show it, and a heat map is actually a chart designed to show hierarchy. We saw funds grouped by category, with subgroups for each fund family. Then we colored individual funds based on the selected fund levels. You could see how the funds fit into the families and the categories, and it's a fairly quick visual way to say, "Hey, a good part of my international funds is where the problem is."

When Is It Better to Use a Heatmap Than a Bubble Chart?

Choosing between a heatmap and a bubble chart depends on the nature of the data you want to present and the kind of insights you wish to convey. Both visualizations are effective for different scenarios, but they excel in distinct ways based on data structure, dimensionality, and the type of relationships you want to illustrate. Below are the specific conditions and contexts where it is better to use a heatmap over a bubble chart.

1. When Visualizing Data Density and Concentration

Heatmap:

  • Best for: Representing data density, concentration, and patterns within a matrix or grid-like structure.
  • Use Case: Heatmaps are particularly effective when you want to show the concentration of data points across two categorical dimensions (e.g., days of the week vs. hours of the day for website traffic).
  • Example: If you have a table of data where you need to show which regions have the highest and lowest values (such as sales by region and product type), a heatmap allows you to see the density and intensity of values clearly through color gradients.

Bubble Chart:

  • Not ideal for showing density: Although a bubble chart can represent multiple variables, it is not as effective as a heatmap in showing data density or concentration patterns. Bubble charts work better when visualizing three variables: two positional (X and Y axes) and one represented by bubble size (and sometimes color).
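As a rough illustration of the days-versus-hours use case, the following Python sketch counts visits into a weekday-by-hour grid and prints a crude text heatmap. The timestamps are hypothetical, and a real dashboard would pass the same count matrix to a charting tool instead of printing characters.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def traffic_grid(timestamps, hours):
    """Count visits per (weekday, hour) cell; returns a 7 x len(hours) matrix."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        counts[(dt.weekday(), dt.hour)] += 1  # weekday(): Monday == 0
    return [[counts[(d, h)] for h in hours] for d in range(7)]

def render(grid, shades=" .:*#"):
    """Print a text heatmap: cells closer to the peak count get darker characters."""
    peak = max(max(row) for row in grid) or 1
    for day, row in zip(DAYS, grid):
        cells = "".join(shades[round(v / peak * (len(shades) - 1))] for v in row)
        print(f"{day} |{cells}|")

# Hypothetical visit timestamps (2024-03-04 is a Monday, 2024-03-09 a Saturday).
visits = [
    "2024-03-04T09:05:00",
    "2024-03-04T09:40:00",
    "2024-03-04T14:10:00",
    "2024-03-09T20:30:00",
]
hours = range(8, 22)  # show 08:00 through 21:59
grid = traffic_grid(visits, hours)
render(grid)
```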

2. When Dealing with Two Categorical Variables

Heatmap:

  • Best for: Showing relationships or correlations between two categorical variables.
  • Use Case: Heatmaps are ideal for cross-tabulated data, where each cell intersection represents a unique combination of categories, and the cell's color intensity reflects a measure like frequency or magnitude.
  • Example: In a heatmap comparing customer satisfaction scores (rows) against product categories (columns), you can quickly see which categories have the most positive or negative feedback.

Bubble Chart:

  • Less effective: While bubble charts can display categorical data, they do not convey relationships between two categorical variables as intuitively as heatmaps do. A bubble chart is better suited for showing quantities across categories with the help of bubble size or position.

3. When the Emphasis is on Patterns or Trends Across a Grid

Heatmap:

  • Best for: Identifying patterns or trends across a matrix layout, where data points are related spatially.
  • Use Case: Heatmaps make it easy to spot patterns like seasonality, correlations, or outliers in structured data.
  • Example: In financial data analysis, a heatmap can show the performance of various stocks over time, highlighting periods of gains or losses.

Bubble Chart:

  • Less effective for pattern detection: Bubble charts, due to their scattered nature, make it harder to detect systematic patterns across a grid. They can show clusters but lack the uniform layout that makes pattern identification as straightforward as in heatmaps.

4. When Showing Relationships Between Many Categories

Heatmap:

  • Best for: Representing data with a large number of categories along two axes.
  • Use Case: Heatmaps are effective when you have multiple categories that need to be compared simultaneously, such as product types vs. customer segments or departments vs. KPIs.
  • Example: A heatmap can show employee performance ratings across different departments and job roles, enabling quick identification of high-performing teams or areas needing improvement.

Bubble Chart:

  • Less scalable: Bubble charts become cluttered and difficult to read when there are too many categories or data points. Each bubble represents a unique combination, so with many combinations the bubbles overlap, making it hard to discern individual values.

5. When Highlighting Variability or Distribution of Values

Heatmap:

  • Best for: Highlighting variability within a dataset by using color gradients to represent changes in value.
  • Use Case: Useful for showing variability in data such as temperature across different geographic regions or performance scores across different metrics.
  • Example: In epidemiology, a heatmap can represent infection rates across different countries over time, making it easy to see which regions experienced spikes or drops in cases.

Bubble Chart:

  • Less precise for variability: Bubble charts show variability through size and position, but they are less effective for representing subtle differences, especially in large datasets with overlapping bubbles.

6. When the Focus is on Heat Intensity or Magnitude of Values

Heatmap:

  • Best for: Emphasizing the magnitude or intensity of values, where color intensity indicates how large each value is.
  • Use Case: When you need to convey how strongly a certain measure is present across different conditions.
  • Example: In customer support, a heatmap can show the intensity of support tickets across different product issues and regions, highlighting where the most significant problems occur.

Bubble Chart:

  • Alternative focus: Bubble charts are better for showing magnitude relative to another measure (e.g., revenue vs. cost, with bubble size representing profit), but they lack the intuitive, single continuous color scale that heatmaps use to represent magnitude.
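The single continuous scale mentioned above comes down to normalizing every cell against the same minimum and maximum and mapping the result onto one color ramp. A minimal Python sketch, assuming a linear scale and an arbitrarily chosen white-to-dark-red ramp; the support-ticket counts are hypothetical:

```python
def intensity(value, vmin, vmax):
    """Normalize a value to [0, 1] on one linear scale shared by every cell."""
    if vmax == vmin:
        return 0.0  # avoid dividing by zero when all cells are equal
    t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))  # clamp values outside the chosen range

def to_hex(t, low=(255, 255, 255), high=(178, 24, 43)):
    """Interpolate linearly between two RGB colors and return '#rrggbb'."""
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"

# Hypothetical support-ticket counts by (product issue, region).
tickets = {
    ("Billing", "EMEA"): 12,
    ("Billing", "APAC"): 3,
    ("Login", "EMEA"): 40,
    ("Login", "APAC"): 25,
}
vmin, vmax = min(tickets.values()), max(tickets.values())
for (issue, region), count in tickets.items():
    print(issue, region, count, to_hex(intensity(count, vmin, vmax)))
```

Because every cell is scaled against the same `vmin` and `vmax`, equal colors always mean equal values, which is what lets a reader compare magnitudes across the whole grid at a glance.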

7. When Data is Naturally Presented in a Matrix or Grid Format

Heatmap:

  • Best for: Data that is naturally structured in a matrix format, like correlation matrices, confusion matrices, or pivot tables.
  • Use Case: When your data already fits into a grid structure and the goal is to visualize the strength or weakness of relationships within this grid.
  • Example: In machine learning, a confusion matrix heatmap can show the performance of a classification model, with cells representing true positives, false positives, false negatives, and true negatives.

Bubble Chart:

  • Not ideal for grid data: Bubble charts are less effective for naturally grid-based data, as they lack the structured alignment that heatmaps provide.
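For the confusion-matrix example, the grid a heatmap would color is simply a table of (actual, predicted) counts. A minimal Python sketch with hypothetical labels and predictions:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes; cells are counts."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

# Hypothetical ground truth and model predictions.
labels = ["spam", "ham"]
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "ham", "spam"]
matrix = confusion_matrix(y_true, y_pred, labels)

print(f"{'':>6}" + "".join(f"{p:>6}" for p in labels))
for actual, row in zip(labels, matrix):
    print(f"{actual:>6}" + "".join(f"{c:>6}" for c in row))
```

Treating "spam" as the positive class, the first row holds the true positives (2) and false negatives (1), and the second row holds the false positives (1) and true negatives (2). scikit-learn's `sklearn.metrics.confusion_matrix` uses the same orientation, with rows for true labels and columns for predictions.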