Big Data Becomes Fast and Approachable

This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "10 Biggest Big Data Trends."

So what are the top 10 trends we expect to see this year around Big Data? That's what we will get into now. The first trend that we will go through is that Big Data becomes fast and approachable: options expand to speed up Hadoop. I'll hand things over to Larry to first give us some commentary on this one.

Larry Chiang: Thanks, Abhishek. Hi everyone. So this is essentially a two-part trend. The first part is all about why Big Data has become fast and approachable, and it all has to do with how users have been perceiving Big Data. To give you some context, when Hadoop was invented by the folks at Yahoo, the original goal was to index the internet.

It was designed to be used as a batch processing engine, but as it evolved, people realized the power of scaled-out storage, the power of processing and clustered workload management, and they figured, you know, it would make a great data analysis platform. So, in response to a lot of that demand, they started adding components that allow data to be stored and queried using common SQL tools.


Expanded Data Source Connectivity

Here at InetSoft we've continued to expand our connectivity with all of our SQL drivers, but the early SQL drivers were still based on the MapReduce algorithm, which, you know, is never great for real-time analytics queries. So SQL on Hadoop happened, which was still not fast enough, and that led to continued innovations and optimizations to the execution engines, as in Hive on Tez and Hive on Spark.

These have significantly improved speed, and as all of this continues to take place, we are seeing that users are increasingly demanding that Hadoop be used for BI workloads. This is something we hear from our customers. We see this being reflected in a lot of the market trends and research studies.

One that I particularly like is a BI, Hadoop Maturity Survey done by our partners and friends at AtScale, and it's all driven by this movement to expose Hadoop to business users and not just data scientists. Abhishek or Holly, any comments on that?

Abhishek: Yeah, certainly, buddy. I mean, I think from my perspective there's a bit of data physics, I guess you could say, at play here. But to your point around the origins of Hadoop, speed of analytical query was not what the platform was designed for, and the improvements have been made through things like Impala and Spark SQL and Hive on Tez.

And as you mentioned, a number of other things have been pretty incredible, and it has continued to move really quickly. But it's also been an interesting case where we've seen things that are less typical in InetSoft deployments, like OLAP and pre-aggregations, becoming really very popular in the Hadoop space, and that's the technology we see our customers adopting to essentially index or aggregate the query set to deliver that level of performance. So they are taking advantage of what Hadoop was originally designed to do and then layering on some other capabilities that really make it work well for interactive, real-time or near-real-time query responses.
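As a rough illustration of the pre-aggregation idea mentioned above, the sketch below scans a set of raw rows once, as a batch step, and then answers an interactive query from the much smaller summary. The data, column names, and function names are all hypothetical; this is not how any particular Hadoop or InetSoft component is implemented.

```python
from collections import defaultdict

# Hypothetical raw fact rows: (region, product, sales_amount).
raw_rows = [
    ("East", "Widget", 120.0),
    ("East", "Gadget", 80.0),
    ("West", "Widget", 200.0),
    ("East", "Widget", 30.0),
    ("West", "Gadget", 50.0),
]

def build_aggregate(rows):
    """Scan the raw rows once and total sales per (region, product)."""
    totals = defaultdict(float)
    for region, product, amount in rows:
        totals[(region, product)] += amount
    return dict(totals)

def sales_by_region(aggregate, region):
    """Answer a query from the small aggregate instead of the raw rows."""
    return sum(v for (r, _), v in aggregate.items() if r == region)

# Batch step, run ahead of time (the kind of work Hadoop was built for).
aggregate = build_aggregate(raw_rows)

# Interactive step: touches only the pre-aggregated summary.
east_total = sales_by_region(aggregate, "East")
print(east_total)
```

The trade-off is freshness: the summary only reflects the raw data as of the last batch run, so it has to be rebuilt or incrementally updated as new rows arrive.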


How Does InetSoft's Data Grid Cache Technology Work?

InetSoft's Data Grid Cache technology is designed to optimize the performance and scalability of its BI and data mashup tools by improving the efficiency of data storage, retrieval, and processing. This technology leverages distributed caching techniques to handle large volumes of data, reduce latency, and enhance the responsiveness of BI applications. Here's a detailed explanation of how InetSoft's Data Grid Cache technology works and its benefits:

Overview of Data Grid Cache Technology

The Data Grid Cache is a distributed caching system that stores data in memory across multiple nodes in a network. This approach allows for faster data access compared to traditional disk-based storage systems. By distributing the data across various nodes, InetSoft ensures that the system can handle large datasets and high query loads efficiently.
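The in-memory part of this idea can be sketched as a simple read-through cache: the first request for a result goes to the slow source, and repeat requests are served from memory. The class and function names here are made up for illustration and say nothing about InetSoft's actual API or internals.

```python
import time

class ReadThroughCache:
    """Keep query results in memory so repeat queries skip the slow source."""

    def __init__(self, load_from_source):
        self._load = load_from_source   # function that hits the slow store
        self._memory = {}               # in-memory copy of loaded results
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._memory:
            self.hits += 1
            return self._memory[key]
        # Cache miss: pay the cost of the slow source once, then remember it.
        self.misses += 1
        value = self._load(key)
        self._memory[key] = value
        return value

def slow_source(key):
    """Stand-in for a disk-based or remote query (hypothetical)."""
    time.sleep(0.01)
    return f"result for {key}"

cache = ReadThroughCache(slow_source)
first = cache.get("sales_2023")    # miss: goes to the slow source
second = cache.get("sales_2023")   # hit: served from memory
```

A real cache would also need an eviction or refresh policy so that memory use stays bounded and stale results are replaced; the sketch leaves that out.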

Key Components and Functionality

1. Distributed Architecture: The core of InetSoft's Data Grid Cache technology is its distributed architecture. Data is divided into smaller chunks and stored across multiple nodes in the network. This distribution ensures that no single node becomes a bottleneck, improving the overall performance and scalability of the system.

2. In-Memory Storage: Data is stored in memory rather than on disk, which significantly reduces data retrieval times. In-memory storage is particularly beneficial for BI applications that require real-time or near-real-time data access.

3. Data Replication: To ensure high availability and fault tolerance, the Data Grid Cache replicates data across multiple nodes. If one node fails, the system can quickly access the data from another node, minimizing downtime and data loss.

4. Load Balancing: InetSoft's Data Grid Cache includes load balancing mechanisms to evenly distribute query loads across all nodes. This balancing ensures that no single node is overwhelmed, leading to more consistent performance and better resource utilization.

5. Dynamic Scalability: The system can dynamically scale by adding or removing nodes based on the current demand. This scalability allows businesses to handle varying workloads without compromising performance.
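The components listed above can be sketched together in a toy, single-process model: keys are spread across nodes by hash (distribution and a crude form of load balancing), each key is kept on more than one node (replication), reads skip failed nodes (fault tolerance), and a node can be added later (scalability). Everything here, including the `DataGrid` class, the node names, and the placement rule, is an assumption made for illustration and does not describe InetSoft's implementation.

```python
import zlib

class DataGrid:
    """Toy in-memory grid: hash partitioning plus replication across nodes."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        self.nodes = {name: {} for name in node_names}  # node -> in-memory store
        self.down = set()                                # names of failed nodes

    def _owners(self, key):
        """Pick `replicas` consecutive nodes, starting at the key's hash slot."""
        names = sorted(self.nodes)
        start = zlib.crc32(key.encode()) % len(names)
        count = min(self.replicas, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def put(self, key, value):
        """Write the value to every node that owns a replica of the key."""
        for name in self._owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        """Read from the first owner that is still up."""
        for name in self._owners(key):
            if name not in self.down:
                return self.nodes[name][key]
        raise KeyError(f"all replicas for {key!r} are down")

    def fail(self, name):
        """Simulate a node failure."""
        self.down.add(name)

    def add_node(self, name):
        """Scale out: add a node and re-place every key under the new layout."""
        items = {}
        for store in self.nodes.values():
            items.update(store)
        self.nodes[name] = {}
        for store in self.nodes.values():
            store.clear()
        for key, value in items.items():
            self.put(key, value)

grid = DataGrid(["node-a", "node-b", "node-c"])
for i in range(100):
    grid.put(f"row-{i}", i)

grid.fail("node-b")                        # one node goes away
value_after_failure = grid.get("row-42")   # still readable from a replica

grid.add_node("node-d")                    # grow the grid under load
```

Placing keys with a plain modulo, as done here, moves most keys whenever a node is added. Distributed caches commonly use consistent hashing or a fixed partition table to keep that reshuffling small.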

Benefits of Data Grid Cache Technology

1. Enhanced Performance: By storing data in memory and distributing it across multiple nodes, InetSoft's Data Grid Cache significantly reduces data access times. This enhancement is crucial for BI applications that require quick response times to support decision-making processes.

2. Improved Scalability: The distributed nature of the Data Grid Cache allows it to handle large volumes of data and high query loads efficiently. As data grows or query demand increases, additional nodes can be added to the network to maintain performance levels.

3. High Availability and Reliability: Data replication and fault tolerance mechanisms ensure that the system remains operational even in the event of node failures. This high availability is essential for businesses that rely on continuous access to their BI tools and data.

4. Cost Efficiency: By optimizing data storage and retrieval processes, the Data Grid Cache reduces the need for expensive, high-performance hardware. Additionally, the ability to scale dynamically means that businesses only need to invest in additional resources as needed.

5. Simplified Data Management: The caching technology abstracts much of the complexity involved in data storage and retrieval, making it easier for businesses to manage their data infrastructure. This simplification allows IT teams to focus on other critical tasks.

Real-World Applications

1. Real-Time Analytics: InetSoft's Data Grid Cache is particularly useful for real-time analytics applications. By providing quick access to live data, businesses can monitor key performance indicators (KPIs) and make timely decisions.

2. Interactive Dashboards: The technology supports the creation of highly responsive and interactive dashboards. Users can explore data, drill down into details, and generate insights without experiencing delays.

3. Data Mashups: InetSoft's Data Grid Cache enables efficient data mashups by quickly retrieving and combining data from multiple sources. This capability allows businesses to gain comprehensive insights from diverse datasets.
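To make the data mashup item concrete, here is a minimal sketch that combines rows from two unrelated sources once both are held in memory. The source systems, field names, and the `mash_up` function are hypothetical examples, not part of InetSoft's product.

```python
# Hypothetical rows from two unrelated sources, e.g. a CRM and a billing system.
crm_accounts = [
    {"account_id": 1, "name": "Acme", "region": "East"},
    {"account_id": 2, "name": "Globex", "region": "West"},
]
billing_invoices = [
    {"account_id": 1, "amount": 500.0},
    {"account_id": 1, "amount": 250.0},
    {"account_id": 2, "amount": 900.0},
]

def mash_up(accounts, invoices):
    """Join the two sources on account_id and total the invoices per account."""
    totals = {}
    for inv in invoices:
        account_id = inv["account_id"]
        totals[account_id] = totals.get(account_id, 0.0) + inv["amount"]
    # Attach the billing total to each CRM record; accounts with no invoices get 0.0.
    return [
        {**acct, "total_billed": totals.get(acct["account_id"], 0.0)}
        for acct in accounts
    ]

combined = mash_up(crm_accounts, billing_invoices)
for row in combined:
    print(row["name"], row["region"], row["total_billed"])
```

Because both inputs are already in memory, the join is a dictionary lookup per row rather than a round trip to each source system.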
