Making Data Usable For Broad Analytics

This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "10 Biggest Big Data Trends."

Abhishek: Yeah, one of the attendees asked whether Alteryx is a data shaper, and the answer is absolutely. In addition, you mentioned Trifacta and Paxata. Those are two technologies that were essentially born with Hadoop, and that's where you were seeing the largest variety of data.

With that variety, you have to find a way of making the data really usable for broad analytics use cases. It depends on the shape of the data, whether it's nested files or something else. And so you saw technologies like Trifacta and Paxata that were really built around leveraging the Hadoop platform to do that data shaping and processing right there.

Now that has expanded to other technologies, so it's not just dependent on Hadoop. Alteryx has gone the other way: they started with being able to shape and prepare data from a number of different sources, and now they're actually leveraging the processing of Spark or Hadoop to do some of those transformations in memory.
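As a rough illustration of that kind of Spark-based shaping, here is a minimal PySpark sketch that flattens nested JSON landed on a Hadoop cluster into plain columns for analytics; the paths, fields, and schema are hypothetical, and the sketch is not tied to Trifacta, Paxata, or Alteryx specifically.

```python
# Minimal sketch: shape nested JSON from the landing zone into flat columns.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("data-shaping-sketch").getOrCreate()

# Read the raw, nested JSON straight from the landing zone (hypothetical path).
raw = spark.read.json("hdfs:///landing/web_events/")

# Shape it: explode a nested array of items and promote nested struct
# fields to flat, analytics-friendly columns.
flat = (
    raw.withColumn("item", explode(col("items")))
       .select(
           col("user.id").alias("user_id"),
           col("event_time"),
           col("item.sku").alias("sku"),
           col("item.price").alias("price"),
       )
)

# Persist the shaped data in a columnar format for broad analytics use.
flat.write.mode("overwrite").parquet("hdfs:///curated/web_events/")
```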

The Data Pipeline

So this is an area we call the data pipeline: helping to move data from a landing zone or staging area into the data lake and on to the different points of use. It's a very interesting and important part of the data flow, and a big part of it, again, is data lakes on Hadoop. Traditionally, ETL or other data transformations are done before the data ever lands in the system where the analytics will be run.
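By way of contrast, here is a hedged PySpark sketch of the data lake approach described next: raw files are landed untouched, and each use case does its own shaping after landing. The paths and column names (order_ts, amount, customer_id) are assumptions for illustration.

```python
# Hedged sketch of the "land raw first, transform after" data lake pattern.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, sum as sum_, to_date

spark = SparkSession.builder.appName("data-lake-elt-sketch").getOrCreate()

# Step 1: landing is a straight copy -- no transformation before the lake.
raw_orders = spark.read.option("header", True).csv("hdfs:///landing/orders/")
raw_orders.write.mode("append").parquet("hdfs:///lake/raw/orders/")

# Step 2: transformations happen after landing, shaped per use case.
orders = spark.read.parquet("hdfs:///lake/raw/orders/")

# Use case A: daily revenue for a dashboard.
daily_revenue = (
    orders.withColumn("order_date", to_date("order_ts"))
          .groupBy("order_date")
          .agg(sum_(col("amount").cast("double")).alias("revenue"))
)
daily_revenue.write.mode("overwrite").parquet("hdfs:///lake/marts/daily_revenue/")

# Use case B: order counts per customer for a churn model.
per_customer = orders.groupBy("customer_id").agg(count("*").alias("order_count"))
per_customer.write.mode("overwrite").parquet("hdfs:///lake/marts/orders_per_customer/")
```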

The benefit of the data lake is that you're just landing the data there in its raw format and doing the transformations and the shaping after it has already landed. That's much more flexible, and it enables you to do the kind of shaping you need for each analytical use case. Larry, any other thoughts on this one?

Larry: Yes, so having been a consumer of data and a business user myself, I would say that including data prep in the analytics workflow just makes a lot of natural sense, because at the end of the day, for great analytics you need great data. The business user, or any user for that matter, doesn't have to wait for someone else to do that prep for him when he knows how the data needs to be shaped to answer or explore the particular question he is looking at.

How this helps our customers is that it drives agility in their analytics process, and it makes the data available to large groups of stakeholders within their organizations a lot faster, across all of our customers.

Abhishek: Yeah, exactly, and I'll close off with one question here since it's relevant to our context: does InetSoft shape data? The answer is yes. There is a good amount of data preparation built into the InetSoft application itself, including things like cross data source joins and capabilities like that. So there's a lot of what we call data preparation built into InetSoft.
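InetSoft's cross data source joins are configured inside its own application, so as a purely tool-agnostic illustration of the concept, here is a short pandas sketch that joins a flat-file extract with a relational table; the file, table, and column names are hypothetical.

```python
# Tool-agnostic sketch of a cross data source join: CSV extract + SQL table.
import sqlite3
import pandas as pd

# Source 1: a flat-file extract with campaign_id, channel, spend.
campaigns = pd.read_csv("campaigns.csv")

# Source 2: an order table in a relational database.
conn = sqlite3.connect("sales.db")
orders = pd.read_sql("SELECT campaign_id, amount FROM orders", conn)

# The cross data source join: combine both sources on a shared key,
# then aggregate for reporting.
joined = orders.merge(campaigns, on="campaign_id", how="left")
report = joined.groupby("channel", as_index=False)["amount"].sum()
print(report)
```

The point of the pattern is the same regardless of tool: data from two systems that would otherwise stay siloed is combined on a shared key before it feeds a report.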

Alright, the next-to-last trend: Big Data grows up, and Hadoop adds to enterprise standards. This really keeps with the trend of Hadoop becoming a part of the enterprise analytics landscape rather than a science project on the side. What we see are more investments in the security and governance components surrounding Hadoop, because it's a core enterprise system now.

Examples of that are Apache Sentry, which provides a system for enforcing fine-grained, role-based authorization of data and metadata stored in a Hadoop cluster; Apache Atlas, which came out of the Data Governance Initiative and empowers organizations to apply consistent data classification across the data ecosystem; and Apache Ranger, which provides centralized security administration for Hadoop.
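To make the Sentry piece concrete, here is a minimal sketch of the role-based grants it enforces for Hive, issued through the pyhive client; the host, role, database, and group names are assumptions, and the GRANT statements follow Sentry's SQL-based policy style.

```python
# Minimal sketch: Sentry-style role-based grants issued over HiveServer2.
from pyhive import hive

conn = hive.connect(host="hive-server.example.com", port=10000, username="admin")
cursor = conn.cursor()

# Create a role, grant it read-only access to one database, and map the
# role to a group; Sentry then evaluates these rules on every query.
cursor.execute("CREATE ROLE analyst")
cursor.execute("GRANT SELECT ON DATABASE sales TO ROLE analyst")
cursor.execute("GRANT ROLE analyst TO GROUP analysts")
```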

One notable alternative to Apache Sentry is Apache Ranger, which provides fine-grained access control and has a more extensive range of security policies applicable to various Hadoop components, such as Hive, HDFS, and Kafka. Ranger also includes auditing capabilities, enabling administrators to monitor user activities and ensure compliance with regulatory requirements. Ranger's flexibility makes it ideal for businesses that need a comprehensive security solution with built-in tools for monitoring, policy enforcement, and integration with enterprise-wide security frameworks.
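As a hedged sketch of what that centralized administration can look like programmatically, the snippet below creates a Hive access policy through Ranger's public REST API from Python; the host, port, service name, credentials, and policy contents are assumptions for illustration.

```python
# Hedged sketch: create a Hive access policy via Ranger's REST API.
import requests

policy = {
    "service": "hadoopdev_hive",   # name of the Hive service registered in Ranger
    "name": "analysts-read-sales-orders",
    "resources": {
        "database": {"values": ["sales"]},
        "table": {"values": ["orders"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [
        {
            "groups": ["analysts"],
            "accesses": [{"type": "select", "isAllowed": True}],
        }
    ],
}

resp = requests.post(
    "https://ranger-admin.example.com:6182/service/public/v2/api/policy",
    json=policy,
    auth=("admin", "admin-password"),
)
resp.raise_for_status()
print("Created policy id:", resp.json().get("id"))
```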

Another alternative is Alation, a data cataloging platform that serves as a strong replacement for Apache Atlas. Alation's strength lies in its user-friendly interface and AI-driven cataloging capabilities, which help organizations better understand their data assets and derive actionable insights. The tool offers features like data lineage, collaboration, and impact analysis, making it especially useful for businesses focused on promoting data democratization and improving user engagement with data. Alation's machine learning algorithms assist in tagging, suggesting, and ranking data, which can help users quickly locate relevant datasets. Compared to Atlas, Alation's focus on self-service and end-user accessibility has made it popular among teams where non-technical users require data discovery and metadata management tools without needing deep technical knowledge.

Finally, Collibra provides an enterprise-grade data governance platform that functions as a robust alternative to both Sentry and Atlas. It combines access control with metadata management, offering features such as data stewardship, data lineage tracking, and a customizable workflow engine that aids in data governance. Collibra's platform is built to handle complex data governance needs for large enterprises, providing integration with a wide range of data sources and compliance tools, as well as support for data privacy regulations like GDPR. Unlike Apache's open-source options, Collibra is a commercial product with strong support and integration capabilities, making it an appealing choice for organizations that need a comprehensive data governance solution with extended support.
