Normally you think of a data warehouse as a single refrigerator-sized appliance, but you can connect them so that they work together in a massively parallel way. Then you have distributed technologies. Hadoop is the most notable, but you also have NoSQL.
When it comes to other business intelligence technologies, it really depends on the way that you are using them. If you are using them to make extreme-scale problems more affordable, they can be tools in your Big Data toolkit. Things like Cassandra and HBase offer you different tradeoffs between consistency, availability and partition tolerance.
Another thing we find in Big Data solutions is that they tend to force you to make these kinds of tradeoffs, because to be highly available you typically have to give up either consistency or tolerance to partitions. So databases like HBase or Cassandra, which make those tradeoffs and are massively parallel, tend to be used to handle extreme scale. Therefore they can be Big Data tools.
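To make the tradeoff concrete, here is a minimal sketch of the quorum rule used by replicated stores with tunable consistency. It is a simplified model, not a description of how Cassandra or HBase is implemented: the function name `quorum_tradeoff` and its parameters are hypothetical, and the labels simply borrow Cassandra's consistency level names. With N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap only when R + W > N, while each operation can proceed only if enough replicas are reachable.

```python
def quorum_tradeoff(n, w, r, reachable):
    """Summarize one replication setting (simplified model).

    n: total replicas, w: replicas that must acknowledge a write,
    r: replicas consulted on a read, reachable: replicas currently responding.
    """
    return {
        # Read and write quorums overlap only if r + w > n.
        "reads_see_latest_write": r + w > n,
        # An operation can proceed only if enough replicas respond.
        "writes_available": reachable >= w,
        "reads_available": reachable >= r,
    }


N = 3  # assumed replication factor for illustration
for label, w, r in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", 2, 2), ("ALL/ONE", 3, 1)]:
    # One of the three replicas is unreachable in this scenario.
    print(label, quorum_tradeoff(N, w, r, reachable=2))
```

In this model, the loosest setting stays available with a replica down but can return stale reads, while the strictest write setting guarantees fresh reads but refuses writes as soon as one replica is unreachable.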
Amazon Elastic MapReduce is a Hadoop implementation within Amazon. Then there is Microsoft Azure, and Google has a new SQL engine that I think is very fast and handles big data. But when you use the Cloud, you have the issue of moving the data back and forth, because if you are moving petabytes across, it's not practical.
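A rough back-of-the-envelope calculation shows why moving petabytes is impractical. This is only a sketch under stated assumptions: a decimal petabyte (10^15 bytes), a dedicated link running at its full rated speed, and no protocol overhead. The function name `transfer_days` and the link speeds are illustrative, not taken from any vendor.

```python
def transfer_days(size_bytes, link_bits_per_second, efficiency=1.0):
    """Days needed to move size_bytes over a link.

    efficiency is the assumed fraction of rated bandwidth actually achieved.
    """
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes

for gbps in (1, 10, 100):
    days = transfer_days(PETABYTE, gbps * 10**9)
    print(f"{gbps} Gbps: {days:.1f} days per petabyte")
```

Even under these ideal assumptions, one petabyte over a 1 Gbps link takes roughly three months, and real-world throughput is usually lower than the rated speed.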
Data tends to have gravity: it collects in one place and wants to stay there. I have heard other terms associated with Hadoop too: data landfill, or ha-dump. And I have also heard of the Hadoop hangover.
It's an interesting emerging use case in which we see Hadoop being thought of as a staging environment on steroids, a place to stage and dump a massive amount of stuff that you are not quite sure what you want to do with. So you stream it as a set of flat files into Hadoop. Let Hadoop deal with where it goes, and then if you want to write MapReduce against that, have fun.
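For readers who have not written MapReduce, the idea can be sketched in plain Python without a Hadoop cluster. This is an in-memory illustration of the map, shuffle and reduce phases, not Hadoop's actual API. The record layout (date, event type, status, comma-separated) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (event_type, 1) for each well-formed flat-file record."""
    fields = line.strip().split(",")
    if len(fields) >= 2:  # skip records that do not match the assumed layout
        yield fields[1], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


sample = [
    "2013-01-05,checkout,ok",
    "2013-01-05,search,ok",
    "2013-01-06,checkout,error",
    "malformed line",
]
print(run_job(sample))  # {'checkout': 2, 'search': 1}
```

The point of the pattern is that the map and reduce functions know nothing about where the data lives, which is what lets a framework spread the same logic across many machines.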
But in a lot of cases we are beginning to see other vendors create connectors to pull that Hadoop data into, for instance, something like Aster Data. So Teradata, Aster Data and Hadoop form an ecosystem that allows you to source data into Hadoop, do more structured columnar analytics in Aster Data, take those insights, pull them into Teradata, and use your traditional data warehousing tools there.
Other examples are Hadoop and SAP HANA, and Hadoop and Greenplum. That’s a very common use case. And I think there is an important point here: Big Data isn’t about just one of these technologies. Most of the large companies that have Big Data need a combination of business intelligence solutions. It's not like Hadoop is going to replace your data warehouse. You need all of these technologies.
I have talked to a lot of companies that are struggling with the question: if I do Hadoop, can I just get rid of my data warehouse? The answer is probably not. There are all kinds of cleansing and conforming things that you still want to be able to do with the typical ETL processes. But there are an awful lot of use cases where you can just dump it into Hadoop or a NoSQL database and run analytics against it a lot faster and cheaper. So it's just another analysis tool in your BI toolkit.
The next natural question about Big Data is: all right, I have got all this Big Data and I can handle it at extreme scale. What the heck do I do with it? That’s the analytical part of it. Sometimes that can be data mining, also known as predictive analytics, which means you actually have to have the ability to process this. And I hear a lot of people say, well, we have Hadoop, or we need Hadoop and we need to set that up. But then, what are you going to do with it next? They haven't thought that through.
I get that all the time from our IT clients. What should we do? The advice I am giving is, don’t do anything until you have a notion of value that you and your business are partnering on. It's not something we think IT can get ahead of. In fact, the survey that we did last year on Big Data confirmed that about 75% of the respondents said that their Big Data initiatives were joint IT-business partnerships where both sides are in it to win or fail together.
What you need to do is find that notion of value that your business has. What IT brings to the table is that perhaps there is a more economical way to deal with that extreme scale and to realize the value of your idea. Let’s partner together and figure out how we can do it.