We’re going to be talking about the newest buzzword in BI: Big Data. You know, we have all been through Big Data, and we probably all have a certain idea of what it is. You can define it in your own world and define it as you see it.
The reason that a lot of people don’t know what it is, honestly, is that there are different definitions out there. People talk about Big Data and believe they’re on the same page. But I saw a survey of small and medium businesses who were asked about Big Data, and it turned out there were about three or four different definitions that were prevalent out there.
The way I like to define it, as a kind of baseline, is that Big Data is the science and the practice of working with data that in some way, shape or form is just too big for traditional transactional databases to work with in an efficient way. Now, people will go beyond that and name an amount: into the hundreds of terabytes, or into the petabyte range, is one definition.
That definition is going to be fluid, right? Two years from now the size will probably have increased, because even transactional databases will be able to take on those volumes. So it’s always going to move, but I would say that if it’s too big for a transactional reporting system, then it’s Big Data.
And if it’s coming in fast and furious through some kind of streaming data source, that’s a good sign. If you’re dealing with things that are not stored in relational data sources, maybe they are log files, maybe it’s data coming from sensors, that kind of thing is a good sign that it’s Big Data as well. And if that sounds a little inconsistent, it’s because it is. There are really different uses in this whole field.
Another Big Data scenario is one that incorporates multiple data sources. If you’re familiar with the concept of data warehousing, if you’ve done some work in the field of BI, you know that BI and Big Data actually have a tie-in. Think about pulling data from all kinds of systems, but think about some of those systems and reporting tools not being relational databases but being something a little less traditional than that.
We’ll probably talk about BI a little bit later in the webinar, but let’s start out with what started the whole Big Data craze. What product came out that tipped the scale? The underlying causes for Big Data’s popularity and feasibility, honestly, are that processing and storage have become far cheaper in the last several years than they ever were.
We know that things always tend to get cheaper and more sophisticated in technology. That’s not new. But it’s really gotten to the point now where a lot of the data that we used to throw away, because it just wasn’t practical to keep, can now easily be kept. Storage is cheap enough that you can pretty much keep everything, and if you can keep everything, then you have a lot more data and detail that you can analyze.
It turns out that analysis is quite valuable in lots of different settings. Also, what happened was that Google was working on some technology which, by the way, they kept to themselves, but what they did share was the underlying thinking and engineering behind it.
What they were doing, and still are doing, is crawling the Web. With the huge amounts of data that involves, relational databases just didn't cut it for them. So they created something called MapReduce. They created their own file system, and, long story short, there is this open source project called Hadoop. H-A-D-O-O-P. Hadoop is, in fact, the open source implementation of Google’s MapReduce and the Google File System, although in Hadoop it’s known as the Hadoop Distributed File System.
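To make the MapReduce idea a little more concrete, here is a minimal sketch of the pattern in Python, using word counting as the example. This is not Google's or Hadoop's code. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample pages are made up for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key. A real framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = []
    for doc in documents:  # on a cluster, each document could be mapped on a different machine
        pairs.extend(map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for crawled pages.
pages = ["big data is big", "data about data"]
counts = word_count(pages)
print(counts)
```

The point of the split is that the map step looks at one document at a time and the reduce step looks at one key at a time, so in principle both can be spread over many machines, which is the part a system like Hadoop takes care of.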