The other dimension is structure, running from mono-schema or highly structured to lightly structured or unstructured. So if you think about that 2x2 matrix, each quadrant has its own set of technologies. Hadoop, of course, sits more in the high-latency, unstructured space. In the low-latency, structured space, for instance, there are things like in-memory technologies being used for analytics to support more real-time, question-and-answer kinds of workloads. That's also a piece of it.
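To make that framework concrete, here is a toy sketch of the 2x2 matrix as described: one axis is latency, the other is degree of structure. Only the two quadrants actually named above are filled in; the other pairings are left open, and the representation itself is just an illustration.

```python
# A toy sketch of the latency-vs-structure matrix described above.
# Only the two quadrants named in the conversation are populated.
quadrants = {
    ("high latency", "unstructured"): ["Hadoop"],
    ("low latency", "structured"): ["in-memory analytics"],
    ("high latency", "structured"): [],   # not covered above
    ("low latency", "unstructured"): [],  # not covered above
}

for (latency, structure), examples in quadrants.items():
    print(f"{latency:>12} / {structure:<12} -> {examples or 'n/a'}")
```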
To be clear, when I say unstructured I mean things like text feeds from social media, and by structured I mean things like relational databases. One of the things people associate with Hadoop is unstructured data. The truth is, what we see is Hadoop as a distributed file system: you put unstructured files in it, but most of those files have some structure in them, like a web log.
So even though it's a file and called unstructured, it's not as if we see people using Hadoop for email or free-text processing. It's more around files that have some structure, which allows for parsing that structure and doing analytics on it in a very scalable way.
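As a rough illustration of what "structure inside an unstructured file" means, the sketch below parses an Apache-style web log with a map step and a reduce step in plain Python. The file name (access.log) and the log format are assumptions for the example; a real Hadoop job would express the same two steps over file splits in HDFS.

```python
import re
from collections import Counter

# Apache-style common log line: host, timestamp, request, status, bytes.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

def map_line(line):
    """Map step: parse one log line and emit (status_code, 1)."""
    match = LOG_PATTERN.match(line)
    if match:
        yield match.group("status"), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts per status code."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

if __name__ == "__main__":
    with open("access.log") as f:  # assumed sample file
        pairs = (pair for line in f for pair in map_line(line))
        print(reduce_counts(pairs))
```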
So where is the extreme frontier now, if Big Data is at the extreme? You would expect examples in life sciences with DNA sequence data. And then there is telecom, where you are storing call data records in the billions. You hear people talking about distributed systems, so first off, when we talk about Big Data technology, the way I have been explaining it to clients is that, generally speaking, it's massively parallel processing for huge workloads. It offers you a flexible analytic model, which includes late schema binding or no schema, schema-less kinds of things. You don't have to have just one schema.
So, generally speaking, Big Data technology means massively parallel processing. Some of the columnar databases give you the capability to late-bind a schema to data you have captured in columnar format, which gives you more flexibility. And the last thing is, they tend to be linearly scalable, so you can buy as much as you need, and then when your data needs grow you can buy some more.
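A rough sketch of what "late schema binding" looks like in practice follows. The raw records are captured once, and a schema is applied only when a question is asked; the field names, sample data, and the two views are invented for illustration.

```python
# Schema-on-read: the same stored records, two different late-bound views.
raw_records = [
    "2024-01-05,web,341,NY",
    "2024-01-05,mobile,129,CA",
    "2024-01-06,web,512,TX",
]

def bind_schema(record, schema):
    """Apply a column schema (name -> (position, converter)) at read time."""
    fields = record.split(",")
    return {name: convert(fields[i]) for name, (i, convert) in schema.items()}

# Two different questions, two different schemas, one set of raw bytes.
sales_view = {"date": (0, str), "amount": (2, int)}
geo_view = {"channel": (1, str), "state": (3, str)}

print([bind_schema(r, sales_view) for r in raw_records])
print([bind_schema(r, geo_view) for r in raw_records])
```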
I mean, there is no single perfect technology. When you start to see pieces and parts of these things, and there is a business case for investing in those technologies, then you have got a Big Data problem. Typically we see folks using Big Data technologies at petabyte scale in terms of volume, but there are high-velocity cases, too. For instance, a hospital connected all the medical equipment that was monitoring premature babies; I think it was 96 million data points a day they were collecting.
And they used a streaming technology to persist that data only for a very short window, then ran it through a filter, and that filter does the intelligence. So they weren't really storing it or using Hadoop, but the streaming technology they were using was massively parallel, had a flexible analytic model, and was very scalable.
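Here is a simplified sketch of that streaming pattern: readings are kept only for a short sliding window, a filter runs over the window, and nothing else is persisted. The window length, the alert rule, and the sample readings are all assumptions for illustration, not the hospital's actual system.

```python
from collections import deque
import time

WINDOW_SECONDS = 60          # assumed retention window
window = deque()             # (timestamp, value) pairs, oldest first

def ingest(value, now=None):
    """Add a reading, expire old ones, and flag anything suspicious."""
    now = now if now is not None else time.time()
    window.append((now, value))
    while window and now - window[0][0] > WINDOW_SECONDS:
        window.popleft()     # data older than the window is simply dropped
    # The "intelligence" lives in the filter, not in storage.
    recent = [v for _, v in window]
    if len(recent) >= 10 and max(recent) - min(recent) > 30:
        print(f"alert: rapid swing in the last {WINDOW_SECONDS}s")

# Feed simulated readings into the stream, one per second.
for i, reading in enumerate([120, 122, 121, 119, 158, 160, 118, 117, 119, 121]):
    ingest(reading, now=i)
```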
That’s a Big Data use case, and it didn’t have anything to do with data storage. It was more about high velocity: the number of data points per period of time. When you get up into hundreds of millions of data points within a couple of hours or a day, we see the Big Data technologies beginning to take over, but there is no single dividing line.