There is also something called Pig, which is a data transformation language. There’s something called Mahout, M-A-H-O-U-T, for doing predictive analytics with Hadoop, and there are a few other pieces as well, and Microsoft’s distribution has all of those pieces in it. So those are all open source projects. You can tell because they have silly names. That’s usually a clue that you’re in the world of open source, and Microsoft implemented all of those.
Now it’s still in private beta, but I expect it to be generally released very soon, and for the private beta you can request an invitation. You can just go to hadooponazure.com and request an invite. I think it can take up to a week before there is a response, but you can get in. And that’s the set of open source tools available to you if you understand the Windows world, but if you don’t, there is a Hadoop implementation also included on Amazon Web Services, and of course, you could install it on your own cluster of servers.
In terms of making the Big Data, as it were, you’re already going to have it. If you’re going to be doing Big Data analysis, those data sources are going to be obvious, too. You may be working on log files in a web context.
You may be working on sensor data in a supply chain or manufacturing context. You shouldn’t have to make the data. The whole issue typically is that something is already making the data that you want to study.