This post is for anyone just starting to learn about the business intelligence solution space. I found a helpful article at Software Advice that offers definitions of one buzzword and two older, fundamental BI terms, and I thought I would add my two cents and put those terms in the context of InetSoft’s business intelligence technology.
Big Data – this is the buzzword on the list. I wonder how long it will be used, since today’s ‘big data’ will be ‘small data’ in a few years! A couple of the definitions provided call it an analytical process, like data mining. I suppose that is what you do with it, but for my definition, I’ll call it any data set larger than a terabyte. Here in 2012 that seems to be the size at which a database is subjectively considered large. But it’s a relative concept. By the end of the year, it’ll have to be 10TB to be considered big.
All kinds of transactional databases are candidates for creating Big Data, some within a matter of weeks of recording. Tracking clicks and interactions on Web sites, telecom activity detail records, and social media communication and actions are the best examples. InetSoft’s BI application has been designed with Big Data in mind and employs a unique approach to managing performance that we call ‘Data Grid Cache’. It’s a hybrid in-memory and disk-based caching solution that we think is more flexible and easier to deploy.
Data Warehouses – while not a buzzword, this one is interesting because it, too, might have less meaning in the years to come. All of the definitions provided in the Software Advice article are fine. But here’s what I mean. Decades ago, the approach of making copies of data and precalculating aggregations or other metrics that users would frequently want was necessary because CPU power and costs dictated that operational systems not be taxed by analytical functions.
Nowadays operational data stores get so large that people question the idea of duplicating them into data warehouses, and technology has progressed to the point where analysis can be done more directly on the live data across multiple data sources. In addition, user expectations, from internal enterprise users as well as external customer users, have risen to the point where real-time or near-real-time access to information is expected. So here again a data warehouse doesn’t make sense.
InetSoft has a unique approach offering two modes of data access: direct, real-time access and near-real-time access through caching. And in contrast to a data warehouse, the caching solution is optimized for just the data needed for a particular dashboard or analysis.
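To make the difference between the two modes concrete, here is a minimal sketch in Python. It is not InetSoft’s implementation or API; the `DashboardSource` class, its method names, and the `orders` table are all hypothetical, and an in-memory SQLite database stands in for the operational system. The point is only that the cache holds the result of one dashboard’s query rather than a copy of the whole database.

```python
import sqlite3
import time


class DashboardSource:
    """Hypothetical illustration: serves one dashboard's query directly or from a cache."""

    def __init__(self, conn, query, max_age_seconds=60.0, clock=time.monotonic):
        self.conn = conn
        self.query = query
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._cached_rows = None
        self._cached_at = None

    def fetch_direct(self):
        # Real-time mode: always query the operational database.
        return self.conn.execute(self.query).fetchall()

    def fetch_cached(self):
        # Near-real-time mode: reuse the last result until it is too old.
        now = self.clock()
        if self._cached_rows is None or now - self._cached_at > self.max_age_seconds:
            self._cached_rows = self.fetch_direct()
            self._cached_at = now
        return self._cached_rows


def demo():
    # An in-memory SQLite table stands in for the operational data store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 100.0), ("west", 250.0)],
    )
    source = DashboardSource(
        conn,
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
    )
    first = source.fetch_cached()            # fills the cache
    conn.execute("INSERT INTO orders VALUES ('east', 50.0)")
    cached = source.fetch_cached()           # still the earlier totals
    direct = source.fetch_direct()           # sees the new order right away
    return first, cached, direct
```

In this sketch the cached call keeps returning the earlier totals until `max_age_seconds` has passed, while the direct call picks up the new order immediately. Which trade-off is right depends on how fresh the dashboard needs to be and how much load the source system can take.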
Data Mining – this is the last one. There is not much controversy about the definition here, and I am sure it’s an activity that will persist forever. It’s the act of looking for trends, relationships, and outliers in data. The idea developed in parallel with data warehouse reporting tools, since the warehouse was the place to do it. Data mining is particularly interesting in the context of big data, since the most valuable results come from analyzing millions or trillions of records, something only possible with substantial computing power.
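As a tiny illustration of the “outliers” part of that definition, here is a sketch in Python that flags values sitting far from the mean of a data set. The function name, the two-standard-deviation threshold, and the sample numbers are my own assumptions for the example; real data mining tools use a range of more robust techniques.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up daily order counts with one unusual day.
daily_orders = [102, 98, 105, 97, 101, 99, 103, 100, 250, 96]
outliers = find_outliers(daily_orders)
```

With these made-up numbers, only the 250-order day is flagged. On millions of records the idea is the same, but the computation has to be spread across far more memory and processing power.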
InetSoft’s take on it is that, in addition to being something that specialized analysts or statisticians do, data mining is now open to less sophisticated users, because today’s visualization technology lets them do visual data mining. And it’s easy: it’s a matter of dragging data fields onto x and y axes, adding further dimensions through attributes such as color and size, and within minutes patterns, clusters, and outliers are visible on the screen.