How does the data mashup platform handle performance and mixed workloads, large datasets, query concurrency, and latency differences?
This kind of Web data automation has not only reduced back-office waiting time and improved operational efficiency, but has also given the enterprise the ability to offer something new to its customers. The same idea extends to SaaS, Web, and cloud applications. Let's now answer the most common questions.
Performance is not a simple topic; it depends on the context. Are we talking about large data sets? Are we talking about many users accessing the same source at the same time, over connections with different latencies? Let me quickly address a few of these issues.
The other big area of questions is security, which I will come back to; let me address performance first. There are many strategies for addressing performance, and we have written a performance whitepaper that covers them in greater detail: optimizing queries, combining caching with column-based storage technologies, and using a scheduler to balance latencies. All of these are employed in what we call the data grid cache. It also does things like automatically choosing the right join method to best implement each query.
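To make that automatic join-method selection concrete, here is a minimal sketch of the kind of cardinality heuristic such an optimizer might apply. The JoinPlanner class, the thresholds, and the strategy names are illustrative assumptions, not the platform's actual internals.

```java
// A minimal sketch of cardinality-based join-method selection.
// JoinPlanner, the thresholds, and the strategy names are illustrative
// assumptions, not the platform's actual optimizer internals.
import java.util.Map;

public class JoinPlanner {
    // Rough per-table row-count estimates, e.g. gathered from statistics.
    private final Map<String, Long> rowEstimates;

    public JoinPlanner(Map<String, Long> rowEstimates) {
        this.rowEstimates = rowEstimates;
    }

    // Pick a join strategy from simple cardinality heuristics.
    public String chooseJoinMethod(String left, String right, boolean sortedOnKey) {
        long l = rowEstimates.getOrDefault(left, Long.MAX_VALUE);
        long r = rowEstimates.getOrDefault(right, Long.MAX_VALUE);
        if (sortedOnKey) {
            return "MERGE_JOIN";            // both inputs already ordered on the key
        }
        if (Math.min(l, r) < 10_000) {
            return "HASH_JOIN";             // build a hash table on the smaller input
        }
        return "PARTITIONED_HASH_JOIN";     // spill-friendly when both inputs are large
    }
}
```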
You also get a lot of information through traces and similar diagnostics, so you can manually override or tweak the automated query performance optimizations. And I use the word "query" loosely here; it is not limited to SQL. It could be an XML query or any of several other ways of accessing information. The important concept is that we are not limited to virtual, real-time access. You can also leverage different types of caching, along with schedulers, and combine them into different strategies, such as a trigger-based cache or a time-based cache. Of course, caching introduces some storage again, but far less than you would otherwise need.
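As a rough illustration of combining those two strategies, here is a sketch in Java of a cache with both time-based (TTL) expiry and a trigger-based invalidation hook. The ResultCache class and its methods are hypothetical stand-ins for whatever the data grid cache actually does internally.

```java
// A sketch of combining time-based (TTL) and trigger-based invalidation
// in one cache. ResultCache and its methods are hypothetical stand-ins
// for whatever the data grid cache actually does internally.
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class ResultCache<K, V> {
    private record Entry<T>(T value, Instant loadedAt) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl; // time-based expiry window

    public ResultCache(Duration ttl) {
        this.ttl = ttl;
    }

    // Time-based: reload through the supplier when the entry is older than the TTL.
    public V get(K key, Supplier<V> loader) {
        Entry<V> e = entries.get(key);
        if (e == null || e.loadedAt().plus(ttl).isBefore(Instant.now())) {
            e = new Entry<>(loader.get(), Instant.now());
            entries.put(key, e);
        }
        return e.value();
    }

    // Trigger-based: an external event (say, an ETL load finishing) evicts
    // the entry so the next read refreshes from the source.
    public void invalidate(K key) {
        entries.remove(key);
    }
}
```

A scheduler could also call get() off-hours to pre-warm entries, which is one way a cache and a scheduler combine to balance source latencies.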
The other issue is security and access rights. We implement data security at three levels: at the source level; at the data model level, where you have the virtual canonical models and can control who may change them, which is very important for data governance; and at the access level, where users actually consume the information. That level needs to integrate with LDAP, single sign-on, and so on.
To quickly picture what you will see: at the source level, there are ways to do single sign-on and module-level communications that can all be encrypted. At the view level, you can provide authentication and granular access control. You can do column- and row-based masking. You can say certain people can view everything, certain people can drill down, and certain people can see only summary data, not the detail data, and all of that can be integrated with roles, groups, and users in LDAP or Active Directory.
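To illustrate what role-driven column masking and row-level filtering can look like, here is a hedged sketch. The role names (HR, GLOBAL), the column names, and the RowColumnSecurity class are invented for the example; they are not the product's actual configuration.

```java
// A sketch of role-driven column masking and row-level filtering.
// The role names (HR, GLOBAL), column names, and this class are invented
// for illustration; they are not the product's actual configuration.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class RowColumnSecurity {
    private final Set<String> userRoles; // e.g. resolved from LDAP/Active Directory groups

    public RowColumnSecurity(Set<String> userRoles) {
        this.userRoles = userRoles;
    }

    // Column masking: hide the salary column unless the user holds the HR role.
    public Map<String, Object> maskColumns(Map<String, Object> row) {
        if (!userRoles.contains("HR") && row.containsKey("salary")) {
            Map<String, Object> masked = new HashMap<>(row);
            masked.put("salary", "****");
            return masked;
        }
        return row;
    }

    // Row filtering: non-global users see only rows for their own region.
    public List<Map<String, Object>> filterRows(List<Map<String, Object>> rows,
                                                String userRegion) {
        if (userRoles.contains("GLOBAL")) {
            return rows; // a global role sees everything
        }
        return rows.stream()
                   .filter(r -> userRegion.equals(r.get("region")))
                   .collect(Collectors.toList());
    }
}
```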
Separately, on the governance side, the metadata and the views, including who can change or modify those views, can also be governed using security. So I generally refer to these as source-level security, model-level security, and access control with granular authentication. Again, this has been thought through carefully and implemented in many places; data quality, reliability, and metadata have all been considered.
So in summary, data virtualization and data mashup, as you have seen through the examples, are technologies that can work together to solve both informational and transactional application problems. The value they provide is better quality information for the business. They let you quickly integrate discrete data silos, access previously untapped information that holds a lot of latent value, and then deliver all of that combined information in a unified access mode that is close to real time.