Mark Flaherty (MF): The effect here is that, because we didn't design our environment 30 years ago in anticipation of all the ways the business would grow and change, we ended up with this complexity and variation. But once we understand that those variations exist, we want to reduce the risk of repeating the same things over and over again - replicated functionality, replicated work, rework - by understanding where those differences lie.
We can start to migrate toward a more standard environment and assess the variances. We can look at what data standards we can use for bubbling up that chain of definition, from the business term to the data element concepts to the uses of those data elements, the conceptual domains, and the value domains.
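To make that chain of definition concrete, here is a minimal sketch in Python. The class names (BusinessTerm-style layering via DataElementConcept, ConceptualDomain, ValueDomain, DataElement) are illustrative of the registry-style layering described above, not any specific tool's API:

```python
from dataclasses import dataclass, field

# Illustrative chain of definition: each layer refines the more
# abstract definition above it.

@dataclass
class ConceptualDomain:
    name: str                      # e.g. "Customer relationship status"
    description: str

@dataclass
class ValueDomain:
    name: str                      # concrete code set used by systems
    conceptual_domain: ConceptualDomain
    permitted_values: list[str] = field(default_factory=list)

@dataclass
class DataElementConcept:
    business_term: str             # e.g. "Customer"
    definition: str
    conceptual_domain: ConceptualDomain

@dataclass
class DataElement:
    name: str                      # the representation an application actually stores
    concept: DataElementConcept
    value_domain: ValueDomain

# A concrete column can be traced back up the chain to the business term.
status_cd = ConceptualDomain("Customer relationship status",
                             "Where a party stands in the sales lifecycle")
status_vd = ValueDomain("CUST_STATUS_CODE", status_cd,
                        ["PROSPECT", "ACTIVE", "LAPSED"])
customer_dec = DataElementConcept("Customer",
                                  "A party that has purchased goods or services",
                                  status_cd)
crm_column = DataElement("crm.customer.status", customer_dec, status_vd)
```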
Essentially, we evolve a rationalized set of canonical data models, and when we see that our applications could migrate into these canonical models without a significant amount of effort or intrusion, we can merge our representations into the standardized format.
On the other hand, if we see that we can't merge the data sets, then what we can do is differentiate them. If it turns out that we are actually talking about the difference between prospective customers who have a 30-day customer support contract or agreement, as opposed to customers who have actually given us money, those two groups are different sets of people, and therefore we might want to differentiate even at the lowest level in that chain of definition.
We settle on the fact that we have different business terms that need to be put into place - a prospective customer, or an evaluation customer. We will have different ways of qualifying our terminology so that we can employ our different standard representations.
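As a rough sketch of what that differentiation might look like once the qualified terms are agreed on (the names and statuses below are purely hypothetical):

```python
from enum import Enum

# Once "prospective customer" and "customer" are recognized as different
# populations, they get separate qualified terms and separate permitted
# value sets rather than one merged representation.

class ProspectiveCustomerStatus(Enum):
    EVALUATING = "30-day evaluation agreement in place"
    EXPIRED = "evaluation period ended without a purchase"

class CustomerStatus(Enum):
    ACTIVE = "paid contract in force"
    LAPSED = "previously paid, contract ended"

def qualify_term(has_paid: bool) -> str:
    """Pick the qualified business term for a party record."""
    return "Customer" if has_paid else "Prospective Customer"

print(qualify_term(False))  # -> Prospective Customer
```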
In order to do this, we need processes, and we need data management tools. What types of tools do we need? Well, we need data modeling tools, because we need a mechanism for capturing our rationalized and standardized models so that we can use them as a way of communicating what we are working on, what things look like, and how data can be shared.
We need data profiling tools, statistical analysis tools, and model evaluation tools and techniques so that we can do our analysis as part of our rationalization activity. And we need a metadata repository that becomes the central platform for capturing the knowledge and communicating what we have learned, not just to the technical people, but really to everybody in the organization.
A data profiling tool is a software application designed to analyze and assess the quality and characteristics of data within a database or data warehouse. These tools automatically examine the structure, content, and relationships within the data to identify patterns, anomalies, inconsistencies, and errors. By generating detailed metadata and statistical summaries, data profiling tools provide valuable insights into the integrity, completeness, and accuracy of the data, helping organizations understand the scope and complexity of their data assets. Additionally, data profiling tools often offer functionalities such as data quality assessment, data classification, and data lineage analysis, enabling users to make informed decisions about data management, cleansing, and integration initiatives. Overall, data profiling tools play a critical role in ensuring the reliability and usability of data for business intelligence, analytics, and decision-making purposes.
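As a rough illustration of what such a tool automates, here is a minimal column-level profiling pass over a pandas DataFrame. It is a sketch only - real profiling tools go far beyond these basic measures - and pandas is assumed purely for convenience:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Produce a simple column-level profile: completeness, cardinality,
    and value ranges -- the kind of metadata a profiling tool generates
    automatically."""
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "column": col,
            "dtype": str(s.dtype),
            "non_null_pct": round(100 * s.notna().mean(), 1),
            "distinct_values": s.nunique(dropna=True),
            "min": s.min() if s.notna().any() else None,
            "max": s.max() if s.notna().any() else None,
        })
    return pd.DataFrame(rows)

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 3, None],          # duplicate and missing id
    "status": ["ACTIVE", "PROSPECT", "ACTIVE", "ACTIVE", "LAPSED"],
})
print(profile(customers))
```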
A statistical analysis tool serves as a cornerstone in ensuring data quality within data management practices. By leveraging various statistical techniques, such as descriptive statistics, outlier detection, and hypothesis testing, these tools help organizations assess and improve the integrity, accuracy, and completeness of their data. Statistical analysis tools can identify anomalies, inconsistencies, and errors within datasets, allowing data stewards to pinpoint areas of concern and take corrective actions. Additionally, these tools enable organizations to establish data quality benchmarks and monitor data quality over time through metrics and key performance indicators (KPIs). By continuously analyzing and validating data using statistical methods, organizations can proactively identify data quality issues and implement strategies to enhance data accuracy, reliability, and relevance. Ultimately, statistical analysis tools play a crucial role in maintaining high data quality standards, which are essential for effective decision-making, regulatory compliance, and organizational success.
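A small example of the kind of check a data steward might run - simple z-score outlier detection using only the Python standard library (thresholds and sample values are illustrative):

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values whose z-score exceeds the threshold -- a basic
    statistical screen for suspect records."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

order_amounts = [120.0, 95.5, 101.0, 98.0, 110.0, 10450.0]  # one suspicious entry
print(zscore_outliers(order_amounts, threshold=2.0))  # -> [10450.0]
```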
A data model evaluation tool is a software application designed to assess the effectiveness, efficiency, and quality of data models used within an organization. These tools analyze various aspects of data models, including their structure, relationships, integrity constraints, and adherence to best practices and standards. Data model evaluation tools provide capabilities such as schema validation, consistency checks, and performance profiling to identify potential design flaws, redundancies, and optimization opportunities. Additionally, these tools may offer functionalities for comparing different versions of data models, documenting metadata, and generating reports to facilitate collaboration and decision-making among stakeholders. By evaluating data models against predefined criteria and industry standards, these tools help organizations ensure that their data architecture aligns with business requirements, promotes data integrity, and supports efficient data management and analysis processes.
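To give a flavor of such consistency checks, here is a toy evaluation of a hypothetical relational schema that flags two common design flaws: a table without a primary key and a foreign key that references a missing table. The schema structure is invented for illustration, not drawn from any particular modeling tool:

```python
# Toy schema: table name -> primary key and foreign key targets.
schema = {
    "customer": {"pk": "customer_id", "fks": {}},
    "order":    {"pk": "order_id",    "fks": {"customer_id": "customer"}},
    "invoice":  {"pk": None,          "fks": {"order_id": "orders"}},  # two flaws
}

def evaluate(schema: dict) -> list[str]:
    """Report missing primary keys and dangling foreign key references."""
    findings = []
    for table, spec in schema.items():
        if not spec["pk"]:
            findings.append(f"{table}: no primary key defined")
        for fk_col, target in spec["fks"].items():
            if target not in schema:
                findings.append(f"{table}.{fk_col}: references unknown table '{target}'")
    return findings

for issue in evaluate(schema):
    print(issue)
# -> invoice: no primary key defined
# -> invoice.order_id: references unknown table 'orders'
```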
And if I had to make one parenthetical comment about the content I am presenting today, I would boil it all down to this: the use of a data model is no longer limited to a DBA, or database administrator, or a data modeler. Rather, the data model is a framework for communicating the value of that information asset - what we were talking about before, about using data as an asset, or treating data as an asset.
The way to do that is to manage our assets in a structured manner. We keep track of our assets, and we look at how they are being used to run the business and to help the business. We do that for any of our real assets, our hard assets, and we should do the same for information assets if we are going to treat them that way.