• Verifying the data within the transforming stage (making sure that the workflow is correct by testing and evaluating it).
• Executing the transforming stage.
• Replacing the dirty data in the original sources with the clean data, in order to improve data consistency and avoid re-cleansing the data in future extractions.
2.2.3 Data Derivation
Data derivation is the process of creating a data value from other contributing data values through a derivation algorithm. It matters because the data warehouse depends on accurate data. Understanding data derivation is essential: it produces more meaningful data from aggregations and can reduce the size of data sets. Each formula should be carefully planned and documented so that operations run efficiently.
An example of data derivation is calculating a person’s age when the record stores only the date of birth. Below is a formula that derives age from the date of birth:
“
Person_age = floor ((randdate – DOB) / 365.25)
”
This is only one way to derive the value; it can be done in other ways, for example by taking into account the …
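The age derivation above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names age_approx and age_exact are hypothetical, the average year length of 365.25 days is an assumption, and the calendar-based variant is just one possible alternative that is not taken from the text.

```python
import math
from datetime import date


def age_approx(dob: date, ref_date: date) -> int:
    """Derive age by dividing elapsed days by an assumed average year length."""
    elapsed_days = (ref_date - dob).days
    return math.floor(elapsed_days / 365.25)


def age_exact(dob: date, ref_date: date) -> int:
    """Derive age by comparing calendar fields instead of counting days."""
    years = ref_date.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (ref_date.month, ref_date.day) < (dob.month, dob.day):
        years -= 1
    return years
```

The two approaches can disagree close to a birthday. For a date of birth of 1 March 2000 and a reference date of 1 March 2001, only 365 days have elapsed, so age_approx returns 0 while age_exact returns 1. This is the kind of detail that should be decided and documented when a derivation formula is planned.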
Aggregates are used in dimensional models of the data warehouse to query large sets of data. Without aggregations, a data warehouse is impractical as a production data store and is prone to performance issues. A simple query on a fact table can potentially return millions of rows, so it is more practical to apply an aggregate function, for example SUM, MAX or AVG. However, computing aggregates at query time can reduce ad hoc query performance on low-level data, because intensive calculations are required. Low-level data from a fact table should therefore be summarized and stored in advance in an aggregate table. The simplest form of aggregate uses the GROUP BY clause of an SQL query.
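As a rough sketch of storing a summary in advance, the example below uses Python's built-in sqlite3 module with an in-memory database. The table names fact_sales and agg_sales_by_product, their columns, and the sample rows are all hypothetical and chosen only for illustration; a real warehouse would have its own schema and refresh strategy.

```python
import sqlite3


def create_sample_fact_table(conn: sqlite3.Connection) -> None:
    """Create a small hypothetical fact table with one row per sale."""
    conn.execute(
        "CREATE TABLE fact_sales ("
        "sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 30.0), (2, 5.0)],
    )


def build_aggregate_table(conn: sqlite3.Connection) -> None:
    """Summarize the fact table once and store the result as an aggregate table."""
    conn.execute("DROP TABLE IF EXISTS agg_sales_by_product")
    conn.execute(
        """
        CREATE TABLE agg_sales_by_product AS
        SELECT product_id,
               COUNT(*)    AS sale_count,
               SUM(amount) AS total_amount,
               MAX(amount) AS max_amount,
               AVG(amount) AS avg_amount
        FROM fact_sales
        GROUP BY product_id
        """
    )


def read_aggregate(conn: sqlite3.Connection) -> list:
    """Query the precomputed summary instead of scanning the fact table."""
    cursor = conn.execute(
        "SELECT product_id, sale_count, total_amount, max_amount, avg_amount "
        "FROM agg_sales_by_product ORDER BY product_id"
    )
    return cursor.fetchall()
```

Calling create_sample_fact_table, then build_aggregate_table, then read_aggregate on a connection opened with sqlite3.connect(":memory:") returns one summary row per product. Later queries read the small aggregate table rather than recomputing SUM, MAX and AVG over every fact row.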