Modeling
ISYS637
Vocabulary
Fact Table
Dimension Table
Star Schema
Snowflake Schema
3/4/15
ISYS 637: DATA WAREHOUSE
2
Figure: Generating Reports from facts and dimensional attributes
3/4/15
ISYS 637: DATA WAREHOUSE
3
Dimensional Design Process
(Source: “The Data Warehouse Toolkit” by Kimball and Ross, 2013)
1. Select the business process to model
◦ E.g. a dimensional model to capture order process
2. Declare the grain of the business process
◦ Specify exactly what an individual row in a fact table represents.
3. Choose the dimensions that apply to each fact table row
◦ “How do businesspeople describe the data that results from the business process?” 4. Identify the measures (or numeric facts) that will populate each fact table row
◦ “What are we measuring?”
3/4/15
ISYS 637: DATA WAREHOUSE
4
Where to Start?
1. Business Process
1. EVENT
2. DATA (or Dimensions) relevant to the EVENT
1. People
2. Product/Service
3. Location
4. Date/Time
5. Organization(s)
6. Document(s)
7. Measures
3/4/15
ISYS 637: DATA WAREHOUSE
5
Data Warehouse Bus Matrix
Sharing dimensions across value chain
3/4/15
ISYS 637: DATA WAREHOUSE
6
Data Warehouse Bus Matrix
Sample data warehouse bus matrix 3/4/15
ISYS 637: DATA WAREHOUSE
7
Data Warehouse Bus Matrix
• The rows of the bus matrix correspond to data marts.
• You should create separate matrix rows if the sources are different, the processes are different, or if the matrix row represents more than what can reasonably be tackled in a single implementation iteration.
• Creating the data warehouse bus matrix is one of the most important up-front deliverables of a data warehouse implementation.
• It is a hybrid resource that is part technical design tool, part project management tool, and part communication tool.
3/4/15
ISYS 637: DATA WAREHOUSE
8
Data Warehouse Bus Matrix
Data
Marts
Sample data warehouse bus matrix 3/4/15
ISYS 637: DATA WAREHOUSE
9
Measures (or Facts)
3/4/15
ISYS 637: DATA WAREHOUSE
10
FACTS- Measures
Measure- Represents the property of the fact that we want to analyze
Measures can be◦ Atomic- Generated directly from the source data and stored in the data warehouse, e.g. product price.
◦ Derived- Generated on the fly through pre-calculations on one or more measures in the fact, e.g., “total price”, that is generated according to the formula {unit_price * quantity}
◦ Additive- A measure is additive along a dimension if we can use the SUM operator to aggregate attribute values along all hierarchies defined on that dimension
◦ Semi-additive- Additive across some dimensions but not all, e.g. Inventory is not additive on time dimension but additive on product and location dimensions
◦ All measures that record a static level (inventory levels, financial account balances, and measures of intensity such as room temperatures) are inherently non-additive across the date dimension and possibly other dimensions.
◦ In these cases, the measure may be aggregated usefully across time, for example, by averaging over the number of time periods.
◦ Non-additive- non-additive across any dimension, e.g. ratios such as profit margins
◦ If possible, store the fully additive components of the non-additive measure and sum these components into the final answer set before calculating the final non-additive fact. This final calculation is often done in the BI layer or OLAP cube
3/4/15
ISYS 637: DATA WAREHOUSE
11
Example: Additive, Semi-Additive, and Non-Additive Facts
Current_Balance is semi-additive because it is additive for Account, but nonadditive for Date
Sales_Amount is additive for Date,
Store and Product
3/4/15
Profit_Margin is nonadditive for Account, and Date levels
ISYS 637: DATA WAREHOUSE
12
Factless Facts
Facts that do not contain any measures, i.e. have no measures associated to them. This type of facts is usually used to represent the state of something, where there are no measures required to be stored on this fact
These are facts