Part I. Completing existing ETL labs
a. First Integration Services Project
After the Product table was created, I first added a Source Assistant to create a flat file source and connected it to Product.csv.
Then I added a Destination Assistant to create an OLE DB destination pointing to the Product table and mapped the input columns to the destination columns.
According to the results, 99 rows were read from the source.
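As a quick sanity check, the row count in the destination can be compared with the 99 rows reported by the data flow. A minimal T-SQL query, assuming the destination table is dbo.Product as created above:

    -- Confirm that the 99 rows read from Product.csv reached the destination table
    SELECT COUNT(*) AS LoadedRows
    FROM dbo.Product;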
b. Control Flow Basics
1. Add a File System task to the control flow of the package and edit it as follows:
2. Connect the Data Flow task to the File System task.
3. Create a connection manager named FailureSource in the Connection Managers pane, and create another connection manager named FailureDestination for the failure semaphore destination file.
4. Add a new File System task and edit it as follows:
5. Connect the new File System task to the Data Flow task. Then right-click the precedence constraint and change the constraint type to Failure.
c. Data Flow Basics
1. Open the data flow and add a Conditional Split transformation. Connect it to the upstream data source and configure it to identify expensive products as follows:
2. Use the existing destination adapter for the normal products table. Then copy and paste it and rename it “Expensive Products Table”, which writes to the table products_expensive.
3. Add a Flat File destination adapter and connect it to the error output of the Conditional Split. Configure the destination for the error rows and review the field mappings.
4. Run the data flow; the results are shown as follows:
After the Conditional Split transformation, there are 48 rows in the normal products table and 51 rows in the expensive products table.
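Because part (a) loaded the same 99 source rows into the Product table, the split can also be sanity-checked in T-SQL. This is a sketch only: the Price column name and the threshold of 50 are placeholders for the expression actually used in the Conditional Split, while products_expensive is the destination table from step 2.

    -- Placeholder logic: Price and the 50 threshold stand in for the lab's actual split expression
    SELECT
        SUM(CASE WHEN Price >  50 THEN 1 ELSE 0 END) AS ExpensiveRows,   -- expected 51
        SUM(CASE WHEN Price <= 50 THEN 1 ELSE 0 END) AS NormalRows       -- expected 48
    FROM dbo.Product;

    -- The expensive destination table should hold the 51 split-off rows
    SELECT COUNT(*) AS ExpensiveLoaded FROM dbo.products_expensive;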
d. ETL with SSIS: A Complete Example
1. Create tables for dimensions and fact
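The tables are created in SSMS with CREATE TABLE statements. As an illustrative sketch only (the lab handout defines the actual schemas and data types), the customer dimension might look like the following, with columns inferred from the transformation steps in part 3 below:

    -- Sketch of one dimension table; column names and types are assumptions
    CREATE TABLE dbo.dim_customer (
        CustomerKey   INT           NOT NULL PRIMARY KEY,
        CustomerName  NVARCHAR(100) NOT NULL,   -- first and last name combined
        HomeOwnerFlag NVARCHAR(1)   NULL,       -- stored as a Unicode string
        Married       NVARCHAR(1)   NULL        -- stored as a Unicode string
    );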
2. Load time dimension
2.1 Add two connection managers, one for the OrderProcessingSystem database and one for my own database.
2.2 Add a Sequence Container named Load Dimension Sequence Container, and add a Data Flow task named Load Time Dimension inside it.
2.3 Open the data flow task, add a Source Assistant, and connect it to the Excel worksheet “MaxMinSalesTimeTableRevised.xlsx”. Add a Destination Assistant and connect it to SQL Server with my own database. Then connect the source to the destination.
2.4 Run the load time dimension task and get the results:
There are 1095 rows loaded successfully.
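The 1,095 rows correspond to three 365-day years. A count against the destination table confirms the load; dim_time below is an assumed name for the table chosen in the Destination Assistant:

    -- dim_time is an assumed name for the time dimension destination table
    SELECT COUNT(*) AS TimeRows   -- expected: 1095 (3 years x 365 days)
    FROM dbo.dim_time;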
3. Load Customer Dimension
3.1 Add a new data flow task named Load Customer Dimension to the sequence container and connect it to the previous task.
3.2 Open the new data flow task and add a Source Assistant with my own database, choosing the table dim_customer.
3.3 Add a Derived Column transformation to combine the customers’ first and last names.
3.4 Add a Destination Assistant with my own database and choose the table dim_customer.
3.5 Add a Data Conversion transformation to convert the HomeOwnerFlag and Married columns into Unicode strings, and connect it to the destination. Correctly map the input and destination columns.
3.6 Run the Load Customer Dimension task and get the results:
There are 10,000 rows loaded successfully.
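Steps 3.3 and 3.5 together perform roughly what the following T-SQL sketch shows: concatenate the name parts and cast the two flag columns to Unicode strings so they match the destination. The source table and the FirstName/LastName column names are assumptions for illustration:

    -- Rough T-SQL equivalent of the Derived Column and Data Conversion steps (names are assumed)
    SELECT
        FirstName + N' ' + LastName        AS CustomerName,    -- Derived Column
        CAST(HomeOwnerFlag AS NVARCHAR(1)) AS HomeOwnerFlag,   -- Data Conversion to Unicode (DT_WSTR)
        CAST(Married       AS NVARCHAR(1)) AS Married
    FROM dbo.Customer;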
4. Load Product Dimension, Promotion Dimension and Store Dimension
4.1 Add a new data flow task named Load Product Dimension and connect it to the previous task.
4.2 Open the new data flow task. Add a Source Assistant, connect it to OrderProcessingSystem, and choose the Product table.
4.3 Add a Destination Assistant, connect it to my own database, and choose the dim_product table. Connect it to the previous source and map the input and output columns correctly.
4.4 Run the task and get results:
There are 56 rows loaded successfully.
4.5 Use similar methods to load the Promotion and Store dimensions. The results are shown as follows:
There are 10 rows loaded successfully for the promotion dimension.
There are 7 rows loaded successfully for the store dimension.
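The three loads can be double-checked with one query; dim_product is the table from step 4.3, while dim_promotion and dim_store are assumed names for the other two destination tables:

    -- dim_promotion and dim_store are assumed table names
    SELECT 'dim_product'   AS DimTable, COUNT(*) AS LoadedRows FROM dbo.dim_product   -- expected 56
    UNION ALL
    SELECT 'dim_promotion', COUNT(*) FROM dbo.dim_promotion                           -- expected 10
    UNION ALL
    SELECT 'dim_store',     COUNT(*) FROM dbo.dim_store;                              -- expected 7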
5. Load Sales_person Dimension
5.1 Add a new data flow task named Load Sales person Dim. Connect it to the previous task.
5.2 Open