Sri Krishnamurthy, CFA, CAP
Hult.quant.fenway@gmail.com
Hult.quant.backbay@gmail.com
QuantUniversity LLC.
Adjunct Lecturer
Hult International Business School
Session 2 - Part 1
Copyright 2014 QuantUniversity LLC. Cannot be reproduced or used without written permission from
QuantUniversity LLC.
Agenda
Regression Analysis: Estimating Statistical Relationships
In-class Exercise
Regression
Topics
Regression Terminology
Scatterplots
Correlation
Simple Linear Regression
Multiple Regression
Modeling relationships
Where can you use regression?
Can you predict starting salaries for graduating students?
Salary = function of {Previous years of education, Experience,
GPA, Interview Performance, etc.}
Can you predict the speed of a car?
Speed = function of {Engine parameters, Car age, Tire age,
Car Model}
Does spending more on advertising increase sales?
Can you predict energy prices or temperatures for the
next year?
What is regression?
Regression analysis is the study of relationships between variables.
Prediction
Unknown variable = function of finite known variables
Type of Data:
Cross sectional data
For example: at a point in time, sales = function of {number of
promotional TV ads}
Time series
For example: temperature tomorrow = function of {temperature 1
year ago, temperature 2 years ago, etc.}
Regression terms
Dependent variable / Target variable / Response variable (Y)
The unknown variable we are trying to explain/predict
Independent variable / Explanatory variable / Predictor variable (X)
The variables used to predict the response variable
Type of Relationship:
Linear or nonlinear
Types
Simple Linear Regression
1 dependent variable, 1 independent variable
Ex: Sales = function of {ad spend}
Multiple Linear Regression
1 dependent variable, many independent variables
Ex: Sales = function of {ad spend, number of promotional events,
number of sales offices}
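As a minimal sketch of the simple linear regression case, the snippet below fits Sales = intercept + slope * ad spend by ordinary least squares. The data values are invented for illustration (they do not come from any course data file), and the function name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: ad spend and sales, both in $ thousands
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 62, 85, 103]

intercept, slope = fit_line(ad_spend, sales)
predicted = intercept + slope * 35  # predicted sales at an ad spend of 35
print(f"Sales = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"Predicted sales at ad spend 35: {predicted:.2f}")
```

Multiple linear regression follows the same least-squares idea with several X variables, but it is normally solved with a linear algebra or statistics package rather than by hand.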
Scatter Plots
Why Scatterplots?
A scatterplot is a graphical plot of two numerical variables,
an X and a Y.
If there is any relationship between the two variables, it is usually apparent from the scatterplot.
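One possible way to draw such a plot outside a spreadsheet is sketched below, assuming the matplotlib library is installed. The helper name make_scatterplot, the output file name, and the data values are all hypothetical.

```python
def make_scatterplot(x, y, x_label, y_label, out_path="scatter.png"):
    """Draw y against x as a scatterplot and save it to out_path."""
    # Imported inside the function so the data setup below still runs
    # on a machine without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(f"{y_label} vs. {x_label}")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path

# Hypothetical paired observations (not taken from any course data file)
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
```

Calling make_scatterplot(x_values, y_values, "X", "Y") would write the figure to scatter.png in the working directory.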
Drugstore Sales.xlsx
Use a scatterplot to examine the relationship between
promotional expenditures and sales at Pharmex.
Observations:
The scatterplot indicates that there is a positive relationship between Promote and Sales.
The relationship is not perfect. While the variable Promote is helpful in predicting Sales, it does not lead to perfect predictions.
Correlation and Causation
Correlation between the variables does not imply
causation.
A scatterplot only shows whether there is a relationship between the two plotted variables.
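The strength of a linear relationship between two plotted variables can be summarized with the Pearson correlation coefficient. The sketch below computes it from scratch; the Promote and Sales values are invented for illustration, and the function name pearson_r is hypothetical. A high value still says nothing about causation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, not the contents of the course data file
promote = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(promote, sales)
print(f"Correlation between Promote and Sales: {r:.3f}")
```

A value near +1 or -1 indicates points close to a straight line; a value near 0 indicates little or no linear relationship.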
To analyze multiple variables
We can use scatterplots to examine the relationships among
the dataset variables.
Examine scatterplots between each explanatory variable and the dependent variable.
With multiple explanatory variables, check for relationships among them.
Overhead Costs.xlsx
The data file contains observations of overhead costs, machine
hours, and production runs at Bendrix, an automobile parts manufacturing company.
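Alongside the pairwise scatterplots, the relationships among several variables can be checked numerically, for example with pairwise correlations. The sketch below does this for three columns; the column names and all values are hypothetical stand-ins, not the contents of the data file.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical column names and values for illustration only
data = {
    "MachineHours": [1500, 1400, 1600, 1300, 1700, 1450],
    "ProductionRuns": [35, 40, 30, 45, 38, 33],
    "Overhead": [99000, 97000, 100500, 96500, 105000, 96000],
}

# One correlation per unordered pair of columns
pairwise = {
    (a, b): pearson_r(data[a], data[b])
    for a, b in combinations(data, 2)
}
for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

The pairs that involve the dependent variable show how useful each explanatory variable might be; the pair of explanatory variables shows whether they are related to each other.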
To check for Linear and Non-Linear relationships
Scatterplots are useful for detecting relationships that may not
be obvious otherwise.
Some relationships may not be linear – when points do not cluster around a straight line.
The scatterplot below indicates a nonlinear relationship between the life expectancy of newborns in 1990 and GNP per capita.
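One common way to work with a curved pattern like this, though not one taken from these slides, is to transform a variable, for example by taking the logarithm of GNP per capita. The sketch below uses invented numbers that rise by a fixed amount each time GNP doubles; they are not real 1990 data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented values: life expectancy rises 5 years per doubling of GNP
gnp_per_capita = [500, 1000, 2000, 4000, 8000, 16000]
life_expectancy = [40, 45, 50, 55, 60, 65]

r_raw = pearson_r(gnp_per_capita, life_expectancy)
log_gnp = [math.log(g) for g in gnp_per_capita]
r_log = pearson_r(log_gnp, life_expectancy)

print(f"r using GNP:      {r_raw:.3f}")
print(f"r using log(GNP): {r_log:.3f}")
```

With these invented numbers the points fall on a curve against GNP but on a straight line against log(GNP), so the second correlation is higher.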
To check for outliers
Scatterplots are especially useful for identifying outliers –
observations that fall outside of the general pattern of the rest of the observations.
The example below shows an outlier on a scatterplot of the relationship between CEO salary and years of experience.
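A rough numerical counterpart to spotting outliers by eye is to fit a straight line and flag points whose residual is unusually large. The sketch below assumes a rule-of-thumb cutoff of 2 residual standard errors (an assumption, not a rule from the slides); the experience and salary figures are invented.

```python
import math

# Invented data: years of experience and salary in $ thousands
experience = [2, 4, 6, 8, 10, 12, 14, 16]
salary = [300, 340, 390, 430, 480, 1500, 560, 610]

n = len(experience)
mean_x = sum(experience) / n
mean_y = sum(salary) / n

# Least-squares line through all points, including any outlier
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experience, salary))
sxx = sum((x - mean_x) ** 2 for x in experience)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = actual value minus fitted value
residuals = [y - (intercept + slope * x) for x, y in zip(experience, salary)]
# Residual standard error with n - 2 degrees of freedom
std_err = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

CUTOFF = 2.0  # assumed rule of thumb; adjust to taste
outliers = [
    (x, y)
    for x, y, e in zip(experience, salary, residuals)
    if abs(e) > CUTOFF * std_err
]
print("Flagged points:", outliers)
```

A flagged point is only a candidate: whether it is kept or dropped depends on whether it is bad data or irrelevant to the analysis.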
How to address outliers?
Bad data / not relevant to the analysis: omit
If not clear, run