Simple Linear Regression Models
Hongwei Zhang
http://www.cs.wayne.edu/~hzhang

"Statistics is the art of lying by means of figures."
--- Dr. Wilhelm Stekel
Acknowledgement: this lecture is partially based on the slides of Dr. Raj Jain.
Simple linear regression models
Response Variable: The variable to be estimated or predicted
Predictor Variables: The variables used to predict the response; also called predictors or factors
Regression Model: Predict a response for a given set of predictor variables
Linear Regression Models: Response is a linear function of the predictors
Simple Linear Regression Models: Only one predictor
Outline
Definition of a Good Model
Estimation of Model parameters
Allocation of Variation
Standard deviation of Errors
Confidence Intervals for Regression Parameters
Confidence Intervals for Predictions
Visual Tests for verifying Regression Assumptions
Outline
Definition of a Good Model
Estimation of Model parameters
Allocation of Variation
Standard deviation of Errors
Confidence Intervals for Regression Parameters
Confidence Intervals for Predictions
Visual Tests for verifying Regression Assumptions
Definition of a good model?
Good models (contd.)
Regression models attempt to minimize the vertical distance between each observation point and the model line (or curve). The length of this line segment is called the residual, modeling error, or simply error.
The negative and positive errors should cancel out => zero overall error. Many lines satisfy this criterion.
Choose the line that minimizes the sum of squares of the errors
Good models (contd.)
Formally, the model is

\hat{y} = b_0 + b_1 x

where \hat{y} is the predicted response when the predictor variable is x. The parameters b_0 and b_1 are fixed regression parameters to be determined from the data.
Given n observation pairs {(x_1, y_1), …, (x_n, y_n)}, the estimated response for the i-th observation is:

\hat{y}_i = b_0 + b_1 x_i

The error is:

e_i = y_i - \hat{y}_i
Good models (contd.)
The best linear model minimizes the sum of squared errors (SSE):

\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

subject to the constraint that the overall mean error is zero:

\frac{1}{n} \sum_{i=1}^{n} e_i = 0

This is equivalent to the unconstrained minimization of the variance of errors (Exercise 14.1).
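As a quick numeric sketch (the data points and candidate line below are made up for illustration, and `sse_and_mean_error` is a hypothetical helper name), SSE and the mean error for a given line follow directly from the definitions above:

```python
# Sum of squared errors (SSE) and mean error for a candidate line
# y_hat = b0 + b1*x over observation pairs (x_i, y_i).

def sse_and_mean_error(xs, ys, b0, b1):
    errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in errors)
    mean_error = sum(errors) / len(errors)
    return sse, mean_error

# Illustrative data: the line y = 1 + 2x passes through all three
# points exactly, so both SSE and the mean error are 0.0.
xs = [1, 2, 3]
ys = [3.0, 5.0, 7.0]
sse, mean_error = sse_and_mean_error(xs, ys, 1.0, 2.0)  # (0.0, 0.0)
```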
Outline
Definition of a Good Model
Estimation of Model parameters
Allocation of Variation
Standard deviation of Errors
Confidence Intervals for Regression Parameters
Confidence Intervals for Predictions
Visual Tests for verifying Regression Assumptions
Estimation of model parameters
The regression parameters that give the minimum error variance are:

b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}

where \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i and \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.
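These closed-form estimates can be sketched in plain Python; the data values are invented for illustration and `fit_simple_regression` is a hypothetical helper name:

```python
# Least-squares estimates for a simple linear regression y = b0 + b1*x:
#   b1 = (sum(x*y) - n*xbar*ybar) / (sum(x^2) - n*xbar^2)
#   b0 = ybar - b1*xbar

def fit_simple_regression(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b1 = (sxy - n * xbar * ybar) / (sxx - n * xbar * xbar)
    b0 = ybar - b1 * xbar
    return b0, b1

# Example data (illustrative only): y is roughly 2 + 3x with noise.
xs = [1, 2, 3, 4, 5]
ys = [5.1, 8.0, 10.9, 14.2, 16.8]
b0, b1 = fit_simple_regression(xs, ys)  # b0 ≈ 2.12, b1 ≈ 2.96
```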
Example 14.1
Example (contd.)
Example (contd.)
Derivation of regression parameters?
Setting the partial derivatives of SSE with respect to b_0 and b_1 to zero yields the normal equations:

\frac{\partial \mathrm{SSE}}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0

\frac{\partial \mathrm{SSE}}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0

Derivation (contd.)
The first equation gives b_0 = \bar{y} - b_1 \bar{x}; substituting this into the second and solving for b_1 gives

b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}
Least Squares Regression vs. Least Absolute Deviations Regression?

Least Squares Regression:
- Not very robust to outliers
- Simple analytical solution
- Stable solution
- Always one unique solution

Least Absolute Deviations Regression:
- Robust to outliers
- No analytical solving method (have to use an iterative, computation-intensive method)
- Unstable solution
- Possibly multiple solutions
The instability of the method of least absolute deviations means that, for a small horizontal adjustment of a data point, the regression line may jump a large amount. In contrast, the least squares solution is stable: for any small horizontal adjustment of a data point, the regression line moves only slightly, i.e., it varies continuously with the data.
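The outlier sensitivity of least squares can be sketched numerically (data invented for illustration): moving a single point far off the line changes the fitted slope substantially:

```python
# Effect of a single outlier on the least-squares slope
# b1 = (sum(x*y) - n*xbar*ybar) / (sum(x^2) - n*xbar^2).

def slope(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    den = sum(x * x for x in xs) - n * xbar * xbar
    return num / den

xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]    # exactly y = 2x, so slope is 2.0
outlier = [2, 4, 6, 8, 30]  # last point pulled far upward: slope becomes 6.0
```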
Outline
Definition of a Good Model
Estimation of Model parameters
Allocation of Variation
Standard deviation of Errors
Confidence Intervals for Regression Parameters
Confidence Intervals for Predictions
Visual Tests for verifying Regression Assumptions
Allocation of variation
Allocation of variation (contd.)
Without regression, the best predictor of y is simply its mean \bar{y}, and the sum of squared errors would be:

\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2

This is called the total sum of squares (SST). It is a measure of y's variability and is called the variation of y. SST can be computed as follows:

\mathrm{SST} = \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 = \mathrm{SSY} - \mathrm{SS0}

where SSY is the sum of squares of y (\sum y_i^2), and SS0 is the sum of squares of \bar{y}, equal to n \bar{y}^2.
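As a sketch with made-up numbers (the helper names are hypothetical), the identity SST = SSY - SS0 can be checked against the direct definition:

```python
# SST computed directly and via the identity SST = SSY - SS0,
# where SSY = sum(y^2) and SS0 = n * ybar^2.

def sst_direct(ys):
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) ** 2 for y in ys)

def sst_identity(ys):
    n = len(ys)
    ybar = sum(ys) / n
    ssy = sum(y * y for y in ys)   # sum of squares of y
    ss0 = n * ybar * ybar          # n * ybar^2
    return ssy - ss0

ys = [3.0, 5.0, 7.0, 9.0]
# Both routes give the same variation of y (here 20.0).
```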
Allocation of