The simple regression model can be used to study the relationship between two variables.
- If two variables are related, they are correlated
- Economic theory is often informative about the sign (or range) of a relationship, but uninformative about its magnitude
- Regression analysis allows the quantification of relationships posited by economic theory
The simple regression model is y = β0 + β1x + u, where:
y = the dependent variable, the explained variable, the response variable, the predicted variable or the regressand
x = the independent variable, the explanatory variable, the control variable, the predictor variable or the regressor
β0, β1 = the underlying population (true) parameters
u = the disturbance (error) term – it represents factors affecting y other than x that are unobserved
If the factors in u are held fixed (Δu = 0), then x has a linear effect on y: Δy = β1Δx. Here β1 is the slope parameter of the regression model and β0 is the intercept parameter (the constant term).
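A minimal simulation sketch of this setup (the parameter values β0 = 1.0 and β1 = 0.5 and the distributions are purely illustrative, not taken from these notes):

```python
import numpy as np

# Simple regression model y = b0 + b1*x + u with illustrative parameter values.
rng = np.random.default_rng(0)
b0, b1 = 1.0, 0.5            # population intercept and slope (assumed for illustration)
x = rng.normal(size=1000)    # explanatory variable
u = rng.normal(size=1000)    # disturbance: unobserved factors other than x
y = b0 + b1 * x + u          # dependent variable generated by the model

# Holding u fixed (delta u = 0), a one-unit increase in x changes y by exactly b1:
print((b0 + b1 * 3.0) - (b0 + b1 * 2.0))   # prints 0.5, i.e. the slope b1
```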
The Disturbance Term
The disturbance term plays an important role in econometrics
- It accounts for all the factors that affect y that we have not explicitly accounted for
- Although we may wish to obtain an estimate of the ceteris paribus effect of x on y (β1), how can we do so if we simply ignore those other factors by relegating them to u?
- We can only legitimately estimate the intercept β0 and the slope β1 if we make some assumptions about the relationship between what we are interested in (the parameters) and what we are not interested in (the disturbance)
We are only able to get a reliable estimator of β0 and β1 from a random sample of data when we make an assumption restricting how the unobservable u is related to the explanatory variable x. Without such a restriction, we will not be able to estimate the ceteris paribus effect β1. Because u and x are random variables, we need a concept grounded in probability.
One assumption about u
- As long as the intercept β0 is included in the equation, nothing is lost by assuming that the average value of u in the population is zero: E(u) = 0
- The assumption says nothing about the relationship between x and u; it simply makes a statement about the distribution of the unobservables in the population
- The assumption is not very restrictive (see the normalization argument below)
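A standard way to see why this costs nothing: suppose instead that E(u) = a0 for some nonzero constant a0 (a label introduced here only for the argument). The model can then be rewritten as
y = (β0 + a0) + β1x + (u − a0),
where the recentred disturbance u − a0 has mean zero. Only the intercept changes; the slope β1 is unaffected, so assuming E(u) = 0 simply fixes which intercept we estimate.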
A natural measure of the association between two random variables is the correlation coefficient.
- If u and x are uncorrelated, then, as random variables, they are not linearly related
- Correlation only measures linear relationships between u and x
- Correlation has a counterintuitive feature: it is possible for u to be uncorrelated with x while being correlated with functions of x (such as x²) – see the numerical sketch after this list
- This possibility is not acceptable for most regression purposes, as it causes problems for interpreting the model and for deriving statistical properties
- A better assumption involves the expected value of u given x
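A minimal numerical sketch of the counterintuitive feature mentioned above (the data-generating process is hypothetical and chosen only to make the point): with x standard normal and u = x² − 1, u has mean zero and is uncorrelated with x, yet it is perfectly correlated with x².

```python
import numpy as np

# u is uncorrelated with x but perfectly correlated with x^2.
# With x symmetric around zero and u = x**2 - 1: E(u) = 0 and Cov(u, x) = E(x^3) = 0.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
u = x**2 - 1

print(np.mean(u))                  # ~0: E(u) = 0 holds
print(np.corrcoef(u, x)[0, 1])     # ~0: u is uncorrelated with x
print(np.corrcoef(u, x**2)[0, 1])  # 1.0: u is perfectly correlated with x^2
```

In this example E(u|x) = x² − 1, which clearly depends on x, so zero correlation alone is not enough to support the ceteris paribus interpretation.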
u and x are random variables, so we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected (or average) value of u for the slice of the population described by that value of x.
The crucial assumption is that the average value of u does not depend on the value of x.
- E(u|x) = E(u)
- The equation says that the average value of the unobservables is the same across all slices of the population determined by the value of x, and that this common average is necessarily equal to the average of u over the entire population
- When this assumption holds, we say that u is mean independent of x
- When we combine mean independence with the assumption E(u) = 0, we obtain the zero conditional mean assumption
- E(u|x) = 0
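One immediate implication (a standard result, included here to connect the assumption back to the model): taking the expectation of y = β0 + β1x + u conditional on x gives
E(y|x) = β0 + β1x + E(u|x) = β0 + β1x,
so under the zero conditional mean assumption the average value of y is a linear function of x, and β1 is the slope of that population regression function.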
Key assumptions
On average the disturbance term is zero
- E(u) = 0
- The combined effect of the unobservable factors is sometimes positive and sometimes negative, but on average it is zero
- If an intercept is included, E(u) can always be normalized to 0
The disturbances are unrelated to the explanatory variable