Linear regression with one predictor variable
Regression (Historically)
Regression means ‘going back’
Francis Galton (1822–1911) studied “Hereditary Genius” (1869) and other traits
Heights of fathers and sons
Sons of the tallest fathers tended to be taller than average, but shorter than their fathers
Sons of the shortest fathers tended to be shorter than average, but taller than their fathers
This kind of thing was observed for lots of traits.
Galton was deeply concerned about “regression to mediocrity.”
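The father and son pattern above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the mean (175 cm), standard deviation (7 cm), and correlation (0.5) are hypothetical values chosen for illustration, not Galton's data, and all names are made up for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) model, not Galton's data: father and son heights
# share the same mean and spread, and are correlated with coefficient RHO.
MEAN, SD, RHO, N = 175.0, 7.0, 0.5, 20000

pairs = []
for _ in range(N):
    z_father = random.gauss(0, 1)
    # Son's standardized height: partly tied to the father's, partly independent noise.
    z_son = RHO * z_father + (1 - RHO ** 2) ** 0.5 * random.gauss(0, 1)
    pairs.append((MEAN + SD * z_father, MEAN + SD * z_son))

pairs.sort(key=lambda p: p[0])  # order pairs by father's height
k = N // 10
shortest, tallest = pairs[:k], pairs[-k:]  # bottom and top 10% of fathers


def mean_heights(group):
    """Return (mean father height, mean son height) for a group of pairs."""
    return (statistics.mean(f for f, _ in group),
            statistics.mean(s for _, s in group))


tall_f, tall_s = mean_heights(tallest)
short_f, short_s = mean_heights(shortest)
all_sons = statistics.mean(s for _, s in pairs)

print(f"Tallest fathers:  fathers {tall_f:.1f}, sons {tall_s:.1f}")
print(f"Shortest fathers: fathers {short_f:.1f}, sons {short_s:.1f}")
print(f"All sons: {all_sons:.1f}")
```

Under these assumptions, the sons of the tallest fathers average above the overall mean but below their fathers, and the reverse holds for the sons of the shortest fathers. The size of the pull toward the mean depends on the assumed correlation.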
Types of Data
Typically, data come to us in one of four forms:
Categorical (Nominal)
Ordinal
Interval
Ratio
Categorical variables
Take on several levels, none of which have any natural ordering
Sex (M, F, …)
Race (Black, White, Asian, …)
Program major (Stat, CS, Math, Psych, Bio, …)
Type of fertilizer (A, B, …)
Drug (Active, Placebo)
When controlled by the experimenter, called a Factor
Important nomenclature for R
Ordinal variables
Take on several levels which have a natural order, but no consistent distance metric
Grade (A+, A, A-, B+, …)
Professor Rating (5, 4, 3, 2, 1)
Likert item
Level of education (PhD, Masters, Bachelors, HS, Primary, None)
Sports (Rugby, Football, Soccer, … Basketball)
Difficult to deal with, so we usually consider them as either Categorical, or
Interval variables
Numerical variable with a consistent distance metric, but no proper zero point
IQ
Temperature (in °C)
SAT score
Slope and difference are meaningful, but ratios are not
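A short sketch of why ratios are not meaningful on an interval scale, using temperature as the example. The three temperatures are arbitrary illustrative values and the helper name `c_to_f` is made up for this example. Changing the unit from °C to °F changes the ratio of two readings, but not the ratio of two differences.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Three arbitrary temperatures in °C, and the same temperatures in °F.
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# The ratio of raw readings depends on the unit, so it carries no meaning.
ratio_c = b_c / a_c  # 20 / 10
ratio_f = b_f / a_f  # 68 / 50

# The ratio of differences is the same in both units.
diff_ratio_c = (c_c - b_c) / (b_c - a_c)  # 20 / 10
diff_ratio_f = (c_f - b_f) / (b_f - a_f)  # 36 / 18

print(ratio_c, ratio_f)
print(diff_ratio_c, diff_ratio_f)
```

"20 °C is twice as hot as 10 °C" is therefore not a meaningful statement, while "the step from 20 °C to 40 °C is twice the step from 10 °C to 20 °C" is.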
Ratio variables
Interval variable with a proper