Jack Cai
PURPOSE
Create a simulator from scratch that:
• Generates data from a variety of distributions
• Makes a response variable from a known function of the data (plus an error term)
• Constructs a linear model that estimates the coefficients of the function
• Repeats generation and modeling many times to compare the average estimates of the linear model to the known parameters
• Packages the whole thing into a function that we can call in a single line in later work
• If you’re experienced, the commands themselves may seem trivial
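As a rough preview of where these steps lead, here is a minimal sketch of such a simulator. It assumes a much simpler model than the one built in the exercises (one predictor, y = b0 + b1*x + error); the function name simulate_lm and all of its arguments and defaults are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not from the slides): simulate y = b0 + b1 * x + error
# many times and average the lm() estimates across repetitions.
simulate_lm <- function(n_reps = 200, n = 50, b0 = 2, b1 = 3, err_sd = 1) {
  # One row per repetition, one column per estimated coefficient
  estimates <- matrix(NA_real_, nrow = n_reps, ncol = 2,
                      dimnames = list(NULL, c("b0", "b1")))
  for (i in seq_len(n_reps)) {
    x <- rnorm(n, mean = 10, sd = 4)           # generate the predictor
    y <- b0 + b1 * x + rnorm(n, sd = err_sd)   # known function plus error
    estimates[i, ] <- coef(lm(y ~ x))          # estimate the coefficients
  }
  colMeans(estimates)                          # average estimate per coefficient
}

set.seed(1)
avg_est <- simulate_lm()
avg_est   # should land close to the known values b0 = 2 and b1 = 3
```

Once wrapped this way, the whole study is a single call, and changing the sample size or error spread is just a matter of changing an argument.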
Outline
• 1) Learning how to learn
• 2) Randomly Generating Data
• 3) Data Frames and Manipulation
• 4) Linear Models
• BREAK – Quality of presenter improves
• 5) Running loops
• 6) Function Definition
• 7) More advanced function topics
• 8) Using functions
• 9) A short simulation study
Learning how to learn – Jack Cai
• Search the web for “CRAN packages” to find the package list
  • From there you can get a description of every command in a package.
• ?? searches the help files for commands related to a term
  • ??plot will find commands related to plot
• ? calls up the help file for a specific command
  • ?abline gives the help file for the abline() command.
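The help commands above can be tried directly at the console. The sketch below uses only base R; the choice of the stats package and the mean() help page is an assumption made for illustration, not something the slides prescribe.

```r
# Open the help page for one known function (same as typing ?abline)
help("abline")

# Search the installed help pages for a term (same as typing ??plot)
help.search("plot")

# List the functions that an attached package provides
stats_functions <- ls("package:stats")
head(stats_functions)

# Run the example code from the bottom of a help page
example("mean")
```

The function forms help() and help.search() are handy inside scripts, where the ? and ?? shortcuts are less readable.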
Learning how to learn – Jack Cai
Exercises:
• Name one function in the darts game package.
• What is the e-mail address of the author of the Texas Hold’em simulation package?
• (Bonus) Tell the author about your day via e-mail; s/he likes hearing from fans.
• Find a function to make a histogram.
• Find some example code for the heatmap() command.
Randomly generating data – Jack Cai
• The r commands (r followed by a distribution name) randomly generate data from that distribution
• rnorm(n, mean, sd) Generates from the normal distribution (default N(0,1))
• rexp(n, rate) Generates from the exponential distribution
• rbinom(n, size, prob) Generates from the binomial distribution
• rt(n, df) Generates from Student’s t. (It is centered at zero, so shifting it to another mean is up to you)
• set.seed() Fixes the random seed so that the same data is generated every time, letting you or others verify work.
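A short sketch of these generators in use follows. The sample sizes, parameter values, and variable names are arbitrary choices for illustration and deliberately differ from the ones in the exercises.

```r
set.seed(42)                                 # fix the seed for reproducibility

z        <- rnorm(5)                         # 5 draws from N(0, 1), the default
scores   <- rnorm(5, mean = 100, sd = 15)    # normal with a chosen mean and sd
waits    <- rexp(5, rate = 2)                # exponential with rate 2
hits     <- rbinom(5, size = 10, prob = 0.3) # successes out of 10 trials each
shifted  <- rt(5, df = 8) + 3                # t is centered at 0, so add 3 to shift it

# Resetting the same seed reproduces the same draws
set.seed(42)
z_again <- rnorm(5)
identical(z, z_again)                        # TRUE
```

Calling set.seed() once at the top of a script is usually enough; it is reset here only to show that the draws repeat.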
Randomly generating data – Jack Cai
• Set a random seed.
• Generate a vector of 50 values from the Normal (mean = 10, sd = 4) distribution; name the vector x1.
• Do the same with
  • Poisson (lambda = 5), named x2,
  • Exponential (rate = 1/7), named x3,
  • Student’s t distribution (df = 5), shifted to have a mean of 5, named x4,
  • Normal (mean = 0, sd = 20), named err
• Make a new variable y, let it be 3 + 20x1 + 15x2 – 12x3 – 10x4 + err
Data frames – Jack Cai
• data.frame() makes a data frame object from the vectors listed in the parentheses
• The advantage of having a data frame is that it can be treated as a single object.
• Data frames, models, and even matrix decompositions can be objects in R.
• You can call parts of objects by name using $
• model$coefficients (or the abbreviation model$coef) will bring up the estimated coefficients
• If no such aspect exists, you’ll get NULL in response.
• Example: Cai$height
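To make the $ behaviour concrete, here is a small sketch. The data frame people, its columns, and the model fit are made-up illustrations, not objects from the workshop.

```r
# Two made-up vectors combined into one data frame object
height <- c(150, 160, 170, 180)
weight <- c(52, 61, 70, 83)
people <- data.frame(height, weight)

people$height        # the height column, pulled out by name
people$shoe_size     # no such column exists, so the result is NULL

# A model is also an object whose parts can be reached with $
fit <- lm(weight ~ height, data = people)
fit$coefficients     # estimated intercept and slope
fit$coef             # $ allows unambiguous abbreviations on model objects
```

Note that a missing part gives NULL silently rather than an error, so a misspelled name can go unnoticed.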
Data frames – Jack Cai
Exercises:
• Make a data.frame() of x1, x2, x3, x4, and y. Name it dat.
• (If you’re stuck from the last part, run “Q3-dataframethis.txt” first.)
• Use index indicators like dat[4,3], dat[2:7,3], dat[4,], and dat[4,-1] to get
  • The 3rd row, 5th entry of dat
  • The 2nd – 7th values of the 5th column
  • The entire 3rd row
  • The 3rd row without the 1st entry
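For reference, this is what the four index patterns quoted above do, shown on a small made-up data frame. The name toy and its columns are assumptions for illustration; the exercise itself uses different rows and columns of dat.

```r
# A made-up 8-row, 4-column data frame
toy <- data.frame(a = 1:8, b = 11:18, c = 21:28, d = 31:38)

toy[4, 3]      # row 4, column 3: a single value (24)
toy[2:7, 3]    # rows 2 through 7 of column 3: a vector
toy[4, ]       # all of row 4: a one-row data frame
toy[4, -1]     # row 4 with the 1st column dropped
```

The pattern is always [rows, columns]: leaving a side blank means "all of them", and a negative number means "everything except this one".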
Linear models – Jack Cai
• The result of the lm() function is an object.
• Example: mod = lm(y ~ x1 + I(x2^2) + x1:x2, data=dat)
• Useful aspects
  • mod$fitted
  • mod$residuals
• Useful functions
  • summary(mod)
  • predict(mod, newdata)
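The example formula above can be run end to end on simulated data. In this sketch the data frame toy, the true coefficients, and the new_points values are all made up for illustration; only the formula and the accessor calls come from the slide.

```r
set.seed(7)
n  <- 40
x1 <- rnorm(n, mean = 5, sd = 2)
x2 <- rnorm(n, mean = 1, sd = 1)
y  <- 1 + 2 * x1 + 0.5 * x2^2 + rnorm(n)   # made-up true relationship
toy <- data.frame(x1, x2, y)

# Same formula as the slide: x1, x2 squared, and the x1-by-x2 interaction
mod <- lm(y ~ x1 + I(x2^2) + x1:x2, data = toy)

head(mod$fitted.values)   # fitted values (mod$fitted is an abbreviation)
head(mod$residuals)       # observed y minus fitted y
summary(mod)              # coefficient table, standard errors, R-squared

# Predict at new x1 and x2 values; column names must match the model's
new_points <- data.frame(x1 = c(4, 6), x2 = c(0, 2))
preds <- predict(mod, newdata = new_points)
preds
```

I() is needed around x2^2 because ^ has a special meaning inside a formula, and x1:x2 adds only the interaction term without repeating the main effects.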
Linear models – Jack Cai
• Use the lm command to create a linear model of y as an additive function of x1, x2, x3, and x4 using the dat data; name it mod. (No interactions or transformations.)
• Get the summary of mod.
• Display the estimated coefficients with no other