
STAT506 Sampling Project

Group Members: Wenli Hu, Joyce Jiang, Xi Tian, Ye Yu
April 19th, 2012



Is it possible to collect data from the entire population?
-If so, we can talk about what is true for the entire population
-Often we cannot (time/cost)
-If not, we can use a smaller subset: a SAMPLE

• Research Background
• Introduction
• Sampling Methods
  1. Simple random sampling
  2. Post-stratification
  3. Regression
  4. Stratified sampling
• Conclusion






 



The Pima Indians are American Indians who today live in the Gila River Indian Community in Arizona. They have a rate of type II diabetes much higher than the “normal” rate in the US, and they are said to be genetically susceptible to diabetes and obesity. The Pima Indians are taken as an example of how genetics can cause diabetes. Pima women seem to have a higher rate than men.

  

Done by the National Institute of Diabetes and Digestive and Kidney Diseases
Data received: 9 May 1990
Population: 768 female Pima Indians
-Tested positive instances: 268
-Tested negative instances: 500



Attributes used in our observations
-Plasma glucose concentration at 2 hours in an oral glucose tolerance test
-Age
-Class variable (0 or 1)

 

Simple random sampling
-The simplest design
-Establish a sample size and randomly select units until the sample size is reached

• Data set:
We have a list of 532 patients and randomly select 50 of them from this list (without replacement): N = 532, n = 50.
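The selection step described above can be sketched in Python. This is only an illustration, assuming the 532 patients are numbered 1 to 532; the function name `draw_srs` and the seed value are hypothetical and do not come from the project.

```python
import random

N = 532  # size of the patient list (from the slide)
n = 50   # sample size (from the slide)


def draw_srs(population_size, sample_size, seed=None):
    """Return sorted 1-based unit labels of a simple random sample
    drawn without replacement."""
    rng = random.Random(seed)
    # random.sample never returns the same unit twice
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# The seed is arbitrary; it only makes the draw reproducible.
sample_ids = draw_srs(N, n, seed=506)
print(len(sample_ids), "units selected; first five:", sample_ids[:5])
```

Fixing a seed makes it possible to reproduce the same 50 units later.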

...

• Data analysis



Advantages
-Simple and unbiased



Disadvantages
-Requires an accurate list of the whole population
-Expensive to conduct

 

Post-stratification
-Stratification is carried out after selection of the sample
-The sample is not balanced with respect to diabetes status (Yes/No)

...

Diabetes   Sample Size   Glucose Mean   Variance
Yes        177           142.69         824.40
No         355           114.08         632.69

= 26.43



Advantages
-Makes weighted estimates to ensure proportional representation

Disadvantages
-Requires more information about the population being sampled
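As a rough illustration of the weighted estimate, the two group means in the table can be combined with weights proportional to the group counts. This sketch assumes the counts 177 and 355 (which sum to N = 532) serve as the stratum weights; the project's own calculation is not shown in the slides, so the function below is hypothetical.

```python
# Group counts and glucose means copied from the table.
# Assumption: the counts act as stratum weights W_h = N_h / N.
strata = {
    "yes": {"size": 177, "mean": 142.69},
    "no": {"size": 355, "mean": 114.08},
}


def poststratified_mean(groups):
    """Weighted mean of the group means, with weights N_h / N."""
    total = sum(g["size"] for g in groups.values())
    return sum(g["size"] / total * g["mean"] for g in groups.values())


estimate = poststratified_mean(strata)
print(f"post-stratified glucose mean: {estimate:.2f}")
```

Under this assumption the weighted mean is about 123.60.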



Regression estimator: age as auxiliary variable

[Figure: scatter plot of z$glu (glucose, axis 80 to 200) against z$age (age, axis 20 to 60)]

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  87.3574    13.2850   6.576 3.29e-08 ***
z$age         1.0855     0.3939   2.756  0.00826 **

 

Y: glucose, X: age
Mean age: $\mu_x = 31.61466 \approx 31.6$

$\hat{\mu}_L = a + b\,\mu_x = 87.3574 + 1.0855 \times 31.61466 = 121.67$

$\mathrm{Var}(\hat{\mu}_L) = \frac{(N - n)\,\mathrm{MSE}}{N\,n} = \frac{(532 - 50) \times 1076.4}{532 \times 50} = 19.50$

 

PROC SURVEYREG (SAS)
-Performs regression analysis for sample survey data
-Handles survey sample designs, including designs with stratification, clustering, and unequal weighting
-With ESTIMATE statements, you can specify a regression estimator



proc surveyreg data=Municipalities total=50;
   cluster Cluster;
   model Population85=Population75;
   estimate '1985 population' Intercept 284 Population75 8200;
run;

Cited from: http://www.math.montana.edu/~jobo/thai/4ratreg.pdf

Stratified Sampling

Proportional allocation: $n_h = n\,\frac{N_h}{N}$

$n_1 = 17$, $n_2 = 33$
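The allocation above can be reproduced with a short Python sketch. It assumes the stratum sizes are the group counts 177 (diabetes: yes) and 355 (diabetes: no) reported earlier in the slides, with n = 50; the function name is hypothetical.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate n sample units across strata in proportion to stratum size,
    rounding each n * N_h / N to the nearest integer."""
    N = sum(stratum_sizes)
    return [round(n * N_h / N) for N_h in stratum_sizes]


# Assumed stratum sizes: 177 (yes) and 355 (no), total 532.
allocation = proportional_allocation(50, [177, 355])
print(allocation)
```

Here 50 × 177 / 532 ≈ 16.6 and 50 × 355 / 532 ≈ 33.4, which round to 17 and 33. In general, rounded allocations need not sum exactly to n and may need a manual adjustment.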
