Using the dataset provided, we can create Table 1.1, which reports the mean and standard deviation of each variable in the treatment and control groups, the difference between the group means, and the p-value of that difference. We can infer from Table 1.1 that random assignment between the treatment group (assigned to receive a call) and the control group (never receives a call) was successful: the p-values of the differences in means are high, so we cannot reject the hypothesis that the group means are equal, and the differences between the treated and untreated means are close to zero in percentage terms. At first glance, "age" may seem to have a large numerical difference, but relative to its mean the difference is only a small percentage, and, like the other variables, it is not statistically significant.
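A balance table like Table 1.1 can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the function name `balance_row` and the simulated `age` values are hypothetical and are not the original dataset, and the p-value uses a normal approximation to Welch's t-test, which is reasonable for large samples (`scipy.stats.ttest_ind(..., equal_var=False)` is the exact alternative). The percentage difference relative to the control mean is reported separately from the p-value, because statistical significance is judged by the p-value, not by the size of the difference alone.

```python
import random
import statistics
from math import sqrt


def balance_row(treated, control):
    """Summarize one covariate for a balance table (hypothetical helper).

    Returns group means and SDs, the difference in means, that difference
    as a percentage of the control mean, and a two-sided p-value from a
    normal approximation to Welch's t-test (assumes large samples).
    """
    m_t, m_c = statistics.fmean(treated), statistics.fmean(control)
    sd_t, sd_c = statistics.stdev(treated), statistics.stdev(control)
    diff = m_t - m_c
    # Welch standard error: no equal-variance assumption between groups.
    se = sqrt(sd_t**2 / len(treated) + sd_c**2 / len(control))
    z = diff / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return {
        "mean_t": m_t, "sd_t": sd_t,
        "mean_c": m_c, "sd_c": sd_c,
        "diff": diff,
        "pct_diff": 100 * diff / m_c,
        "p": p,
    }


if __name__ == "__main__":
    # Synthetic data for illustration only: ages drawn independently of
    # a coin-flip assignment, mimicking a successful randomization.
    rng = random.Random(0)
    people = [(rng.random() < 0.5, rng.gauss(50, 15)) for _ in range(4000)]
    treated_age = [age for t, age in people if t]
    control_age = [age for t, age in people if not t]
    row = balance_row(treated_age, control_age)
    print({k: round(v, 3) for k, v in row.items()})
```

Each covariate in the dataset would get one such row; the table then reads across the rows to check that no difference is large relative to its mean or statistically significant.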
Now that we know that the data is properly randomized, we can then run multiple regressions adding one …
This may be due to the non-random assignment of contact in this comparison (selection bias) and to systematic differences between the people who picked up the phone and listened and those who did not. From this, we can also infer that there may be omitted variable bias: variables such as the covariates influence both "contact" and "vote02," so leaving them out of the regression biases the coefficient on "contact."
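The omitted variable bias argument can be illustrated with a small simulation. This sketch is hypothetical and uses simulated data, not the original dataset: a binary confounder `habitual` (an assumed stand-in for the covariates) raises both the chance of being contacted and the chance of voting, while contact itself has no effect on voting by construction. Comparing voters by contact alone then shows a spurious positive "effect," which disappears once we condition on the confounder. Here the adjustment is done by stratifying on the confounder and averaging the within-stratum differences, a simple stand-in for adding it as a regression covariate.

```python
import random
import statistics


def simulate(n=200_000, seed=1):
    """Simulate (habitual, contact, vote) rows; true effect of contact is zero."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        habitual = rng.random() < 0.5                      # hypothetical confounder
        contact = rng.random() < (0.6 if habitual else 0.2)  # confounder -> contact
        vote = rng.random() < (0.7 if habitual else 0.3)     # confounder -> vote, not contact
        rows.append((habitual, contact, vote))
    return rows


def diff_in_means(rows):
    """Vote rate among contacted minus vote rate among not contacted."""
    contacted = [v for _, c, v in rows if c]
    not_contacted = [v for _, c, v in rows if not c]
    return statistics.fmean(contacted) - statistics.fmean(not_contacted)


def stratified_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    estimate = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        estimate += diff_in_means(stratum) * len(stratum) / len(rows)
    return estimate


if __name__ == "__main__":
    data = simulate()
    print("naive (confounder omitted):", round(diff_in_means(data), 3))
    print("adjusted for confounder:  ", round(stratified_estimate(data), 3))
```

With these assumed probabilities the naive difference should be roughly 0.17, while the adjusted estimate should be close to zero; any confounder that the analyst cannot observe would leave a bias of this kind even after the available covariates are added.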
It is also important to note that even after adding all the covariates, the coefficient on treatment is nowhere close to the one we originally obtained from the RCT. However, this comparison overlooks the fact that not everyone in the treatment group picked up the phone and listened to the whole message. To account for noncompliance, we can divide the effect of treatment assignment on the outcome (the reduced form) by the effect of assignment on contact (the first stage) to get the LATE, a better comparison for non-experimental …
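The division described above is the Wald (instrumental variables) estimator, with random assignment as the instrument for contact. A minimal sketch follows; the function name `wald_late` and the argument names `assigned`, `contacted`, and `voted` are hypothetical (the dataset's own variables include "contact" and "vote02", but the name of the assignment variable is an assumption). Because the control group never receives a call, noncompliance is one-sided, and in that case the LATE is the effect of the call on those who were actually contacted.

```python
import statistics


def wald_late(assigned, contacted, voted):
    """Return (reduced_form, first_stage, late) for a binary assignment.

    reduced_form: difference in vote rates by assignment (the ITT effect)
    first_stage:  difference in contact rates by assignment
    late:         reduced_form / first_stage (the Wald estimator)
    """
    def diff_by_assignment(outcome):
        in_treatment = [y for z, y in zip(assigned, outcome) if z]
        in_control = [y for z, y in zip(assigned, outcome) if not z]
        return statistics.fmean(in_treatment) - statistics.fmean(in_control)

    reduced_form = diff_by_assignment(voted)
    first_stage = diff_by_assignment(contacted)
    if first_stage == 0:
        raise ValueError("first stage is zero; assignment does not shift contact")
    return reduced_form, first_stage, reduced_form / first_stage


if __name__ == "__main__":
    # Toy data: 4 assigned to a call (2 actually reached), 4 controls.
    assigned = [1, 1, 1, 1, 0, 0, 0, 0]
    contacted = [1, 1, 0, 0, 0, 0, 0, 0]
    voted = [1, 1, 1, 0, 1, 0, 0, 0]
    print(wald_late(assigned, contacted, voted))  # (0.5, 0.5, 1.0)
```

In the toy data, assignment raises turnout by 0.5 but only half of the treatment group is reached, so the effect per person actually contacted is 0.5 / 0.5 = 1.0. A full analysis would run two-stage least squares with the covariates to get standard errors, but the point estimate without covariates is this ratio.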