STAT REGRESSION ANALYSIS Essay

Submitted By pdshah01

1 (a). Since the amount garnished is accumulating at a rate that will pay off the loan in about 12 months, half of the total amount should have been repaid after six months. Hence the slope should be 1/2 and the intercept should be 0.

1 (b). A plot of the variables is shown below, along with the LS regression line:

Bivariate Fit of Amt.Repaid.at.6.Months By Total.Amt.to.be.Repaid

Amt.Repaid.at.6.Months = 2926.2502 + 0.4906691*Total.Amt.to.be.Repaid

Hence,
Slope = 0.49 ; Intercept = 2926.25.
The slope is fairly close to 1/2. An intercept of exactly 0 can almost never be expected in real data (unless it is simulated to perfection), so some deviation above or below 0 is expected. An intercept of 2926 is, however, well above 0. It could be explained by the large residual at point 36 (case number 450819117), which has a very high value for the amount repaid at six months and could be pulling the intercept upward.
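To illustrate how a single point can shift the intercept while leaving the slope essentially unchanged, here is a minimal sketch on made-up data. The loan amounts, the size of the error, and the helper `least_squares` are assumptions for illustration and are not the assignment's data; the corrupted loan is deliberately placed at the mean total so that it has no leverage.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical loans: totals from 10,000 to 90,000, each exactly half repaid.
totals = [10_000.0 * k for k in range(1, 10)]  # mean total is 50,000
repaid = [0.5 * t for t in totals]
clean_intercept, clean_slope = least_squares(totals, repaid)

# Corrupt the loan sitting at the mean total with a wildly inflated repayment.
corrupted = list(repaid)
corrupted[4] += 900_000.0
bad_intercept, bad_slope = least_squares(totals, corrupted)

print(f"clean:     intercept={clean_intercept:.2f}, slope={clean_slope:.4f}")
print(f"corrupted: intercept={bad_intercept:.2f}, slope={bad_slope:.4f}")
```

In this sketch the slope stays at 1/2 while the intercept rises by the size of the error divided by the number of loans, which is the kind of shift described above.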
1 (c). The residual plot is shown below:
Bivariate Fit of Residuals Amt.Repaid.at.6.Months By Total.Amt.to.be.Repaid

Linear Fit
Residuals Amt.Repaid.at.6.Months = -3.52e-13 + 5.816e-18*Total.Amt.to.be.Repaid
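The near-zero intercept and slope in the residual fit above are what least squares produces by construction: residuals from an LS line have mean zero and are uncorrelated with the predictor, so refitting them against the predictor returns values that differ from 0 only by rounding error. A minimal sketch on simulated data (the data and the helper `least_squares` are assumptions, not the assignment's data):

```python
import random

def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Simulated loans: about half repaid, plus noise.
totals = [rng.uniform(10_000, 90_000) for _ in range(50)]
repaid = [0.5 * t + rng.gauss(0, 2_000) for t in totals]

intercept, slope = least_squares(totals, repaid)
residuals = [y - (intercept + slope * x) for x, y in zip(totals, repaid)]

# Refit the residuals against the same predictor.
res_intercept, res_slope = least_squares(totals, residuals)
print(f"residual fit: intercept={res_intercept:.3e}, slope={res_slope:.3e}")
```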

The normal quantile plot of the residuals is shown below:
Distributions
Residuals Amt.Repaid.at.6.Months

For the data to conform to the assumptions of the SRM, the following must hold:
i) Residuals are independent.
ii) Residuals have mean zero and constant variance.
iii) Residuals are normally distributed.
From the normal quantile plot, the residuals appear to violate the third assumption, since they do not follow a normal distribution. Because of the scale of the residual plot, it is hard to assess the independence and constant-variance assumptions; the scale is distorted by the large residual at point #36.
It should be noted that point #36 (case number 450819117) appears to contain a data error: the amount repaid at six months ($3,533,140) is far higher than the total amount to be repaid ($67,952). It is possible that a decimal point was dropped and the actual amount is $35,331.40, which is approximately half of the total amount to be repaid.
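The decimal-point hypothesis can be checked with simple arithmetic. The sketch below uses only the two figures quoted above; the variable names are illustrative.

```python
total_due = 67_952.00
recorded_repaid = 3_533_140.00

# Hypothesis from the text: a dropped decimal point inflated the value 100-fold.
corrected_repaid = recorded_repaid / 100

recorded_ratio = recorded_repaid / total_due
corrected_ratio = corrected_repaid / total_due

print(f"recorded:  {recorded_repaid:,.2f} is {recorded_ratio:.2f} times the total")
print(f"corrected: {corrected_repaid:,.2f} is {corrected_ratio:.2f} times the total")
```

As recorded, the repayment is about 52 times the total due, which is implausible; after the correction it is about 0.52 of the total, in line with the expected 1/2.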
Assuming that this point is a data error, we exclude it to obtain a better-scaled view of the residuals. The point is not highly leveraged and hence not influential (the fitted line looks similar before and after excluding it), so we proceed with the plot excluding it, shown below:

Bivariate Fit of Residuals Amt.Repaid.at.6.Months By Total.Amt.to.be.Repaid

Linear Fit
Residuals Amt.Repaid.at.6.Months = -4342.507 - 0.0266809*Total.Amt.to.be.Repaid

We observe some heteroscedasticity, which invalidates assumption ii.
Finally, the normal quantile plot of the residuals (excluding the large-residual point) is shown below:
Distributions
Residuals Amt.Repaid.at.6.Months

Clearly, the residuals do not appear to be normally distributed, and they also seem to follow a sinusoidal pattern, which invalidates the independence assumption.

Hence, the data do not conform to the SRM. The residuals appear to grow as the total amount to be repaid grows. This should, however, be expected given the choice of variables plotted: as the total amount to be repaid increases, the amount repaid in six months increases with it, and so does the size of its deviations, producing a “fan” effect (heteroscedasticity).
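The fan effect can be reproduced in a small simulation. This is a sketch under an assumed error model (deviations proportional to the total, at 5% of it), not a description of how the actual loans behave; all data here are simulated.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated loans: repayment is about half the total, with an error
# whose size is proportional to the total (assumed to be 5% of it).
totals = [rng.uniform(10_000, 90_000) for _ in range(400)]
repaid = [0.5 * t + rng.gauss(0, 0.05 * t) for t in totals]

# Deviations from the ideal half-repaid line (used in place of a fitted line).
deviations = [r - 0.5 * t for t, r in zip(totals, repaid)]

# Compare the spread of deviations for smaller and larger loans.
small = [d for t, d in zip(totals, deviations) if t < 50_000]
large = [d for t, d in zip(totals, deviations) if t >= 50_000]
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"spread for totals below 50,000: {spread_small:,.0f}")
print(f"spread for totals of 50,000 up: {spread_large:,.0f}")
```

Under this assumption the spread for the larger loans exceeds the spread for the smaller ones, which is the widening pattern seen in the residual plot.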

2 (a). If all the loans are performing comparably on a constant pay-out schedule and accumulating at a rate that will pay them off in about 12 months, then the amount repaid at six months should be about half of the total amount due at 12 months. Thus PRSM should be 1 (2*1/2); plotted against the total amount, it should give a straight line with slope around 0 and intercept around 1.
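The expected slope and intercept can be sanity-checked by simulation. The sketch below assumes PRSM is defined as twice the amount repaid at six months divided by the total amount to be repaid (inferred from the "2*1/2" above), and it uses simulated loans with an assumed 5% relative error; the helper `least_squares` is also illustrative.

```python
import random

def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Simulated loans that are all about half repaid at six months.
totals = [rng.uniform(10_000, 90_000) for _ in range(200)]
repaid = [0.5 * t * (1 + rng.gauss(0, 0.05)) for t in totals]

# Assumed definition: PRSM = 2 * (amount repaid at six months) / (total amount).
prsm = [2 * r / t for t, r in zip(totals, repaid)]

prsm_intercept, prsm_slope = least_squares(totals, prsm)
print(f"PRSM fit: intercept={prsm_intercept:.3f}, slope={prsm_slope:.2e}")
```

Because dividing by the total removes the dependence on loan size, the fitted line for these simulated loans is close to flat at a height of 1.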

2 (b). The plot for PRSM vs Total Amt to be repaid is shown below with the LS regression line:
Bivariate Fit of PRSM By