Regression
Essay by kesavkulkarni • January 8, 2016 • Coursework
a. Develop histograms for each of the predictors. Do the distributions of each of the predictor variables have extreme skewness or outliers that could cause estimation problems?
[pic 1]
In the output above, NI_TA (skewness -0.7832) is negatively skewed and CA_CL (skewness 0.74) is positively skewed; the other two predictors look approximately normally distributed.
NI_TA has numerous outliers beyond both whiskers. CA_CL has a few outliers on the positive side.
Row 231 shows a negative value for CA_CL, which is hard to interpret; it is better to remove that row before building the model.
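The skewness and outlier checks described above can be sketched in a few lines. This is a minimal illustration, assuming the simple moment-based skewness formula and the usual 1.5 × IQR box-plot rule; statistical packages often apply a small-sample correction, so their skewness values may differ slightly. The function names and the data values are hypothetical and are not taken from the assignment data set.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical ratio values (assumption, for illustration only).
ca_cl = [0.8, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0, 2.2, 6.5]

print("skewness:", sample_skewness(ca_cl))
print("outliers:", iqr_outliers(ca_cl))
```

A positive skewness together with outliers only on the high side is the pattern reported for CA_CL; a negative skewness with outliers on both sides is the pattern reported for NI_TA.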
b. For each of the four predictors test the null hypothesis that the mean values are the same for the bankrupt versus financially sound firms. What do you conclude?
[pic 2]
CF_TD and NI_TA show a significant difference in means, so the null hypothesis is rejected for both predictors.
CA_CL has positive means in both groups and gives weaker evidence against the null hypothesis, although its difference in means is larger than that of CA_NS.
CA_NS shows essentially the same mean for both response groups, so the null hypothesis is not rejected.
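The comparison of group means can be illustrated with a two-sample test statistic. The write-up does not say which test produced the output, so the sketch below assumes Welch's unequal-variance t-test; the function name and the sample values are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical NI_TA-like values for the two groups (assumption).
bankrupt = [-0.20, -0.05, -0.12, 0.01, -0.30, -0.08]
sound = [0.10, 0.25, 0.05, 0.18, 0.30, 0.12]

t_stat, df = welch_t(bankrupt, sound)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A large absolute t value relative to the t distribution with the computed degrees of freedom corresponds to a small p-value, which is the basis for rejecting the null hypothesis of equal means.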
c. Examine the scatterplot matrix for the four predictor variables. Is there evidence of potential multicollinearity? Discuss.
[pic 3][pic 4]
For both response groups (1 and 0), CF_TD and NI_TA are strongly positively correlated, which suggests potential multicollinearity.
For prediction, it is better to drop one of the two.
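The strength of the linear relationship seen in a scatterplot can be quantified with the Pearson correlation coefficient. The sketch below is a plain-Python version; the function name and the paired values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical paired values for two predictors (assumption).
cf_td = [-0.45, -0.10, 0.05, 0.12, 0.30, 0.41]
ni_ta = [-0.20, -0.06, 0.01, 0.04, 0.09, 0.15]

r = pearson_r(cf_td, ni_ta)
print(f"r = {r:.3f}")
```

There is no universal cutoff, but a correlation close to 1 between two predictors is a common warning sign that their coefficients will be hard to estimate separately.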
d. Divide the sample into training (80%) and testing (20%) subsets. Using the training data develop a prediction model. Investigate main effects, possible non-linear effects, and possible interactions among the predictors. When you have decided on a final model:
In the model with NI_TA, CA_CL and CA_NS, both CA_CL and NI_TA show Prob > ChiSq below .0001, which makes them the best predictors for the model.
[pic 5]
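The 80/20 division into training and testing subsets can be sketched as a seeded random shuffle followed by a cut. This is one possible approach, not necessarily the one used for the results here; the function name, the seed, and the stand-in records are assumptions. The 400 records match the 320 training and 80 testing firms implied by the counts reported later in this write-up.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows reproducibly and cut it at the given fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-ins for firm records (assumption): 400 row identifiers.
rows = list(range(400))
train, test = train_test_split(rows)
print(len(train), len(test))  # prints: 320 80
```

Fixing the seed keeps the split reproducible, so the training and testing results can be regenerated later.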
- Write the estimated equation as (1) a logit function of the predictors; (2) odds as a function of the predictors; (3) the probability as a function of the predictors.
[pic 6]
Logit:
[pic 7]
Odds:
[pic 8]
Probability:
[pic 9]
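The fitted equations themselves appear only in the figures above. For reference, the general forms are sketched below, assuming the final model keeps the two significant predictors CA_CL and NI_TA. The symbols β0, β1 and β2 are placeholders for the estimated coefficients, which are not reproduced here, and p denotes the probability of whichever outcome level the software models.

```latex
\begin{align}
\operatorname{logit}(p) &= \ln\frac{p}{1-p}
  = \beta_0 + \beta_1\,\mathrm{CA\_CL} + \beta_2\,\mathrm{NI\_TA} \\
\text{odds} &= \frac{p}{1-p}
  = \exp\left(\beta_0 + \beta_1\,\mathrm{CA\_CL} + \beta_2\,\mathrm{NI\_TA}\right) \\
p &= \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1\,\mathrm{CA\_CL} + \beta_2\,\mathrm{NI\_TA}\right)\right)}
\end{align}
```

In this form each coefficient is the change in log-odds for a one-unit increase in its predictor, so exp(β) gives the corresponding multiplicative change in the odds.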
- Is the entire logistic regression statistically significant?
Yes. The logistic regression as a whole is statistically significant.
- What variables are statistically significant?
CA_CL and NI_TA are statistically significant.
- Interpret the coefficients on each predictor in your model. Which one(s) appear to have the strongest impact on probability of bankruptcy?
- How well does your model predict bankruptcy using the training data?
The misclassification rate on the training data is 0.20.
- How well does your model predict bankruptcy using the test data? Is there evidence of over fitting of the model? Discuss.
Using the best predictors, CA_CL and NI_TA:
% of 1's captured: 81% on training data (126/155), 76% on testing data (34/45)
% of 0's captured: 78% on training data (129/165), 91% on testing data (32/35)
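The capture rates and the misclassification rate follow directly from the counts reported above. The short sketch below recomputes them; the function and key names are illustrative, while the counts are the ones given in this answer.

```python
def rates(captured_ones, total_ones, captured_zeros, total_zeros):
    """Capture rates per class plus overall accuracy and misclassification."""
    accuracy = (captured_ones + captured_zeros) / (total_ones + total_zeros)
    return {
        "ones_captured": captured_ones / total_ones,
        "zeros_captured": captured_zeros / total_zeros,
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
    }

# Counts as reported for the training and testing data.
training = rates(126, 155, 129, 165)
testing = rates(34, 45, 32, 35)

for name, result in (("training", training), ("testing", testing)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these counts the overall accuracy is about 79.7% on the training data (255/320) and 82.5% on the testing data (66/80). The testing data performs no worse overall, which by itself does not point to overfitting, although 80 test firms is a small sample.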
- Develop and interpret the lift and ROC curves.
[pic 10]
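To show what the lift and ROC curves summarize, the sketch below builds ROC points, the area under the curve, and the lift for a top fraction of cases from predicted probabilities. The function names, labels and scores are hypothetical and are not the model output shown in the figure.

```python
def roc_points(labels, scores):
    """ROC points (false positive rate, true positive rate), threshold swept high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last case sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def lift_at(labels, scores, fraction):
    """Rate of 1s among the top-scored fraction divided by the overall rate of 1s."""
    ranked = [label for _, label in sorted(zip(scores, labels), reverse=True)]
    top = ranked[:max(1, round(len(ranked) * fraction))]
    return (sum(top) / len(top)) / (sum(labels) / len(labels))

# Hypothetical outcomes and predicted probabilities (assumption).
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.70, 0.65, 0.55, 0.40, 0.30, 0.10]

print("AUC:", auc(roc_points(labels, scores)))
print("lift in top 25%:", lift_at(labels, scores, 0.25))
```

An AUC near 0.5 means the model ranks cases no better than chance, while values closer to 1 indicate better separation. A lift above 1 in the top-scored group means that group contains proportionally more 1s than the sample as a whole.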
...