Malawi Household Survey
Essay by fenglinm4a1 • June 30, 2016 • Research Paper • 3,381 Words (14 Pages) • 1,122 Views
Report #2
ECON-3740 Introduction to Econometrics Fall, 2015
Instructor: D Prescott
Name: Mike
ID: 0832311
Data set number: 4
Date: 2015/11/25
Introduction:
Many factors can influence wages, including education level, sex, and place of residence. The Malawi Household Survey provides a sample drawn from the population; it records a number of these factors and is used here to estimate a multiple regression function for wage. In this report, several regression models are estimated to analyze the effect of the explanatory variables on the dependent variable, wage. Three hypotheses based on these models are then tested with t-statistics and F-statistics to examine the relationship between the predictors and wage further. Finally, a semi-log regression model is introduced to continue exploring that relationship.
Task 1
>summary(mod1)
Call:
lm(formula = mthpay ~ EL + EL2 + EL3 + F + G + U + PC + Age +
Age2)
Residuals:
    Min      1Q  Median      3Q     Max
-257200  -12370   -1841    8553  611586
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -16931.081  14714.532  -1.151  0.25008
EL            8787.232   2022.440   4.345  1.5e-05 ***
EL2          -2707.763    300.072  -9.024  < 2e-16 ***
EL3            166.480     11.758  14.159  < 2e-16 ***
F              137.596   3195.210   0.043  0.96566
G            -1321.646   3692.504  -0.358  0.72045
U             8475.399   2929.399   2.893  0.00387 **
PC            9202.831   3209.997   2.867  0.00421 **
Age           1106.462    742.669   1.490  0.13650
Age2            -7.770      9.118  -0.852  0.39430
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 49410 on 1364 degrees of freedom
Multiple R-squared: 0.4531, Adjusted R-squared: 0.4495
F-statistic: 125.6 on 9 and 1364 DF, p-value: < 2.2e-16
The model has 9 estimated slope coefficients and a constant: EL, EL2 = EL^2 and EL3 = EL^3 for education level; F for sex; G for employment by the government; PC for employment by a private company; U for place of residence; Age and Age2 = Age^2 for age; and the intercept as the constant. The dependent variable, mthpay, is the wage.
The fitted regression function can be written as: ŷ = a + b1·EL + b2·EL2 + b3·EL3 + b4·F + b5·G + b6·U + b7·PC + b8·Age + b9·Age2, where ŷ is the predicted wage. The coefficients b1 to b9 and the constant a (the intercept) are listed under Estimate in the output. Because education level enters through EL, EL2 and EL3 together, the coefficient on EL alone, 8787.232, is the slope of predicted wage only at EL = 0; at other levels, with all other variables held fixed, the effect of one more level of education is approximately b1 + 2·b2·EL + 3·b3·EL^2. The same reasoning applies to Age and Age2. For a variable that enters only once, such as U, the coefficient is the change in predicted wage for a one-unit change in that variable with the others held fixed. To analyze the relationship between education level and wage, all variables except those related to education level are set to their means and combined with the intercept into a single constant. The change in predicted wage is then driven entirely by EL, EL2 and EL3.
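The point about the education terms moving together can be checked with a few lines of R. This is only a sketch built from the printed estimates; the function name marginal_effect and the three education levels are chosen here for illustration and are not part of the original analysis.

```r
# Coefficient estimates copied from the summary(mod1) output above.
b1 <- 8787.232    # EL
b2 <- -2707.763   # EL2 = EL^2
b3 <- 166.480     # EL3 = EL^3

# Slope of predicted wage with respect to education level:
# d/dEL (b1*EL + b2*EL^2 + b3*EL^3) = b1 + 2*b2*EL + 3*b3*EL^2
marginal_effect <- function(el) b1 + 2 * b2 * el + 3 * b3 * el^2

el_values <- c(0, 5, 15)   # illustrative levels, chosen arbitrarily
effects <- marginal_effect(el_values)
print(data.frame(EL = el_values, marginal_effect = effects))
```

The slope equals the EL coefficient only at EL = 0; it is negative at level 5 and strongly positive at level 15.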
Figure 1 plots the result of this analysis. Education level ranges from 0 to 18. According to the graph, the relationship between wage and education level is positive from level 0 to about 2 and again from about 9 to 18, and negative in between. Predicted wage changes relatively little over education levels 0 to 10 but rises steeply beyond level 10. In the model output, EL2 is the only education term with a negative coefficient (-2707.763), so it is the term responsible for the downward-sloping stretch. That coefficient cannot be read on its own as the wage change per additional level of education, because EL, EL2 and EL3 all change together when education level changes.
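The turning points of the education curve can be recovered from the estimates alone, without the data. The following is a minimal sketch using base R's polyroot(); the variable names are illustrative, and the constant formed from the other variables is left out because it shifts the curve up or down without moving the turning points.

```r
# Coefficient estimates copied from the summary(mod1) output above.
b1 <- 8787.232    # EL
b2 <- -2707.763   # EL2 = EL^2
b3 <- 166.480     # EL3 = EL^3

# The slope of the education component is the quadratic
#   b1 + 2*b2*EL + 3*b3*EL^2
# polyroot() expects coefficients in increasing order of power.
turning_points <- sort(Re(polyroot(c(b1, 2 * b2, 3 * b3))))
print(round(turning_points, 2))

# Education component of predicted wage at each level from 0 to 18
el <- 0:18
education_part <- b1 * el + b2 * el^2 + b3 * el^3
print(data.frame(EL = el, education_part = round(education_part)))
```

With these estimates the slope changes sign at roughly levels 2.0 and 8.9, and the education component at level 18 is far larger than anywhere in the 0 to 10 range.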
Similar steps are followed to analyze the relationship between age and wage. All variables other than Age and Age2 are set to their means. Age ranges from 0 to 70. Figure 2 shows that the relationship is positive over this range: predicted wage increases as people get older. The increase weakens gradually, however, because the coefficient on Age2 is negative, and the curve becomes visibly flatter after about age 55.
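The shape of the age profile follows directly from the two age coefficients. The sketch below, with illustrative names such as age_slope and peak_age, computes the slope at a few ages and the age at which the fitted quadratic would stop rising.

```r
# Coefficient estimates copied from the summary(mod1) output above.
b_age  <- 1106.462   # Age
b_age2 <- -7.770     # Age2 = Age^2

# Slope of predicted wage with respect to age: b_age + 2*b_age2*Age
age_slope <- function(age) b_age + 2 * b_age2 * age

ages <- c(20, 40, 55, 70)   # illustrative ages, chosen arbitrarily
slopes <- age_slope(ages)
print(data.frame(Age = ages, slope = round(slopes, 2)))

# Age at which the slope reaches zero
peak_age <- -b_age / (2 * b_age2)
print(round(peak_age, 1))
```

The slope stays positive throughout ages 0 to 70 but shrinks steadily, and the implied peak lies just beyond the sample range, at about age 71. Neither age coefficient is individually significant in the output, so this shape is estimated imprecisely.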
The R-squared of a linear regression model is the ratio of the explained (regression) sum of squares to the total sum of squares, or equivalently 1 minus the ratio of the residual sum of squares to the total sum of squares. It also equals the ratio of the sample variance of the fitted values to the sample variance of the dependent variable, Var(Ŷ)/Var(Y). R-squared lies between 0 and 1 and measures how closely the regression fits the sample data. A higher R-squared corresponds to a smaller residual sum of squares relative to the total; it says nothing about how steep the regression line is. In a multiple regression model, R-squared equals the squared correlation between the dependent variable and the fitted values, which are a linear combination of the predictors. Here it is 0.4531, so 45.31% of the sample variance in wage is explained by the predictors.
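As a consistency check, the adjusted R-squared and the F-statistic in the output can be reproduced from the reported R-squared and degrees of freedom. This is a sketch based only on the rounded figures printed above, so small rounding differences are expected; the variable names are illustrative.

```r
# Figures copied from the summary(mod1) output above.
r2   <- 0.4531   # Multiple R-squared
k    <- 9        # number of slope coefficients
df_r <- 1364     # residual degrees of freedom

# Sample size implied by the degrees of freedom: n = df_r + k + 1
n <- df_r + k + 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_r

# Overall F-statistic for H0: all slope coefficients are zero
f_stat <- (r2 / k) / ((1 - r2) / df_r)

print(c(n = n, adj_r2 = round(adj_r2, 4), f_stat = round(f_stat, 1)))
```

Both values agree with the reported 0.4495 and 125.6 to the precision shown.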
The standard error (SE) of a coefficient is the estimated standard deviation of its sampling distribution, that is, of the values the estimate would take across repeated samples from the same population. The SE of the EL coefficient, for example, is 2022.440, which measures how much the estimate 8787.232 would be expected to vary from sample to sample. The t-statistic, which is used in hypothesis testing, is the ratio of the difference between the estimated coefficient and its hypothesized value to the standard error. In the R output the hypothesized value is zero, so each t value is simply the estimate divided by its standard error. The larger the absolute value of the t-statistic, the stronger the evidence against the null hypothesis that the coefficient is zero. In this case, EL3 has a high t-statistic of 14.159, which is strong evidence that the population coefficient on EL3 differs from zero.
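The t values and p-values in the output can be recomputed from the estimates and standard errors. The sketch below does this for three of the coefficients, assuming the usual two-sided test of a zero coefficient with the 1364 residual degrees of freedom reported above; the vector names are illustrative.

```r
# Estimates and standard errors copied from the summary(mod1) output above.
est <- c(EL = 8787.232, EL3 = 166.480, F = 137.596)
se  <- c(EL = 2022.440, EL3 = 11.758,  F = 3195.210)
df_r <- 1364   # residual degrees of freedom

# t-statistic for H0: coefficient = 0
t_stat <- est / se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_r)

print(round(t_stat, 3))
print(signif(p_value, 3))
```

The recomputed t values match the output (4.345, 14.159 and 0.043), and the p-value for F is close to 0.966, consistent with sex having no detectable effect on wage in this model.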
...
...