Regression Analysis.
Essay by Keenan Martinez • March 22, 2019 • Essay • 3,514 Words (15 Pages) • 631 Views
REGRESSION ANALYSIS
Regression analysis involves:
- The selection of the line of best fit when the graphed data points do not all lie on a single line. This is the line that results in the smallest sum of squared errors, hence the name ordinary least squares;
- Estimation of a dependent variable (Y) based on values of an independent variable (X);
- Testing the relationship between two variables, using the explanatory (independent) variable to predict the dependent variable; and
- Forecasting or projecting specified variables, as is done in feasibility studies.
GRAPHICAL INTERPRETATION
The best way to understand the relationship between two variables is to graph the data, with the independent variable on the horizontal axis and the dependent variable on the vertical axis.
For example, a company finds that its sales volume depends on its advertising outlay. Advertising expenditures are budgeted at $6 million for next year. The following table shows the observed sales and advertising expenditures.
Sales ($ million) | Advertising Expenditures ($ million) |
6 | 3 |
8 | 5 |
9 | 6 |
5 | 4 |
6 | 2 |
8 | 4 |
Table 1
[Figure: plot of the Table 1 data points with a line drawn through them]
Notice that not all of the plotted points lie on the line. This means that advertising expenditure is not the only predictor or estimator of sales volume for this company. There may be other important variables, and there may be errors involved in forecasting sales volume from advertising expenditures. A number of possible lines can therefore be drawn through the data. The idea is to choose the best of these lines; this is what regression analysis does.
LINEAR REGRESSION
One of the assumptions of the regression model is that a relationship exists between the variables in the model. The simple linear regression model is as follows:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Where:
Y – the dependent variable to be estimated;
X – the independent variable that will be used to make inferences about Y;
$\beta_0$ – the intercept, the point at which the line cuts the vertical axis;
$\beta_1$ – the slope of the line, which is the coefficient to be estimated;
$\varepsilon$ – the error term, which captures the disparity between the actual value of Y and the value given by the line.
Neither the intercept nor the slope coefficient is known, so both must be estimated; computer output gives estimates of these parameters. They can also be calculated manually (as we will see below). The objective is to forecast or estimate Y, and the difference between the actual value and the predicted or expected value is the error. The errors can be either positive or negative, so we square them to eliminate the negative signs and to keep positive and negative errors from cancelling out. Because a number of lines can be drawn through the data points on the graph, we choose the line that gives the smallest sum of squared errors, which is why this method is called ordinary least squares. The following formulas give the straight line that minimizes the sum of the squared errors:
$$\bar{X} = \frac{\sum X}{n}$$
$$\bar{Y} = \frac{\sum Y}{n}$$
$$\beta_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$\beta_0 = \bar{Y} - \beta_1 \bar{X}$$
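For readers curious about where these formulas come from, the following is a sketch of the standard least-squares derivation. The essay itself does not give one. The subscript i (indexing the n observations) and the symbol S for the sum of squared errors are notation introduced here for illustration; $\bar{X}$ and $\bar{Y}$ denote the means of X and Y.

```latex
% Sum of squared errors, viewed as a function of the two unknown parameters
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2

% At the minimum, both partial derivatives are zero
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0

% The first condition gives the intercept in terms of the slope
\beta_0 = \bar{Y} - \beta_1 \bar{X}

% Substituting this into the second condition and simplifying gives the slope
\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
```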
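Applying these formulas to the data in Table 1:
$$\bar{X} = \frac{\sum X}{6} = \frac{24}{6} = 4$$
$$\bar{Y} = \frac{\sum Y}{6} = \frac{42}{6} = 7$$
$$\beta_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{8}{10} = 0.8$$
$$\beta_0 = \bar{Y} - \beta_1 \bar{X} = 7 - (0.8)(4) = 3.8$$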
Y | X | $(X - \bar{X})^2$ | $(X - \bar{X})(Y - \bar{Y})$ |
6 | 3 | $(3 - 4)^2 = 1$ | $(3 - 4)(6 - 7) = 1$ |
8 | 5 | $(5 - 4)^2 = 1$ | $(5 - 4)(8 - 7) = 1$ |
9 | 6 | $(6 - 4)^2 = 4$ | $(6 - 4)(9 - 7) = 4$ |
5 | 4 | $(4 - 4)^2 = 0$ | $(4 - 4)(5 - 7) = 0$ |
6 | 2 | $(2 - 4)^2 = 4$ | $(2 - 4)(6 - 7) = 2$ |
8 | 4 | $(4 - 4)^2 = 0$ | $(4 - 4)(8 - 7) = 0$ |
$\sum Y = 42$, $\bar{Y} = 42/6 = 7$ | $\sum X = 24$, $\bar{X} = 24/6 = 4$ | $\sum (X - \bar{X})^2 = 10$ | $\sum (X - \bar{X})(Y - \bar{Y}) = 8$ |
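The manual calculation in the table can also be checked with a few lines of code. The following is a minimal Python sketch using only the Table 1 data; the function name `fit_simple_ols` and the variable names are illustrative choices and do not come from the essay.

```python
# Data from Table 1 (both columns in $ million).
advertising = [3, 5, 6, 4, 2, 4]   # X, the independent variable
sales = [6, 8, 9, 5, 6, 8]         # Y, the dependent variable


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations (last table column) and of squared X deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_ols(advertising, sales)
print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
# prints: intercept = 3.8, slope = 0.8
```

The two sums inside the function correspond to the totals of the last two table columns (8 and 10).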
The estimated regression line is therefore:
Ŷ = 3.8 + 0.8 X
or
Sales = 3.8 + 0.8 (advertising expenditure).
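One way to see that this is the least-squares line is to compare its sum of squared errors with that of other lines drawn through the same data. The sketch below does this in Python; the two alternative lines are arbitrary examples chosen for illustration, not lines discussed in the essay.

```python
# Data from Table 1 (both columns in $ million).
advertising = [3, 5, 6, 4, 2, 4]
sales = [6, 8, 9, 5, 6, 8]


def sum_squared_errors(intercept, slope, x, y):
    """Sum of squared differences between actual and predicted Y values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Candidate lines as (intercept, slope); only the first comes from the essay.
candidates = {
    "fitted line, 3.8 + 0.8 X": (3.8, 0.8),
    "steeper line, 3.0 + 1.0 X": (3.0, 1.0),
    "flat line at mean sales, 7.0": (7.0, 0.0),
}

for label, (b0, b1) in candidates.items():
    sse = sum_squared_errors(b0, b1, advertising, sales)
    print(f"{label}: SSE = {sse:.2f}")
# prints:
# fitted line, 3.8 + 0.8 X: SSE = 5.60
# steeper line, 3.0 + 1.0 X: SSE = 6.00
# flat line at mean sales, 7.0: SSE = 12.00
```

The steeper line passes exactly through three of the six points, yet its total squared error is still larger than that of the fitted line.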
Based on the advertising budget for next fiscal year, sales volume for this company is expected to be:
Sales = 3.8 + 0.8 ($6.0) = $8.6 million.
You can therefore forecast sales in terms of advertising expenses.
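The forecast can be expressed as a small function. This is a sketch only; the function name `forecast_sales` and its parameter names are hypothetical, and the default coefficients are the estimates obtained above.

```python
def forecast_sales(advertising_millions, intercept=3.8, slope=0.8):
    """Predicted sales ($ million) for a given advertising budget ($ million)."""
    return intercept + slope * advertising_millions


# Budgeted advertising expenditure for next year: $6 million.
print(f"Forecast sales: ${forecast_sales(6.0):.1f} million")
# prints: Forecast sales: $8.6 million
```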
ASSUMPTIONS OF THE REGRESSION MODEL
Normal Distribution – the data must be normally distributed, which means that it must be continuous and distributed symmetrically around the mean; if we plot the data as a histogram, the actual data will deviate from the mean in both the positive and negative directions, but the majority of the data points will lie near the centre (the mean) of the graph;
...
...