Regression Analysis of Television Channels
Essay by yijetidat • January 5, 2017 • Coursework
Table of Contents
Introduction
Processing the dataset
▪ The Dataset
▪ Descriptive Statistics
▪ Null Hypothesis Testing
▪ Correlation between the Variables
▪ Regression Model of the Dataset
▪ Residuals Testing
▪ Predicting with regression model
Conclusion - Key findings
Table of Figures:
Figure 1: Sale Price vs. Market Retail Sales
Figure 2: Sale Price vs. Market Buying Income
Figure 3: Sale Price vs. TV Homes per Station
Figure 4: Sale Price vs. Network Hourly Rate
Figure 5: Sale Price vs. National Spot Rate
Figure 6: Sale Price vs. Age of Station
Figure 7: Sale Price vs. Number of ties with major Networks
Figure 8: Sale Price vs. Percent of Market Population in Urban Areas
Tables:
Table 1: Variables of the dataset
Table 2: Descriptive statistics of the dataset
Table 3: STATA result for the two-sample t-test statistic for the mean of selling price depending on the age of the station
Table 4: Correlation coefficient table between each variable of the dataset
Table 5: Regression Results in STATA for regression of Sale Price with National Spot Rate and Market Buying Income
Table 6: Regression Result in STATA of sale price with national spot rate
Table 7: Regression results in STATA for Sale Price against National Spot Rate and Market Buying Income, with no constant
Table 8: Results in STATA for Breusch-Pagan test
Table 9: Results in STATA for White’s Test for homoscedasticity
Table 10: Results in STATA for Breusch-Godfrey LM test for autocorrelation
Table 11: Results in STATA of VIF (variance inflation factor) multicollinearity test
Table 12: Results in STATA for Ramsey RESET Test
Table 13: Results in STATA for Skewness/Kurtosis Tests for Normality
Introduction
In the present project, we examine a data set whose variables describe the operation of 31 regional TV stations in the USA. We first produce descriptive statistics for these variables. We then build a linear regression model in which the selling price of each station is the dependent variable and the remaining variables are candidate independent variables.
The independent variables are selected first by backward selection: the regression model starts with all independent variables in its equation. The significance of each variable is then evaluated from its p-value and from its effect on R-squared, and insignificant variables are removed one at a time. As a result, the final regression model retains only the significant variables and is as small as possible.
To verify the model, forward selection is also performed. In this case, the modelling process begins by regressing the selling price on each variable on its own and then builds the model step by step, adding variables according to the significance of their coefficients. Following this process gives additional assurance that the regression model includes every significant variable. The result, however, is the same as that of the backward selection process.
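The two selection procedures described above can be illustrated with a minimal Python sketch. It is not the STATA workflow used in this project. To stay within the standard library it uses adjusted R-squared as the only criterion, as a stand-in for the p-value check, and it runs on a small synthetic data set. The variable names x1, x2, x3 and the helper names ols_r2, backward and forward are hypothetical.

```python
from itertools import product

def ols_r2(rows, y, cols):
    """Fit y on an intercept plus the chosen columns; return (R^2, adjusted R^2)."""
    n, k = len(y), len(cols) + 1
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         + [sum(X[i][p] * y[i] for i in range(n))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    beta = [A[p][k] / A[p][p] for p in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return r2, 1 - (1 - r2) * (n - 1) / (n - k)

def backward(rows, y, names):
    """Start with all variables; drop one while adjusted R^2 does not fall."""
    keep = list(names)
    while len(keep) > 1:
        current = ols_r2(rows, y, keep)[1]
        # Adjusted R^2 after dropping each remaining variable in turn.
        trials = {v: ols_r2(rows, y, [c for c in keep if c != v])[1] for v in keep}
        best = max(trials, key=trials.get)
        if trials[best] < current:
            break
        keep.remove(best)
    return keep

def forward(rows, y, names):
    """Start with no variables; add one while adjusted R^2 keeps rising."""
    chosen, current = [], float("-inf")
    while len(chosen) < len(names):
        trials = {v: ols_r2(rows, y, chosen + [v])[1]
                  for v in names if v not in chosen}
        best = max(trials, key=trials.get)
        if trials[best] <= current:
            break
        chosen.append(best)
        current = trials[best]
    return chosen

# Synthetic data (NOT the TV station data): y depends strongly on x1,
# moderately on x2, and only negligibly on x3.
rows = [{"x1": a, "x2": b, "x3": c} for c, b, a in product((-1, 1), repeat=3)]
y = [10 + 4 * r["x1"] + 2 * r["x2"] + 0.1 * r["x3"] + r["x1"] * r["x2"] * r["x3"]
     for r in rows]

print(backward(rows, y, ["x1", "x2", "x3"]))  # ['x1', 'x2']
print(forward(rows, y, ["x1", "x2", "x3"]))   # ['x1', 'x2']
```

On this toy data both procedures arrive at the same two variables, which mirrors the agreement between the two methods reported above. With a p-value criterion the stopping rule would differ, so the sketch only conveys the order of the steps.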
Furthermore, residual diagnostic checks are performed to determine whether the assumptions of the linear regression model are satisfied.
Once the regression model has been checked and verified, we use it to predict the selling price of a hypothetical TV station with its own values of the independent variables.
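The prediction step amounts to plugging the new station's values into the fitted equation. The sketch below shows the idea for a single predictor, under stated assumptions: the numbers are toy values rather than the project's data or estimated coefficients, and the names fit_simple_ols and predict are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def predict(intercept, slope, x_new):
    """Predicted value of the dependent variable at x_new."""
    return intercept + slope * x_new

# Toy values only (assumption): they lie exactly on y = 1 + 2x.
spot_rate = [1.0, 2.0, 3.0, 4.0]
sale_price = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_ols(spot_rate, sale_price)
print(predict(intercept, slope, 5.0))  # 11.0
```

With several independent variables the same idea applies: the prediction is the intercept plus the sum of each coefficient times the corresponding value for the new station.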
All of the above steps (including the regression and the residual tests) are performed using STATA 14.1. Every command used during the process is shown together with its result, in order to document the methodology we followed.
Processing the dataset
The Dataset
The dataset selected for the present project[1] was obtained from the internet and refers to regional television stations, including their selling prices, in the United States of America.
As mentioned before, it contains details for 31 TV stations, arranged in 10 columns. The first column holds the call letters that identify each TV station, the second is the selling price, which is used as the dependent variable of the regression model, and the 8 that follow are the model’s candidate independent variables.
The table below shows in greater detail the data contained in the dataset, including the variable names, type and measurement units.
Description | Name | Type | Measurement Unit |
Station Call letters | stationcall | String | - |
Sale Price | saleprice | Integer | $ (in thousands) |
Market Retail Sales | marketretailsales | Integer | $ (in millions) |
Market Buying Income | marketbuyingincome | Integer | $ (in millions) |
TV Homes/station | tvhomesstation | Integer | Homes (in thousands) |
Network Hourly Rate | networkhourlyrate | Integer | $ per Hour |
National Spot Rate | nationalspotrate | Integer | $ per Hour |
Age of Station | ageofstation | Byte | 0 = Before 1952, 1 = After 1952 |
Number of ties with major networks | numberoftieswithmajornetworks | Byte | 0, 1, 2 |
Percent of Market Population in Urban areas | percentofmarketpopulationinurban | Float | Percentage |
...
...