Complicated Rating Systems Among Different Appellations
Essay by jerryinmn • May 20, 2018
Introduction
AOC, AOP, DOC, and so on: if you drink wine at all, you are probably familiar with these letters on red wine labels, which are taken as a guarantee of quality and taste. However, many countries produce red wine, each with its own rating system and language. Moreover, some famous appellations implement their own rating systems. Together, these factors create a complex rating landscape that is difficult for ordinary consumers to understand.
Besides the complexity of rating systems across appellations, red wines from a single area are often given different grades by different sommeliers because the sommeliers' tastes differ. For example, a wine might be rated 90+ for mouthfeel by one sommelier who prefers bold wine but 60+ by another who prefers subtle wine. Sommelier-dependent ratings also mean that equally rated wines can taste very different. For instance, one wine may receive 90+ from a lover of subtle wine and another 90+ from a lover of bold wine; the two have the same mouthfeel grade but taste totally different.
Therefore, given these two shortcomings of the current rating systems for red wine, we would like to use data analysis to identify what highly rated red wines have in common and to use those common points as criteria for rating red wines objectively and accurately. The dataset we use contains 1600 observations of red wines from Portugal and includes 12 variables: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphate, alcohol, and quality.
Analysis and Result
Our main task is to discover and test the effects of alcohol, pH, residual sugar, density, and the other chemical factors on red wine quality. We need to find out whether all of these variables have a significant effect on quality and, if not, which variables to keep and which to remove. We use multiple regression analysis because it shows how several factors jointly affect an outcome. We set quality as the response variable and the other variables as the predictors.
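To make the setup concrete, here is a minimal sketch of a multiple regression fitted by ordinary least squares in plain Python. The function name `fit_ols` and the six sample rows are illustrative assumptions, not the essay's data or software: the rows are made-up values generated from a known equation so that the fit can be checked.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [intercept, b1, ..., bk]
    by solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix (X'X | X'y).
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up (alcohol, volatile acidity) pairs, NOT taken from the wine dataset.
rows = [(9.4, 0.70), (9.8, 0.88), (10.5, 0.28),
        (11.2, 0.40), (12.0, 0.35), (13.1, 0.60)]
# Synthetic response generated from a known equation, so the fit is checkable.
y = [3.0 + 0.3 * a - 1.0 * v for a, v in rows]

coefs = fit_ols(rows, y)
print([round(c, 3) for c in coefs])  # intercept, alcohol, volatile acidity
```

With real data the response would be the quality column and each row would hold the eleven chemical measurements; a statistics package would normally be used instead of hand-written elimination.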
First, we fit the full model, that is, a regression that uses all of the predictor variables. The summary table (Table 1) shows that, taken together, these explanatory variables explain a statistically significant share of the variation in red wine quality: the p-value for the overall model is low, so we reject the null hypothesis. The R-squared of 0.35 shows that the model explains 35 percent of the variation in red wine quality. However, the variables with high p-values (larger than 0.05) do not contribute significantly to explaining quality. We therefore use test-based backward elimination to remove the variables with high p-values in search of the optimal model. This method gives us the reduced model (Table 2). The summary table of the reduced model shows that all seven remaining variables are significant, with low p-values. The estimated regression line is shown below:
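The elimination loop described above can be sketched as follows. This is only an outline under stated assumptions: `p_values_for` is a hypothetical callback that stands in for refitting the regression on the current predictors and returning their p-values, and the toy p-values are made up and held fixed, whereas in a real analysis they change after every refit.

```python
def backward_eliminate(predictors, p_values_for, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value
    until every remaining p-value is at or below alpha."""
    kept = list(predictors)
    while kept:
        p = p_values_for(kept)  # hypothetical: refit on `kept`, return {name: p-value}
        worst = max(kept, key=lambda name: p[name])
        if p[worst] <= alpha:
            break  # everything left is significant at the chosen level
        kept.remove(worst)
    return kept


# Made-up p-values for illustration only, not the essay's Table 1.
toy_p = {"alcohol": 0.001, "pH": 0.01, "density": 0.40, "citric acid": 0.62}


def toy_p_values_for(kept):
    # A real implementation would refit the model here.
    return {name: toy_p[name] for name in kept}


reduced = backward_eliminate(list(toy_p), toy_p_values_for)
print(reduced)
```

Removing one variable per pass matters because dropping a predictor can change the p-values of the ones that remain.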
Estimated Quality = 4.43 + 0.289 Alcohol - 1.01 Volatile acidity + 0.88 Sulphate - 0.003 Total sulfur dioxide - 2.01 Chlorides - 0.48 pH + 0.05 Free sulfur dioxide
From the equation, we can see that alcohol has a fairly strong positive effect on red wine quality, as indicated by its positive coefficient, and that pH has a strong negative effect, as indicated by its negative coefficient.
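As a worked illustration, the estimated equation above can be evaluated directly. The intercept and coefficients are the ones reported in the equation; the dictionary keys, the function name `estimated_quality`, and the sample wine's measurements are assumptions chosen for the example, not values from the dataset.

```python
INTERCEPT = 4.43
COEFFICIENTS = {
    "alcohol": 0.289,
    "volatile_acidity": -1.01,
    "sulphate": 0.88,
    "total_sulfur_dioxide": -0.003,
    "chlorides": -2.01,
    "pH": -0.48,
    "free_sulfur_dioxide": 0.05,
}


def estimated_quality(wine):
    """Apply the reduced regression equation to one wine's measurements."""
    return INTERCEPT + sum(coef * wine[name] for name, coef in COEFFICIENTS.items())


# Hypothetical wine; these measurements are invented for the example.
wine = {
    "alcohol": 10.0,
    "volatile_acidity": 0.5,
    "sulphate": 0.6,
    "total_sulfur_dioxide": 40.0,
    "chlorides": 0.08,
    "pH": 3.3,
    "free_sulfur_dioxide": 15.0,
}

base = estimated_quality(wine)
# Raising alcohol by one unit, all else equal, shifts the estimate
# by exactly the alcohol coefficient.
stronger = estimated_quality({**wine, "alcohol": wine["alcohol"] + 1.0})
print(round(base, 2), round(stronger - base, 3))
```

The second printed number shows how a coefficient is read: it is the change in estimated quality for a one-unit change in that variable while the others are held fixed.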
Next, we need to test our regression model. The data used to build the model is only half of the full dataset: we divide the data into two parts, a training set and a test set. We use the training set to build the regression model and the test set to validate it. When we compare the model's predictions for the test set with the actual quality values, we find that of the 800 test observations, 485 (about 61 percent) are concordant and 315 are discordant.
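The validation step can be sketched as below. The essay does not define "concordant", so this sketch assumes it means that the predicted quality, rounded to the nearest whole score, equals the actual score; the helper names `split_half` and `concordance_rate` and the three sample predictions are likewise assumptions for illustration.

```python
import random


def split_half(rows, seed=0):
    """Shuffle a copy of the rows and split it into two equal halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def concordance_rate(predicted, actual):
    """Share of cases where the rounded prediction equals the actual score
    (assumed meaning of 'concordant')."""
    matches = sum(1 for p, a in zip(predicted, actual) if round(p) == a)
    return matches / len(actual)


# 1600 placeholder row indices stand in for the 1600 observations.
train, test = split_half(range(1600))
print(len(train), len(test))

# Made-up predictions and scores, only to exercise the function.
print(concordance_rate([5.4, 6.2, 4.9], [5, 6, 6]))

# Rate implied by the counts reported in the text: 485 of 800.
reported_rate = 485 / 800
print(round(reported_rate, 3))
```

Splitting at random, rather than taking the first 800 rows, guards against any ordering in the file leaking into the comparison.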
...
...