What is the proportion of variance that can be explained by the regression model?

Q: What is the proportion of variance?

“Proportion of variance” is a generic term for a part of the variance relative to the whole. For example, the total variance in any system is 100%, but that total may have many different causes, each of which has its own proportion associated with it.

In regression, it is the proportion of the variation in the response variable that is explained by the regression model.

If there is a perfect linear relationship between the explanatory variable and the response variable there will be some variation in the values of the response variable because of the variation that exists in the values of the explanatory variable. In any real data there will be more variation in the values of the response variable than the variation that would be explained by a perfect linear relationship. The total variation in the values of the response variable can be regarded as being made up of variation explained by the linear regression model and unexplained variation. The coefficient of determination is the proportion of the explained variation relative to the total variation.
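
The split into explained and unexplained variation can be checked numerically. The sketch below is only an illustration: the five (x, y) pairs are made up, and the helper name `variation_breakdown` is hypothetical, not something from the original text.

```python
def variation_breakdown(x, y):
    """Fit a least-squares line and split the variation in y into parts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]

    total = sum((yi - mean_y) ** 2 for yi in y)             # total variation
    explained = sum((pi - mean_y) ** 2 for pi in predicted)  # explained by the line
    unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return total, explained, unexplained


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

total, explained, unexplained = variation_breakdown(x, y)
coefficient_of_determination = explained / total
print(total, explained, unexplained, coefficient_of_determination)
```

For a least-squares line with an intercept, the explained and unexplained parts add up to the total, so the coefficient of determination can equally be computed as 1 minus the unexplained share.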

If the points are close to a straight line, then the unexplained variation will be a small proportion of the total variation in the values of the response variable. This means that the closer the coefficient of determination is to 1, the stronger the linear relationship.

The coefficient of determination is also used in more advanced forms of regression and is usually represented by R². In simple linear regression (one explanatory variable), R² is equal to the square of the correlation coefficient, i.e., R² = r².
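
As a quick check of this identity for a single explanatory variable, the following sketch computes the correlation coefficient and the coefficient of determination separately and compares them. The data are made up for illustration.

```python
import math

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient r.
r = sxy / math.sqrt(sxx * syy)

# Coefficient of determination from the fitted least-squares line.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

print(r ** 2, r_squared)  # the two values agree up to rounding error
```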

Example

The actual weights and self-perceived ideal weights of a random sample of 40 female students enrolled in an introductory Statistics course at the University of Auckland are displayed on the scatter plot below. A regression line has been drawn. The equation of the regression line is
predicted y = 0.6089x + 18.661 or predicted ideal weight = 0.6089 × actual weight + 18.661

The coefficient of determination is R² = 0.822.

This means that 82.2% of the variation in the ideal weights is explained by the regression model (i.e., by the equation of the regression line).
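
To show how the fitted equation is used, here is a small sketch that evaluates the regression line for one input. The function name `predict_ideal_weight` and the input value 60 are illustrative assumptions; the original does not state the units of weight, so none are assumed here.

```python
def predict_ideal_weight(actual_weight):
    """Evaluate the regression line from the example:
    predicted ideal weight = 0.6089 * actual weight + 18.661."""
    return 0.6089 * actual_weight + 18.661


# An arbitrary actual weight, chosen only for illustration.
prediction = predict_ideal_weight(60)
print(round(prediction, 3))
```

The prediction is only the value on the line. With R² = 0.822, the remaining 17.8% of the variation in ideal weights is not accounted for by the line.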

I will try to explain this in simple terms.

The regression model focuses on the relationship between a dependent variable and a set of independent variables. The dependent variable is the outcome, which you’re trying to predict, using one or more independent variables.

Assume you have a model like this:

Weight_i = 3.0 + 35 * Height_i + ε_i

Now one of the obvious questions is: how well does this model work? In other words, how well does a person’s height predict, or explain, that person’s weight?

Before we answer this question, we first need to understand how much fluctuation we observe in people’s weights. This is important, because what we are trying to do here is to explain the fluctuation (variation) in weights across different people, by using their heights. If people’s height is able to explain this variation in weight, then we have a good model.

The variance is a good metric for this purpose, as it measures how far a set of numbers is spread out from its mean value.

This helps us rephrase our original question: How much variance in a person’s weight can be explained by his/her height?

This is where the “% variance explained” comes from. For regression analysis, it equals R-squared, the coefficient of determination (in simple linear regression, this is the square of the correlation coefficient).

For the model above, we might be able to make a statement like: “Using regression analysis, it was possible to set up a predictive model using the height of a person that explains 60% of the variance in weight”.
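
Here is a minimal sketch of how such a percentage could be computed for the model above. The heights and weights are invented, and the units (metres and kilograms) are an assumption, since the original does not give any. Because the coefficients are taken as given rather than refitted to these data, the share explained is computed as 1 minus the unexplained share.

```python
def predict_weight(height):
    """Model from the text, without the error term: 3.0 + 35 * Height."""
    return 3.0 + 35 * height


# Hypothetical sample (assumed units: metres and kilograms).
heights = [1.5, 1.6, 1.7, 1.8, 1.9]
weights = [54, 61, 62, 68, 69]

mean_weight = sum(weights) / len(weights)

# Total variation in weight around its mean.
total_variation = sum((w - mean_weight) ** 2 for w in weights)

# Variation left over after using height to predict weight.
unexplained = sum((w - predict_weight(h)) ** 2 for h, w in zip(heights, weights))

share_explained = 1 - unexplained / total_variation
print(f"{share_explained:.1%} of the variance in weight is explained by height")
```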

Now, how good is 60%? It’s hard to make an objective judgement about this. But if you have other competing models – say, another regression model that uses the age of a person to predict his/her weight – you can compare different models based on how much variance is explained by them and decide which model is better. (There are some caveats to this, see ‘Interpreting and Using Regression’ -- Christopher H. Achen http://www.sagepub.in/books/Book450/authors)

What is Explained Variance?

Explained variance (also called explained variation) is used to measure the discrepancy between a model and actual data. In other words, it’s the part of the model’s total variance that is explained by factors that are actually present and isn’t due to error variance.

Higher percentages of explained variance indicate a stronger association. They also mean that you can make better predictions (Rosenthal & Rosenthal, 2011).

r² = R² = η²

Explained variance can be denoted with r². In ANOVA, it’s called eta squared (η²), and in regression analysis, it’s called the coefficient of determination (R²). The three terms are basically synonymous, except that R² assumes that changes in the dependent variable are due to a linear relationship with the independent variable; η² does not have this underlying assumption.

In ANOVA, explained variance is calculated as the eta-squared ratio of the between-groups sum of squares to the total sum of squares: η² = SS_between / SS_total. It is the proportion of the total variance that is attributable to between-group differences.
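
The ratio can be worked through on a small example. The three groups below are made up, and `eta_squared` is a hypothetical helper written for this illustration, not a library function.

```python
def eta_squared(groups):
    """Eta squared = SS_between / SS_total for a list of groups of numbers."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    # Total sum of squares: every observation against the grand mean.
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)

    # Between-groups sum of squares: each group mean against the grand mean,
    # weighted by the group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Made-up data: three groups of three observations each.
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(eta_squared(groups))
```

In this made-up sample, SS_between is 54 and SS_total is 60, so 90% of the variance is attributable to differences between the groups.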

R² in regression has a similar interpretation: what proportion of variance in Y can be explained by X (Warner, 2013).

The Problems with Multiple Predictors

In general, the more predictor variables you add, the higher the explained variance. The amount of overlapping variance (the variance explained by more than one predictor) also increases. However, there comes a point of diminishing returns, where adding new predictors makes it impossible to tell which predictor is producing which result. Furthermore, if you add two highly correlated predictors to a model, you introduce the possibility of multicollinearity.
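
The first point, that adding a predictor does not lower the explained variance, can be illustrated with a rough sketch. Everything here is an assumption made for the illustration: the data are invented, the second predictor is deliberately unrelated to the outcome, and the small least-squares solver is hand-written rather than taken from a library.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def r_squared(predictors, y):
    """R² of a least-squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Invented data: "unrelated" has no real connection to weight.
height = [1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
unrelated = [3, 1, 4, 1, 5, 9]
weight = [54, 61, 62, 68, 69, 75]

r2_one = r_squared([height], weight)
r2_two = r_squared([height, unrelated], weight)
print(r2_one, r2_two)  # r2_two is never smaller than r2_one
```

A higher R² after adding a predictor therefore does not, on its own, show that the new predictor is meaningful.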

On the other hand, including too few predictors can also pose a problem: omitting a predictor variable that can potentially explain some of the variance results in bias. Therefore, a careful balance must be struck between too many predictors and too few.

References

Rosenthal, G. & Rosenthal, J. (2011). Statistics and Data Interpretation for Social Work. Springer Publishing Company.
Warner, R. (2013). Applied Statistics: From Bivariate Through Multivariate Techniques. SAGE.

What proportion of the variation is explained by the regression?

In simple linear regression, the coefficient of determination, R², is equal to the square of the correlation coefficient, i.e., R² = r². In the ideal-weight example above, R² = 0.822, which means that 82.2% of the variation in the ideal weights is explained by the regression model (i.e., by the equation of the regression line).

What is the variance of a regression model?

In linear regression, variance is a measure of how far the observed values lie from the mean of the predicted values. For a least-squares fit with an intercept, that mean equals the mean of the observed values.

What proportion of the variance in y is explained by the model?

The proportion of the variance in Y explained by the linear relationship between X and Y is r². When the relationship is perfectly linear, r² = 1, or 100%.