If a data set has SST = 2,000 and SSE = 800, then the coefficient of determination is _____.

Created by Anna Szczepanek, PhD

Reviewed by Bogna Szyk and Jack Bowater

Last updated: Sep 09, 2021

Use our coefficient of determination calculator to find the so-called R-squared of any two-variable dataset. If you've ever wondered what the coefficient of determination is, keep reading: we give both the R-squared formula and an explanation of how to interpret the coefficient of determination. We also provide an example of how to find the R-squared of a dataset by hand, and explain the relationship between the coefficient of determination and the Pearson correlation.

What is the coefficient of determination?

In linear regression analysis, the coefficient of determination describes what proportion of the dependent variable's variance can be explained by the independent variable(s). In other words, the coefficient of determination assesses how well the real data points are approximated by regression predictions, thus quantifying the strength of the linear relationship between the explained variable and the explanatory variable(s). Because of that, it is sometimes called the goodness of fit of a model.

Most of the time, the coefficient of determination is denoted R², and read simply as "R squared".

How to use this coefficient of determination calculator?

Our R-squared calculator determines the coefficient of determination, R², for you if you are working with a simple linear regression, Y ~ aX + b:

  1. Input your data points into the appropriate rows. Extra rows will appear as you write;

  2. When at least three points are in place, our coefficient of determination calculator will return the value of R² at the bottom of the calculator, along with an interpretation; and

  3. Decide whether you want to see just a basic summary, or the full details of the calculation.

How to interpret the coefficient of determination?

  1. The coefficient of determination, or the R-squared value, is a value between 0.0 and 1.0 that expresses what proportion of the variance in Y can be explained by X:

    • If R² = 1, then we have a perfect fit, which means that the values of Y are fully determined (i.e., without any error) by the values of X, and all data points lie precisely on the estimated line of best fit.

    • If R² = 0, then our model is no better at predicting the values of Y than the model which always returns the average value of Y as a prediction.

  2. Multiplying R² by 100%, you get the percentage of the variance in Y which is explained with the help of X. For instance:

    • If R² = 0.8, then 80% of the variance in Y is predicted by X; and

    • If R² = 0.5, then half of the variance in Y can be explained by X.

  3. The complementary percentage, i.e., (1 - R²) × 100%, quantifies the unexplained variance:

    • If R² = 0.6, then 60% of the variance in Y has been explained with the help of X, while the remaining 40% remains unaccounted for.
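The percentage conversions above take only a couple of lines of Python; the R² value of 0.6 below is just an illustrative figure, not computed from data:

```python
# Illustrative R² value (not derived from any dataset).
r_squared = 0.6

explained_pct = r_squared * 100          # variance in Y explained by X
unexplained_pct = (1 - r_squared) * 100  # variance left unaccounted for

print(f"{explained_pct:.0f}% explained, {unexplained_pct:.0f}% unexplained")
# 60% explained, 40% unexplained
```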

The formula for the coefficient of determination

Let
(x₁, y₁), ..., (xₙ, yₙ)
be our sample data, and let

  • ȳ be the average of y₁, ..., yₙ; and

  • ŷ₁, ..., ŷₙ be the fitted (predicted) values of the simple regression model Y ~ aX + b.

Before we give the R-squared formula, we need to define three types of sums of squares:

  1. The sum of squares of errors (SSE for short), also called the residual sum of squares:

    SSE = ∑(yᵢ - ŷᵢ)²

    SSE quantifies the discrepancy between the real values of Y and those predicted by our model. Based on SSE, you can compute the mean squared error (MSE).
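As a quick sketch, SSE and MSE can be computed directly from observed and predicted values; the numbers below are purely illustrative:

```python
# Observed values of Y and a model's predictions (illustrative data).
y = [1.0, 4.0, 4.0]
y_hat = [1.5, 3.0, 4.5]

# SSE: sum of squared differences between real and predicted values.
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# MSE: SSE divided by the number of observations.
mse = sse / len(y)

print(sse, mse)  # 1.5 0.5
```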

  2. The regression sum of squares (shortened to SSR), which is sometimes also called the explained sum of squares:

    SSR = ∑(ŷᵢ - ȳ)²

    SSR measures the difference between the values predicted by the regression model and those predicted in the most basic way, namely by ignoring X completely and using only the average value of Y as a universal predictor.

  3. The total sum of squares (SST), which quantifies the total variability in Y:

    SST = ∑(yᵢ - ȳ)²

It turns out that those three sums of squares satisfy:

SST = SSR + SSE

so you only need to calculate any two of them, and the remaining one can be easily found!

It's time for the formula for the coefficient of determination, R²! Here are a few (equivalent) formulae:

R² = SSR / SST

or

R² = 1 - SSE / SST

or

R² = SSR / (SSR + SSE)
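Assuming the predictions come from a least-squares fit (so that the SST = SSR + SSE identity holds), a short Python sketch can check that all three formulae agree; the data and fitted values below are illustrative:

```python
# Observed values and least-squares predictions (illustrative data).
y = [1.0, 4.0, 4.0]
y_hat = [1.5, 3.0, 4.5]
y_bar = sum(y) / len(y)  # average of the observed values

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares

# The decomposition SST = SSR + SSE holds for least-squares fits.
assert abs(sst - (ssr + sse)) < 1e-12

# All three formulae give the same R².
print(ssr / sst, 1 - sse / sst, ssr / (ssr + sse))  # 0.75 0.75 0.75
```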

How to find the coefficient of determination?

Let us determine the coefficient of determination for the following data:
(0, 1), (2, 4), (4, 4)

  1. Calculate the mean of the y-values: for 1, 4, 4, we get ȳ = (1 + 4 + 4) / 3 = 3.

  2. Use our simple linear regression calculator to fit the model Y ~ aX + b to our data: Y ~ 0.75X + 1.5.

  3. With the help of the regression line, determine the fitted (predicted) values using ŷᵢ = 0.75xᵢ + 1.5:

    ŷ₁ = 1.5

    ŷ₂ = 3

    ŷ₃ = 4.5

  4. Compute SST: square the differences between yᵢ and ȳ, then sum the results:

    (1 - 3)² = 4

    (4 - 3)² = 1

    (4 - 3)² = 1

    SST = 4 + 1 + 1 = 6

  5. Compute SSR: square the differences between ŷᵢ and ȳ, then sum the results:

    (1.5 - 3)² = 2.25

    (3 - 3)² = 0

    (4.5 - 3)² = 2.25

    SSR = 2.25 + 0 + 2.25 = 4.5

  6. Apply the R-squared formula:

    R² = SSR / SST = 4.5 / 6 = 0.75
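The whole worked example can be reproduced in a few lines of Python; the closed-form least-squares formulas for the slope and intercept are standard, not specific to this article's calculator:

```python
xs = [0.0, 2.0, 4.0]
ys = [1.0, 4.0, 4.0]
n = len(xs)

x_bar = sum(xs) / n  # mean of the x-values
y_bar = sum(ys) / n  # mean of the y-values: 3.0

# Closed-form least-squares slope and intercept for Y ~ aX + b.
a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b = y_bar - a * x_bar
print(a, b)  # 0.75 1.5

# Fitted values, then SST, SSR, and finally R².
y_hat = [a * x + b for x in xs]
sst = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((f - y_bar) ** 2 for f in y_hat)
print(ssr / sst)  # 0.75
```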

R-squared and correlation

In simple linear least-squares regression, Y ~ aX + b, the coefficient of determination R² coincides with the square of the Pearson correlation coefficient between x₁, ..., xₙ and y₁, ..., yₙ.

For instance:

  1. Suppose you know the correlation of your data set: r = 0.9;
  2. To find the coefficient of determination, just square the correlation coefficient: r² = 0.9² = 0.81;
  3. Convert the result to a percentage: 0.81 = 81%; and
  4. You may now conclude that the values of X account for 81% of the variability observed in Y.
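Under the same simple-regression setup, squaring Pearson's r reproduces R². Here is a sketch on the example data from the previous section:

```python
import math

xs = [0.0, 2.0, 4.0]
ys = [1.0, 4.0, 4.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Pearson correlation: cross-product term over the product of spreads.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
r = sxy / (sx * sy)

# Squaring r recovers the coefficient of determination of the fit.
print(round(r ** 2, 10))  # 0.75
```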


How do you find the coefficient of determination from SST and SSE?

R² = SSR / SST = 1 - SSE / SST

The adjusted coefficient of determination is:

R²_adj = 1 - ((n - 1) / (n - p)) × (SSE / SST)

where SSE is the sum of squared errors, SSR is the regression sum of squares, SST is the total sum of squares, n is the number of observations, and p is the number of regression coefficients.
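These FAQ formulas can be sketched directly in Python; SSE = 800 and SST = 2,000 come from the question at the top of the page, while n = 25 and p = 2 are made-up values used purely for illustration:

```python
sse, sst = 800.0, 2000.0
n, p = 25, 2  # illustrative: 25 observations, slope + intercept

# Plain and adjusted coefficients of determination.
r2 = 1 - sse / sst
r2_adj = 1 - ((n - 1) / (n - p)) * (sse / sst)

print(r2)                # 0.6
print(round(r2_adj, 4))  # 0.5826
```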

How do you find the coefficient of determination?

It measures the proportion of the variability in Y that is accounted for by the linear relationship between X and Y. If the correlation coefficient r is already known, then the coefficient of determination can be computed simply by squaring r, as the notation suggests: R² = r².

What is SST SSR and SSE in regression?

SST is the total sum of squares, ∑(yᵢ - ȳ)², which quantifies the total variability in Y; SSR is the regression (explained) sum of squares, ∑(ŷᵢ - ȳ)²; and SSE is the error (residual) sum of squares, ∑(yᵢ - ŷᵢ)². R-squared is the proportion of explained variability (SSR) among the total variability (SST).

How do you calculate R²?

R² = 1 - SSE / SST = 1 - ∑(yᵢ - ŷᵢ)² / ∑(yᵢ - ȳ)²

The sum of squared errors (SSE) is the sum of the squared residuals, and the total sum of squares (SST) is the sum of the squared distances of the data points from their mean.
