
How to interpret goodness of fit statistics in regression analysis?

This article explains how to interpret the goodness of fit statistics that XLSTAT computes for a linear regression model (see the Simple linear regression in Excel tutorial).

In XLSTAT, many statistical analyses return a Goodness of fit statistics table in the output sheet. Usually, the same 13 indicators are presented, such as the R2, MSE, AIC and SBC.

Their interpretation can be challenging, so this article breaks them down one by one. Let’s start with a practical example.

Linear regression analysis on iris flowers

Our dataset contains information regarding 100 flowers. On each flower, four measurements have been taken: sepal length, sepal width, petal length and petal width. The species is also specified: Setosa, Virginica or Versicolor.
Figure: dataset for linear regression in XLSTAT
In this analysis, we would like to see whether petal length can be predicted from sepal length and sepal width. Using Multiple Linear Regression (see the Multiple Linear Regression in Excel tutorial), we can quickly fit a linear model and evaluate its accuracy with the Goodness of fit statistics.
Figure: Goodness of fit coefficients returned in XLSTAT
To reproduce this specific example, download our trial version and the dataset on the top right of the page.

What does each statistic mean?

The Goodness of fit statistics of this model are the following:

Observations: The first line specifies the number of observations in the dataset.
In this example, measurements were taken on 100 flowers.

Sum of weights: Observations can be given unequal weights in order to grant more importance to some of them. This line indicates the total sum of the weights of all observations in the dataset.
Here, all observations have the same weight (1), so the sum of weights is 100.

DF: This abbreviation stands for Degrees of Freedom. It is the number of independent values that remain available once the model parameters have been estimated. Usually, this number is equal to the number of observations minus the number of parameters to estimate.
Here, we are running a linear regression on 2 explanatory variables, so we need to estimate 2 coefficients and the intercept, which makes 3 parameters. Thus, the number of degrees of freedom is 100-3=97.
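The subtraction above can be written as a small helper. This is only an illustrative sketch: the function name and its `intercept` flag are hypothetical and are not part of XLSTAT.

```python
def degrees_of_freedom(n_obs: int, n_predictors: int, intercept: bool = True) -> int:
    """Residual degrees of freedom: observations minus estimated parameters."""
    # One coefficient per explanatory variable, plus one for the intercept if present.
    n_params = n_predictors + (1 if intercept else 0)
    return n_obs - n_params


print(degrees_of_freedom(100, 2))  # 97
```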

R2: The R2 statistic (coefficient of determination) is the ratio between the variation explained by the model and the total variation in the response variable. It ranges from 0 to 1, and the closer it is to 1, the better the fit.
Here, an R2 of 0.870 means that 87% of the variability in petal length is explained by this model.
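As a rough illustration, the R2 can be computed from actual and predicted values. The sketch below assumes the common form R2 = 1 - SSE/SST, which equals the explained share of variation for a least-squares fit with an intercept. The six values are made up for the example and do not come from the iris dataset.

```python
def r_squared(actual, predicted):
    """R2 = 1 - SSE / SST for paired lists of actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # SSE: variation left unexplained by the model.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # SST: total variation of the response around its mean.
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


# Made-up petal lengths (mm) and model predictions, for illustration only.
actual = [14.0, 15.0, 45.0, 47.0, 60.0, 58.0]
predicted = [18.0, 17.0, 42.0, 49.0, 55.0, 60.0]
print(round(r_squared(actual, predicted), 3))
```

A model that always predicts the mean of the response gets an R2 of 0, and a model that predicts every value exactly gets an R2 of 1.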

Adjusted R2: Like the R2, the adjusted R2 compares the explained variation with the total variation, but it also penalizes the number of explanatory variables, so that adding non-significant variables does not artificially improve the score. It is therefore useful for judging the quality of a model that has a large number of variables.
Here, we only have 2 explanatory variables, so the adjustment is small and the adjusted R2 does not differ much from the R2.
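To see how small the adjustment is in this example, here is a sketch using the usual textbook formula for a model with an intercept. It is applied to the rounded R2 of 0.870 quoted in this article, so the result is approximate.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - P - 1), for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Rounded figures from this article: R2 = 0.870, 100 observations, 2 predictors.
print(round(adjusted_r_squared(0.870, 100, 2), 3))  # 0.867
```

The penalty grows with the number of predictors: for a fixed R2 and sample size, each extra variable lowers the adjusted value.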

MSE: This abbreviation stands for Mean Squared Error. It is the average of the squared differences between the predicted values and the actual ones.
Here, it is around 42. Because the MSE is expressed in squared units, it is hard to compare directly with petal length, which varies between about 15 and 70 mm, hence the following statistic.

RMSE: This abbreviation stands for Root Mean Squared Error. It is the square root of the MSE, which brings the error back to the scale of the data.
Here, it is around 6.5 mm, which shows a lack of precision in the predictions when compared to the values taken by petal length.
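The link between the two statistics can be sketched as follows. One assumption to flag: regression software often divides the sum of squared errors by the degrees of freedom rather than by the number of observations, so the divisor is exposed through a hypothetical `n_params` argument. Pass 0 to get the plain average.

```python
import math


def mse(actual, predicted, n_params=0):
    """Sum of squared errors divided by (n - n_params)."""
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sse / (len(actual) - n_params)


def rmse(actual, predicted, n_params=0):
    """Square root of the MSE, expressed in the units of the response."""
    return math.sqrt(mse(actual, predicted, n_params))


# With the rounded MSE of 42 quoted in this article:
print(round(math.sqrt(42), 2))  # 6.48
```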

MAPE: This abbreviation stands for Mean Absolute Percentage Error. It is the mean of the absolute errors between actual and predicted values, each expressed as a percentage of the actual value.
Here, it is around 17.8%. It means that, on average, a predicted value is 17.8% above or below the actual one.
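A minimal sketch of the MAPE calculation, using made-up numbers. Note that this statistic is undefined when an actual value is zero, which the sketch does not handle.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Actual values must be non-zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Two made-up observations, each predicted with a 10% error.
print(round(mape([10.0, 20.0], [11.0, 18.0]), 1))  # 10.0
```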

DW: This abbreviation stands for Durbin-Watson. This statistic tests for first-order autocorrelation in the residuals of the model. Its values range from 0 to 4. A value of 2 indicates no autocorrelation, a value below 2 indicates positive autocorrelation and a value above 2 indicates negative autocorrelation.
Here, it has a value close to 2, which suggests close to no autocorrelation in the residuals.
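The Durbin-Watson statistic is computed on the residuals (actual minus predicted values), taken in the order of the observations. Here is a sketch of the standard formula, with made-up residuals to show the two extremes.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Identical successive residuals: strong positive autocorrelation.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))  # 0.0
# Residuals that alternate in sign: negative autocorrelation (value above 2).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```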

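Cp: Mallows’ Cp depends on the MSE, the sample size N and the number of predictors P. It helps you choose the best subset of predictors for your response variable. When comparing candidate models, look for a low Cp that is close to the number of estimated parameters (P+1 when the model has an intercept); a Cp well above that value points to a model that is missing useful predictors.
Here, we had P=2 explanatory variables and the Cp is 3, which equals the number of estimated parameters (2 coefficients and the intercept). This is expected for the model that contains all the available predictors, so the value only becomes informative when compared with the Cp of models that use fewer predictors.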
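For reference, here is a sketch of the usual definition, Cp = SSE / MSE_full - N + 2p, where SSE is the sum of squared errors of the candidate model, MSE_full is the error variance estimated from the model with all predictors, and p counts the estimated parameters including the intercept. The SSE used below is an assumption: it is back-calculated from the rounded MSE of 42 and the 97 degrees of freedom quoted in this article.

```python
def mallows_cp(sse_candidate, mse_full, n_obs, n_params):
    """Mallows' Cp for a candidate model with n_params estimated parameters."""
    return sse_candidate / mse_full - n_obs + 2 * n_params


# Assumption: SSE of the full model rebuilt from the rounded MSE (42) and DF (97).
sse_full = 42.0 * 97
print(round(mallows_cp(sse_full, 42.0, 100, 3), 6))  # 3.0
```

With this definition, the model that contains every available predictor always has a Cp equal to its own number of parameters, so a Cp of 3 for 2 predictors and an intercept carries no information by itself.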

AIC: The Akaike Information Criterion is calculated from the number of estimated parameters and the maximized likelihood. It is used to compare models fitted to the same data, for example two nested models: the model with the lowest value is preferred.
Here, the AIC is 377. We cannot draw any conclusion from this value alone, because we need other candidate models to compare it with in order to select the best of them.

SBC (or BIC): The Schwarz Bayesian Criterion follows the same idea as the AIC, but its penalty for each parameter also grows with the sample size, so it tends to favor simpler models.
As with the AIC, the result cannot really be interpreted here without another model to compare it with.
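Both criteria can be sketched for a least-squares regression. The formulas below are the common forms that drop constant terms, AIC = N ln(SSE/N) + 2p and SBC = N ln(SSE/N) + p ln(N), and it is an assumption that they match the exact definitions used by the software. The SSE is also an assumption, rebuilt from the rounded MSE of 42 and the 97 degrees of freedom.

```python
import math


def aic(sse, n_obs, n_params):
    """Least-squares AIC (constants dropped): N * ln(SSE / N) + 2 * p."""
    return n_obs * math.log(sse / n_obs) + 2 * n_params


def sbc(sse, n_obs, n_params):
    """Least-squares SBC (constants dropped): N * ln(SSE / N) + p * ln(N)."""
    return n_obs * math.log(sse / n_obs) + n_params * math.log(n_obs)


# Assumption: SSE rebuilt from the rounded MSE (42) and DF (97) quoted in this article.
sse = 42.0 * 97
print(round(aic(sse, 100, 3)))  # 377
print(round(sbc(sse, 100, 3), 1))
```

Under these assumptions the AIC lands on the 377 reported above. The only difference between the two criteria is the per-parameter penalty: 2 for the AIC and ln(N) for the SBC, which is larger as soon as there are 8 or more observations.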

PC: This abbreviation stands for Amemiya Prediction Criterion. It is another way of adjusting the R2 statistic, more specifically the 1-R2 value, so the lower it is, the better. Like the adjusted R2, it penalizes the number of predictors.
Here, it is 0.138; a better model would give a lower value. Like the adjusted R2, it is most useful for checking whether a variable could be removed without degrading the model.
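One common form of this criterion is PC = (1 - R2)(N + p) / (N - p), where p counts the estimated parameters including the intercept. The sketch below assumes that form. Applied to the rounded figures in this article, it is consistent with the reported 0.138.

```python
def amemiya_pc(r2, n_obs, n_params):
    """Amemiya's Prediction Criterion: (1 - R2) * (N + p) / (N - p)."""
    return (1.0 - r2) * (n_obs + n_params) / (n_obs - n_params)


# Rounded figures from this article: R2 = 0.870, 100 observations, 3 parameters.
print(round(amemiya_pc(0.870, 100, 3), 3))  # 0.138
```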

Conclusion:

Now, you are ready to interpret the goodness of fit statistics returned by XLSTAT!
To sum up, some of them, such as Cp, AIC or SBC, only make sense when compared across several models. Others, such as R2 or MAPE, let you evaluate how well the response variable is explained. Finally, the adjusted R2 and the PC help you judge the quality of your model and see whether some variables could be removed. Each criterion has its own interpretation and use.
