Chapter 9 Correlation and partitioning of variation
The coefficient of determination, R², compares the variation in the fitted model values to the variation in the response variable. It can be calculated as a ratio of variances:
library(mosaic) # provides rsquared()
library(mosaicData) # provides SwimRecords
Swim <- SwimRecords # from mosaicData
mod <- lm( time ~ year + sex, data = Swim)
var(fitted(mod)) / var(Swim$time)
## [1] 0.8439936
The convenience function rsquared() does the calculation for you:
rsquared(mod)
## [1] 0.8439936
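The ratio above works because a least-squares model with an intercept splits the variance of the response into a fitted part and a residual part. The following is a minimal sketch of that partitioning in base R. The data frame demo is simulated stand-in data (hypothetical values, not the real swim records), so the numbers it produces will differ from those shown in this chapter.

```r
# Simulated stand-in for the swim records (hypothetical values, not the real data)
set.seed(101)
n <- 60
demo <- data.frame(
  year = rep(seq(1910, 2000, length.out = n / 2), times = 2),
  sex  = rep(c("F", "M"), each = n / 2)
)
demo$time <- 560 - 0.25 * demo$year - 10 * (demo$sex == "M") + rnorm(n, sd = 4)

demo_mod <- lm(time ~ year + sex, data = demo)

v_response <- var(demo$time)          # total variation in the response
v_fitted   <- var(fitted(demo_mod))   # variation captured by the model
v_resid    <- var(resid(demo_mod))    # variation left in the residuals

# For a least-squares model with an intercept, the two parts add up to the whole
all.equal(v_fitted + v_resid, v_response)

# Three equivalent routes to R-squared
r2_ratio   <- v_fitted / v_response
r2_resid   <- 1 - v_resid / v_response
r2_summary <- summary(demo_mod)$r.squared
c(r2_ratio, r2_resid, r2_summary)
```

The three values agree up to floating-point error, which is why R² can be read as the fraction of the response's variance that the model accounts for.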
The regression report is a standard way of summarizing models. Such a report is produced by most statistical software packages and used in many fields. The first part of the table contains the coefficients (labeled “Estimate”) along with other information that will be introduced starting in Chapter @ref(“chap:confidence”). The R² statistic is a standard part of the report; look for “Multiple R-squared” on the second line from the bottom.
summary(mod)
##
## Call:
## lm(formula = time ~ year + sex, data = Swim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7027 -2.7027 -0.5968 1.2796 19.0759
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 555.71678 33.79991 16.441 < 2e-16 ***
## year -0.25146 0.01732 -14.516 < 2e-16 ***
## sexM -9.79796 1.01287 -9.673 8.79e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.983 on 59 degrees of freedom
## Multiple R-squared: 0.844, Adjusted R-squared: 0.8387
## F-statistic: 159.6 on 2 and 59 DF, p-value: < 2.2e-16
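The numbers in the printed report can also be pulled out of the summary object rather than read by eye. The sketch below uses simulated stand-in data (the variables x1, x2, and y are hypothetical) and base R only. It also recomputes the adjusted R² from the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of cases and p is the number of coefficients other than the intercept.

```r
# Simulated stand-in data (hypothetical; not from the chapter's data set)
set.seed(102)
demo <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
demo$y <- 3 + 2 * demo$x1 - demo$x2 + rnorm(40)

demo_mod <- lm(y ~ x1 + x2, data = demo)
report <- summary(demo_mod)

# The coefficient table is a matrix; its "Estimate" column matches coef(demo_mod)
estimates <- coef(report)[, "Estimate"]

# R-squared and adjusted R-squared as stored in the summary object
r2  <- report$r.squared
adj <- report$adj.r.squared

# Adjusted R-squared recomputed by hand
n <- nobs(demo_mod)              # number of cases
p <- length(coef(demo_mod)) - 1  # coefficients other than the intercept
adj_by_hand <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
all.equal(adj, adj_by_hand)
```

Applied to the rounded values in the report above (R² = 0.844, with 59 residual degrees of freedom and 3 coefficients, so n = 62 and p = 2), the same formula gives 1 - 0.156 × 61/59 ≈ 0.8387, matching the printed adjusted value.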
Occasionally, you may be interested in the correlation coefficient r between two quantities.
You can, of course, compute the magnitude of r by fitting a model, finding R², and taking a square root. The square root is never negative, so this route does not give the sign of r.
mod2 <- lm( time ~ year, data = Swim)
coef(mod2)
## (Intercept) year
## 567.2420024 -0.2598771
sqrt(rsquared(mod2))
## [1] 0.7723752
The cor() function computes r directly, including its sign:
cor(Swim$time, Swim$year)
## [1] -0.7723752
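The two results above differ only in sign: the square root of R² is never negative, whereas cor() returns a signed value. One way to reconcile them, sketched here on simulated stand-in data (hypothetical values, not the real swim records), is to attach the sign of the fitted slope to the square root.

```r
# Simulated stand-in data (hypothetical values)
set.seed(103)
demo <- data.frame(year = 1950:2009)
demo$time <- 570 - 0.26 * demo$year + rnorm(nrow(demo), sd = 3)

demo_mod2 <- lm(time ~ year, data = demo)
slope <- coef(demo_mod2)[["year"]]

# Magnitude from the model, sign from the slope
r_from_model <- sign(slope) * sqrt(summary(demo_mod2)$r.squared)

# Signed correlation computed directly
r_direct <- cor(demo$time, demo$year)

all.equal(r_from_model, r_direct)
```

This shortcut applies only to a model with a single quantitative explanatory variable, which is the only setting where r is defined.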
Note that the negative sign on r indicates that record swim time decreases as year increases. This information about the direction of change is also contained in the sign of the coefficient from the model. The magnitude of the coefficient tells how fast the time is changing (with units of seconds per year). The correlation coefficient (like R²) is without units.
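To see what "without units" means in practice, the sketch below re-expresses the same simulated stand-in data (hypothetical values) in minutes and decades. The slope changes with the units, but the correlation coefficient does not. The column names time_min and decade are introduced here purely for illustration.

```r
# Simulated stand-in data (hypothetical values)
set.seed(104)
demo <- data.frame(year = 1950:2009)
demo$time <- 570 - 0.26 * demo$year + rnorm(nrow(demo), sd = 3)

# The same quantities in different units
demo$time_min <- demo$time / 60   # seconds -> minutes
demo$decade   <- demo$year / 10   # years   -> decades

slope_sec_per_year   <- coef(lm(time ~ year, data = demo))[["year"]]
slope_min_per_decade <- coef(lm(time_min ~ decade, data = demo))[["decade"]]

# The slope carries units: it is rescaled by 10 / 60
all.equal(slope_min_per_decade, slope_sec_per_year * 10 / 60)

# The correlation coefficient is unchanged by the change of units
r_original <- cor(demo$time, demo$year)
r_rescaled <- cor(demo$time_min, demo$decade)
all.equal(r_original, r_rescaled)
```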
Keep in mind that the correlation coefficient r summarizes only the simple linear model A ~ B where B is quantitative. The coefficient of determination, R², summarizes any model, which makes it much more useful. R² does not show the direction of change, however; for that, look at the sign of the correlation coefficient.
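As a small illustration of that difference, the sketch below fits a model with a categorical explanatory variable on simulated stand-in data (hypothetical values). R² is still available from the model, while cor() refuses the non-numeric input.

```r
# Simulated stand-in data with a categorical explanatory variable (hypothetical values)
set.seed(105)
demo <- data.frame(sex = rep(c("F", "M"), each = 30))
demo$time <- 65 - 10 * (demo$sex == "M") + rnorm(60, sd = 4)

# R-squared is defined for this model
sex_mod <- lm(time ~ sex, data = demo)
r2_sex <- summary(sex_mod)$r.squared
r2_sex

# cor() requires numeric inputs, so it stops with an error here;
# try() captures the error instead of halting the script
cor_attempt <- try(cor(demo$time, demo$sex), silent = TRUE)
inherits(cor_attempt, "try-error")
```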