Saturday, October 06, 2007

Contrasts in R

One of the most neglected topics in Biostatistics is the calculation of contrasts in ANOVA. Most researchers are content with simply calculating an ANOVA table and stating that differences among groups are statistically significant. Some even present box-plots or any other graphic that show where the differences lie. However, few researchers actually test the differences they are looking for.

An ANOVA is designed to test a statistical hypothesis. It looks at the means of each treatment or level, and determines if all the means are equal or not. Nevertheless, the biological hypotheses introduced by the researchers need to be specifically addressed with contrasts. One can generalize and state that while ANOVA analyzes the statistical hypothesis, contrasts look into the biological hypothesis.

The following entry will show how to create a contrast in R. I will not go into detail regarding the theory behind contrasts. I suggest you read a good book on Experimental Design.

Let’s begin by using a data set included in R. We will use the InsectSprays data frame. This experiment shows the number of insects killed (variable count) by using six different insecticides (A:F). The data is loaded by using:

> data(InsectSprays)

Once the data has been invoked, we can start working on it. The first thing we want to do is to determine if the model is balanced. We can use the summary() command to assess some basic information on the data.frame:

> summary(InsectSprays)

count spray

Min. : 0.00 A:12

1st Qu.: 3.00 B:12

Median : 7.00 C:12

Mean : 9.50 D:12

3rd Qu.:14.25 E:12

Max. :26.00 F:12

The previous output shows that we have a balanced design, with 12 replicates per group. Since this is a dataframe the alphanumeric variable, spray, is immediately recognized by R as a factor.

Let’s imagine there are three main questions. First, we want to determine if there are any differences between the spray treatments. Secondly, the researcher wants to know if the first three sprays differ from the latter three. Third, there is strong indication that sprays: A, B, and F; should have a stronger effect than the remaining treatments. Since contrasts are pre-planned or a priori comparisons, we should start by creating the contrasts matrix. We need to create a vector of coefficients for each comparison, and then bind them in a matrix:

> c1 <- c(1,1,1,-1,-1,-1)

> c2 <- c(1,1,-1,-1,-1,1)

> mat <- cbind(c1,c2)

Secondly, we need to specify that the contrast matrix mat should be used to compute SS during anova calculations. We achieve this by assigning the contrasts matrix to the factor sprays using the contrasts() command:

> contrasts(InsectSprays$spray) <- mat

It should be noted, that once this assignment is implemented, the aov() command will calculate SS for the contrasts established in each column of mat. If you wish to change the contrasts, then a new contrasts matrix should be created and assigned to the factor.

We now perform the analysis of variance, and request a summary table. The anova table is split to include the contrasts:

> model1 <- aov(count ~ spray, data = InsectSprays)

> summary(model1, split=list(spray=list("First 3 vs other"=1, "ABF vs CDE"=2)))

Df Sum Sq Mean Sq F value Pr(>F)





spray: First 3 vs other 1





spray: ABF vs CDE 1




< 2e-16

Residuals 66




Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Let’s analyze the previous code. First we conduct an analysis of variance with the aov() command. The results are stored in the model1 object. Immediately following, we request a summary table with the summary() command. The summary command includes the option: split. The split option provides a list of factors where contrasts are stored. Within each factor, we also provide a list which includes the names of the contrasts (i.e. each column) stored in the contrasts matrix columns (i.e. mat).

The ANOVA table shows a significant effect of the spray varieties on insect moratility. The contrasts suggest significant differences between the first three and the latter three spray types. Finally, we see that most of the treatment SS are explained by the comparison of the ABF and CDE groups. A box-plot clearly shows that the main differences lie between the groups compared by the second contrasts. The comparisons among groups are better observed using a boxplot:

> boxplot(count ~ spray, data = InsectSprays, + xlab = "Type of spray", ylab = "Insect count", + main = "InsectSprays data", col = "skyblue")


  1. Really Nice step-by-step explanation!
    Very useful =)
    Eternally grateful! LOL =)

  2. This comment has been removed by the author.