Contingency Tables

There are many options for producing contingency tables and summary tables in R.

We will review the following methods:

Producing summary tables using dplyr & tidyr
Producing frequency & proportion tables using table()
Producing frequency, proportion, & chi-square values using CrossTable()

dplyr & tidyr

The more you can accomplish within the tidyverse family of R packages, the better (IMO). Using dplyr to produce your summary stats lets you continue the code seamlessly into the next task (filtering, plotting, etc.).

The group_by(), summarize(), and spread() commands are a useful combination for producing aggregate or summary values of our data.

First, let’s load ggplot2 (for the sample data), dplyr, and tidyr, plus knitr for printing the tables.

library(ggplot2)
library(dplyr)
library(tidyr)
library(knitr) #for printing html-friendly tables.

We will use the mpg dataset from ggplot2 for these exercises.

manufacturer	model	displ	year	cyl	trans	drv	cty	hwy	fl	class
audi	a4	1.8	1999	4	auto(l5)	f	18	29	p	compact
audi	a4	1.8	1999	4	manual(m5)	f	21	29	p	compact
audi	a4	2.0	2008	4	manual(m6)	f	20	31	p	compact
audi	a4	2.0	2008	4	auto(av)	f	21	30	p	compact
audi	a4	2.8	1999	6	auto(l5)	f	16	26	p	compact
audi	a4	2.8	1999	6	manual(m5)	f	18	26	p	compact

Here, we can get the total number of cars for each class & cyl combination using group_by() and summarize().

mpg%>%
  group_by(class, cyl)%>%
  summarize(n=n())%>%
  kable()

class	cyl	n
2seater	8	5
compact	4	32
compact	5	2
compact	6	13
midsize	4	16
midsize	6	23
midsize	8	2
minivan	4	1
minivan	6	10
pickup	4	3
pickup	6	10
pickup	8	20
subcompact	4	21
subcompact	5	2
subcompact	6	7
subcompact	8	5
suv	4	8
suv	6	16
suv	8	38

dplyr & tidyr: Crosstabs

To turn our summary data into a crosstab or contingency table, we need variable A (class) to be listed by row, and variable B (cyl) to be listed by column.

We can achieve this by adding the spread() command, which creates a column for each cyl value and uses n as the cell value. Combinations that do not occur in the data show up as NA.

mpg%>%
  group_by(class, cyl)%>%
  summarise(n=n())%>%
  spread(cyl, n)%>%
  kable()

class	4	5	6	8
2seater	NA	NA	NA	5
compact	32	2	13	NA
midsize	16	NA	23	2
minivan	1	NA	10	NA
pickup	3	NA	10	20
subcompact	21	2	7	5
suv	8	NA	16	38
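In current tidyr releases, spread() is superseded by pivot_wider(). Below is a minimal sketch of the same crosstab, assuming tidyr 1.1.0 or later (needed for names_sort and for passing a single value to values_fill). The object name class_by_cyl is only illustrative. Filling with 0 replaces the NA cells for combinations that never occur.

```r
library(ggplot2)  # mpg sample data
library(dplyr)
library(tidyr)

# count() is shorthand for group_by() + summarize(n = n())
class_by_cyl <- mpg %>%
  count(class, cyl) %>%
  pivot_wider(
    names_from  = cyl,   # one column per cyl value
    values_from = n,     # cell values are the counts
    values_fill = 0,     # missing combinations become 0 instead of NA
    names_sort  = TRUE   # sort the new columns (4, 5, 6, 8)
  )

class_by_cyl
```

Without names_sort = TRUE, pivot_wider() orders the new columns by first appearance in the data rather than sorting them as spread() does.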

Summary statistics other than frequency

One advantage of dplyr is that we can determine what kind of summary statistic we want to see very easily by adjusting our summarize() input.

Here, instead of displaying frequencies, we get the average city miles per gallon (cty) by class & cyl.

mpg%>%
  group_by(class, cyl)%>%
  summarise(mean_cty=mean(cty))%>%
  spread(cyl, mean_cty)%>%
  kable()

class	4	5	6	8
2seater	NA	NA	NA	15.40000
compact	21.37500	21	16.92308	NA
midsize	20.50000	NA	17.78261	16.00000
minivan	18.00000	NA	15.60000	NA
pickup	16.00000	NA	14.50000	11.80000
subcompact	22.85714	20	17.00000	14.80000
suv	18.00000	NA	14.50000	12.13158

Or the maximum city miles per gallon by class & cyl.

mpg%>%
  group_by(class, cyl)%>%
  summarise(max_cty=max(cty))%>%
  spread(cyl, max_cty)%>%
  kable()

class	4	5	6	8
2seater	NA	NA	NA	16
compact	33	21	18	NA
midsize	23	NA	19	16
minivan	18	NA	17	NA
pickup	17	NA	16	14
subcompact	35	20	18	15
suv	20	NA	17	14
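Several statistics can also be computed in one summarise() call before deciding which one to spread into columns. This is a small sketch rather than part of the original walkthrough; it assumes dplyr 1.0.0 or later for the .groups argument, and cty_summary is an illustrative name.

```r
library(ggplot2)  # mpg sample data
library(dplyr)

cty_summary <- mpg %>%
  group_by(class, cyl) %>%
  summarise(
    n        = n(),        # frequency
    mean_cty = mean(cty),  # average city mpg
    max_cty  = max(cty),   # best city mpg
    .groups  = "drop"      # return an ungrouped tibble
  )

cty_summary
```

The result stays in long format (one row per class & cyl combination), which is convenient for filtering or plotting before reshaping.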

dplyr & tidyr: Proportions

We can find proportions by creating a new, calculated variable that divides each row frequency by the total frequency of the table.

mpg%>%
  group_by(class)%>%
  summarize(n=n())%>%
  mutate(prop=n/sum(n))%>%   # our new proportion variable
  kable()

class	n	prop
2seater	5	0.0213675
compact	47	0.2008547
midsize	41	0.1752137
minivan	11	0.0470085
pickup	33	0.1410256
subcompact	35	0.1495726
suv	62	0.2649573

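We can create a contingency table of proportion values by applying the same spread() command as before. Note that after summarize() the data is still grouped by class, so sum(n) is the class total and each proportion is the share of that class with a given cyl value (each class column sums to 1). Vary the group_by() and spread() arguments to produce proportions of different variables.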

mpg%>%
  group_by(class, cyl)%>%
  summarize(n=n())%>%
  mutate(prop=n/sum(n))%>%
  subset(select=c("class","cyl","prop"))%>%   #drop the frequency value
  spread(class, prop)%>%
  kable()

cyl	2seater	compact	midsize	minivan	pickup	subcompact	suv
4	NA	0.6808511	0.3902439	0.0909091	0.0909091	0.6000000	0.1290323
5	NA	0.0425532	NA	NA	NA	0.0571429	NA
6	NA	0.2765957	0.5609756	0.9090909	0.3030303	0.2000000	0.2580645
8	1	NA	0.0487805	NA	0.6060606	0.1428571	0.6129032
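Because the grouping decides what sum(n) refers to, it can help to make the denominator explicit. The sketch below contrasts proportions of the whole table with proportions within each class; the object names counts, table_props, and class_props are illustrative.

```r
library(ggplot2)  # mpg sample data
library(dplyr)

# count() returns an ungrouped tibble of class/cyl frequencies
counts <- mpg %>% count(class, cyl)

# Share of the whole table: sum(n) is the total number of cars
table_props <- counts %>%
  mutate(prop = n / sum(n))

# Share within each class: sum(n) is computed per class
class_props <- counts %>%
  group_by(class) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

table_props
class_props
```

The table-wide proportions sum to 1 overall, while the within-class proportions sum to 1 for each class.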

table()

table() is a quick way to pull together row/column frequencies and proportions for categorical variables.

Using the basic table() command, we can get a contingency table of vehicle class by number of cylinders.

table(mpg$class, mpg$cyl)

##             
##               4  5  6  8
##   2seater     0  0  0  5
##   compact    32  2 13  0
##   midsize    16  0 23  2
##   minivan     1  0 10  0
##   pickup      3  0 10 20
##   subcompact 21  2  7  5
##   suv         8  0 16 38

Table, Column, and Row Frequencies

The same counts can also be displayed as a flat contingency table with the ftable() command.

mpg_table <- table(mpg$class, mpg$cyl)  # store the table so it can be reused in later calls
ftable(mpg_table)

##              4  5  6  8
##                        
## 2seater      0  0  0  5
## compact     32  2 13  0
## midsize     16  0 23  2
## minivan      1  0 10  0
## pickup       3  0 10 20
## subcompact  21  2  7  5
## suv          8  0 16 38

For row frequencies (row totals), we use the margin.table() command with 1 as the margin argument.

margin.table(mpg_table, 1)

## 
##    2seater    compact    midsize    minivan     pickup subcompact 
##          5         47         41         11         33         35 
##        suv 
##         62

For column frequencies (column totals), we use the margin.table() command with 2 as the margin argument.

margin.table(mpg_table, 2)

## 
##  4  5  6  8 
## 81  4 79 70
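To see the counts and both sets of totals in one table, base R also offers addmargins(). A brief sketch, rebuilding mpg_table so the block runs on its own; mpg_totals is an illustrative name.

```r
library(ggplot2)  # mpg sample data

mpg_table <- table(mpg$class, mpg$cyl)

# Append a "Sum" row and a "Sum" column to the frequency table
mpg_totals <- addmargins(mpg_table)

mpg_totals
```

The "Sum" column holds the same values as margin.table(mpg_table, 1), and the "Sum" row matches margin.table(mpg_table, 2).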

Table, Column, and Row Proportions

We can get the proportion values for our variable combinations as well.

For proportion of the entire table, we use the prop.table() command.

prop.table(mpg_table)     #proportion of entire table

##             
##                        4           5           6           8
##   2seater    0.000000000 0.000000000 0.000000000 0.021367521
##   compact    0.136752137 0.008547009 0.055555556 0.000000000
##   midsize    0.068376068 0.000000000 0.098290598 0.008547009
##   minivan    0.004273504 0.000000000 0.042735043 0.000000000
##   pickup     0.012820513 0.000000000 0.042735043 0.085470085
##   subcompact 0.089743590 0.008547009 0.029914530 0.021367521
##   suv        0.034188034 0.000000000 0.068376068 0.162393162

For row proportions (each row sums to 1), we use the prop.table() command with 1 as the margin argument after the table name.

prop.table(mpg_table, 1)  #proportion of entire row

##             
##                       4          5          6          8
##   2seater    0.00000000 0.00000000 0.00000000 1.00000000
##   compact    0.68085106 0.04255319 0.27659574 0.00000000
##   midsize    0.39024390 0.00000000 0.56097561 0.04878049
##   minivan    0.09090909 0.00000000 0.90909091 0.00000000
##   pickup     0.09090909 0.00000000 0.30303030 0.60606061
##   subcompact 0.60000000 0.05714286 0.20000000 0.14285714
##   suv        0.12903226 0.00000000 0.25806452 0.61290323

For column proportions (each column sums to 1), we use the prop.table() command with 2 as the margin argument after the table name.

prop.table(mpg_table, 2)  #proportion of entire column

##             
##                       4          5          6          8
##   2seater    0.00000000 0.00000000 0.00000000 0.07142857
##   compact    0.39506173 0.50000000 0.16455696 0.00000000
##   midsize    0.19753086 0.00000000 0.29113924 0.02857143
##   minivan    0.01234568 0.00000000 0.12658228 0.00000000
##   pickup     0.03703704 0.00000000 0.12658228 0.28571429
##   subcompact 0.25925926 0.50000000 0.08860759 0.07142857
##   suv        0.09876543 0.00000000 0.20253165 0.54285714
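The raw proportions are easier to read as rounded percentages, and a table can be converted to a data frame for further manipulation. This is a minimal sketch using base R only; row_pct and row_pct_df are illustrative names.

```r
library(ggplot2)  # mpg sample data

mpg_table <- table(mpg$class, mpg$cyl)

# Row percentages, rounded to one decimal place
row_pct <- round(100 * prop.table(mpg_table, 1), 1)
row_pct

# Long format: one row per class/cyl cell.
# The table has no dimension names, so the columns are Var1, Var2, Freq.
row_pct_df <- as.data.frame(row_pct)
head(row_pct_df)
```

Building the table with named arguments, for example table(class = mpg$class, cyl = mpg$cyl), would give the data frame columns more descriptive names than Var1 and Var2.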

gmodels::CrossTable()

The CrossTable() command from the gmodels package produces frequencies along with table, row, & column proportions in a single command. The values are not as easily pulled into tables of their own or manipulated further as they are with the dplyr/tidyr tables, but this is a handy command nonetheless.

Install & load the gmodels package.

install.packages("gmodels")  # only needed once per R installation
library(gmodels)

Run the CrossTable() command, with your two variables as inputs.

CrossTable(mpg$class, mpg$cyl)

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  234 
## 
##  
##              | mpg$cyl 
##    mpg$class |         4 |         5 |         6 |         8 | Row Total | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##      2seater |         0 |         0 |         0 |         5 |         5 | 
##              |     1.731 |     0.085 |     1.688 |     8.210 |           | 
##              |     0.000 |     0.000 |     0.000 |     1.000 |     0.021 | 
##              |     0.000 |     0.000 |     0.000 |     0.071 |           | 
##              |     0.000 |     0.000 |     0.000 |     0.021 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##      compact |        32 |         2 |        13 |         0 |        47 | 
##              |    15.210 |     1.782 |     0.518 |    14.060 |           | 
##              |     0.681 |     0.043 |     0.277 |     0.000 |     0.201 | 
##              |     0.395 |     0.500 |     0.165 |     0.000 |           | 
##              |     0.137 |     0.009 |     0.056 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##      midsize |        16 |         0 |        23 |         2 |        41 | 
##              |     0.230 |     0.701 |     6.059 |     8.591 |           | 
##              |     0.390 |     0.000 |     0.561 |     0.049 |     0.175 | 
##              |     0.198 |     0.000 |     0.291 |     0.029 |           | 
##              |     0.068 |     0.000 |     0.098 |     0.009 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##      minivan |         1 |         0 |        10 |         0 |        11 | 
##              |     2.070 |     0.188 |    10.641 |     3.291 |           | 
##              |     0.091 |     0.000 |     0.909 |     0.000 |     0.047 | 
##              |     0.012 |     0.000 |     0.127 |     0.000 |           | 
##              |     0.004 |     0.000 |     0.043 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##       pickup |         3 |         0 |        10 |        20 |        33 | 
##              |     6.211 |     0.564 |     0.117 |    10.391 |           | 
##              |     0.091 |     0.000 |     0.303 |     0.606 |     0.141 | 
##              |     0.037 |     0.000 |     0.127 |     0.286 |           | 
##              |     0.013 |     0.000 |     0.043 |     0.085 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##   subcompact |        21 |         2 |         7 |         5 |        35 | 
##              |     6.515 |     3.284 |     1.963 |     2.858 |           | 
##              |     0.600 |     0.057 |     0.200 |     0.143 |     0.150 | 
##              |     0.259 |     0.500 |     0.089 |     0.071 |           | 
##              |     0.090 |     0.009 |     0.030 |     0.021 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##          suv |         8 |         0 |        16 |        38 |        62 | 
##              |     8.444 |     1.060 |     1.162 |    20.403 |           | 
##              |     0.129 |     0.000 |     0.258 |     0.613 |     0.265 | 
##              |     0.099 |     0.000 |     0.203 |     0.543 |           | 
##              |     0.034 |     0.000 |     0.068 |     0.162 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
## Column Total |        81 |         4 |        79 |        70 |       234 | 
##              |     0.346 |     0.017 |     0.338 |     0.299 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
## 
##
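The "Chi-square contribution" line in each cell is (observed - expected)^2 / expected. As a rough cross-check that does not rely on gmodels, the same quantities can be derived from base R's chisq.test(). This is a sketch under that assumption, not a description of how CrossTable() computes them internally; chi and contrib are illustrative names.

```r
library(ggplot2)  # mpg sample data

mpg_table <- table(mpg$class, mpg$cyl)

# Several expected counts are small, so chisq.test() warns that the
# chi-squared approximation may be inaccurate; the warning is silenced here.
chi <- suppressWarnings(chisq.test(mpg_table))

chi$expected               # expected counts under independence

# Squared Pearson residuals = per-cell chi-square contributions
contrib <- chi$residuals^2
round(contrib, 3)

chi$statistic              # overall chi-square statistic (sum of contributions)
chi$parameter              # degrees of freedom: (7 - 1) * (4 - 1)
```

The rounded contributions should line up with the second line of each cell in the CrossTable() output, for example the 2seater/8 and compact/4 cells.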