Suppose we select 100 students at random. Forty (40) of these are male and sixty (60) are female. We ask them whether they attended class today or skipped. Let's say 70% of our respondents report that they attended class and 30% report that they skipped.
Not knowing anything else, we would expect males and females to have attended class at the same rate. In other words, if there is no relationship between sex and class attendance, then we expect 70% of the 40 males and 70% of the 60 females in our sample to have attended class. Likewise, we expect that 30% of the males and 30% of the females skipped class.
These figures are recorded in Table 1 below.
| | Males | Females | Total |
| Attended Class | 70% of 40 = 28 | 70% of 60 = 42 | 70 |
| Skipped Class | 30% of 40 = 12 | 30% of 60 = 18 | 30 |
| Total | 40 | 60 | 100 |
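The expected counts above can be reproduced with a short sketch. It assumes the usual equivalent form of the calculation (row total times column total, divided by the grand total), which gives the same numbers as taking 70% or 30% of each column. The variable names are illustrative only.

```python
# Marginal totals from the example: 70 attended, 30 skipped; 40 males, 60 females.
row_totals = {"Attended Class": 70, "Skipped Class": 30}
col_totals = {"Males": 40, "Females": 60}
grand_total = 100

# Expected count for each cell if sex and attendance are unrelated:
# (row total * column total) / grand total.
expected = {
    (row, col): row_totals[row] * col_totals[col] / grand_total
    for row in row_totals
    for col in col_totals
}

for (row, col), count in expected.items():
    print(f"{row:15s} {col:8s} {count:.0f}")
```

For instance, the expected number of males who attended is 70 × 40 / 100 = 28, the same as 70% of 40.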
However, what we actually observed in our sample is not the same as what we would expect from probability. The actual responses for our hypothetical survey are shown in Table 2.
| | Males | Females | Total |
| Attended Class | 20 | 50 | 70 |
| Skipped Class | 20 | 10 | 30 |
| Total | 40 | 60 | 100 |

In looking at the two tables, we can see that fewer men attended class (n = 20) than we expected (n = 28). On the other hand, more women attended class (n = 50) than we expected (n = 42).
What we do now is find the overall discrepancy (difference) between the distribution we observed and the distribution we expected. This measure is called chi-square.
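There are four steps in the computation of chi-square: (1) subtract the expected frequency from the observed frequency in each cell, (2) square that difference, (3) divide the squared difference by the expected frequency, and (4) add up the results for all of the cells.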
| | Males | Females |
| Attended Class | Step 1: 20 - 28 = -8; Step 2: (-8)² = 64; Step 3: 64/28 = 2.29 | Step 1: 50 - 42 = 8; Step 2: 8² = 64; Step 3: 64/42 = 1.52 |
| Skipped Class | Step 1: 20 - 12 = 8; Step 2: 8² = 64; Step 3: 64/12 = 5.33 | Step 1: 10 - 18 = -8; Step 2: (-8)² = 64; Step 3: 64/18 = 3.56 |
| Total N | 40 | 60 |

Step 4: 2.29 + 1.52 + 5.33 + 3.56 = 12.70. Tada!!
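As a check on the arithmetic, here is a minimal Python sketch of the same computation, using the observed counts from Table 2 and the expected counts from Table 1. The dictionary layout and variable names are illustrative choices, not part of the original example.

```python
# Observed counts (Table 2) and expected counts (Table 1), keyed by cell.
observed = {
    ("Attended", "Males"): 20, ("Attended", "Females"): 50,
    ("Skipped", "Males"): 20, ("Skipped", "Females"): 10,
}
expected = {
    ("Attended", "Males"): 28, ("Attended", "Females"): 42,
    ("Skipped", "Males"): 12, ("Skipped", "Females"): 18,
}

chi_square = 0.0
for cell in observed:
    difference = observed[cell] - expected[cell]  # observed minus expected
    squared = difference ** 2                     # square the difference
    contribution = squared / expected[cell]       # divide by the expected count
    chi_square += contribution                    # add up over all four cells
    print(f"{cell}: {contribution:.2f}")

print(f"chi-square = {chi_square:.2f}")
```

Because the code adds the unrounded cell values, the total differs from the hand calculation only beyond the second decimal place.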
Now we have to figure out whether the difference between our observed frequencies and our expected frequencies is statistically significant.
Could these differences have occurred due to chance or sampling error?
To answer that question, we first need the idea of degrees of freedom: the number of values that are free to vary. For example, suppose we must select three numbers that have a mean of 10. The number of combinations that would work is vast.
Now, let's require that one of those numbers be 10. This limits the possibilities for the other two numbers, but there are still a great many.
Okay, so let's require that one number be 10 and the second number be 15. Now, we have no choice for the third number . . . it HAS to be 5.
So there are 2 degrees of freedom. Two numbers can be anything we choose (they have freedom to vary); after that we have no choice.
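The "no choice" step can be made concrete with a small sketch. The function name `forced_last_value` is hypothetical; it simply solves for the one remaining number once the others and the required mean are fixed.

```python
def forced_last_value(chosen, target_mean, n):
    """Return the single value that makes n numbers average to target_mean,
    given the n - 1 values that were chosen freely."""
    if len(chosen) != n - 1:
        raise ValueError("exactly n - 1 values must be chosen freely")
    # The n numbers must sum to target_mean * n, so the last one is whatever is left.
    return target_mean * n - sum(chosen)

# Two numbers chosen freely (10 and 15); the third is then determined.
print(forced_last_value([10, 15], target_mean=10, n=3))  # 5
```

Any other pair of freely chosen numbers works the same way: the third value is always fixed by the first two.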
In a bivariate table with two rows and two columns, where the row and column totals are already known, we can only choose one (1) number before we are forced into choices for the remaining cells.
Try it! Copy Table 4 below. Let's say 20 females answered Yes to our question. Fill in the rest of the table, then check your result against Answer 1 at the end.
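[Table 4: a blank table of answer (Yes, No) by sex (Males, Females) showing only the row and column totals. Per Answer 1, 50 respondents answered Yes overall and 50 respondents were female.]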
For crosstabulation tables there is a formula for this. The degrees of freedom equal the number of rows minus one, times the number of columns minus one: d.f. = (rows - 1)(columns - 1).
Suppose we had a 3 x 2 table. What would the degrees of freedom be? Check your result against Answer 2 at the end.

In our example, we have 2 rows and 2 columns. Thus, d.f. = (2-1)(2-1) = (1)(1) = 1.
We have 1 degree of freedom.
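The rule for crosstabulation tables is easy to express in code. This is a minimal sketch; the function name `degrees_of_freedom` is an illustrative choice.

```python
def degrees_of_freedom(rows, columns):
    """Degrees of freedom for a crosstabulation: (rows - 1) * (columns - 1)."""
    return (rows - 1) * (columns - 1)

# The class attendance example has 2 rows and 2 columns.
print(degrees_of_freedom(2, 2))  # 1
```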
We now look at a table showing the probability distribution of chi-square based on the degrees of freedom. A truncated (shortened) table (Table 5) is shown below for purposes of this example. See a full distribution in your textbook.
| Degrees of freedom | .10 | .05 | .01 |
| 1 | 2.706 | 3.841 | 6.635 |
With 1 degree of freedom, and random sampling from a population in which there is no relationship between two variables, 10% (.10) of the time we should expect a chi-square of at least 2.706. Five percent (.05) of the time we should expect a chi-square of at least 3.841. One percent (.01) of the time we should expect a chi-square value of at least 6.635.
The higher the chi-square value, the less likely it is that the observed difference is due to chance or sampling error.
In our example we have a chi-square value of 12.70, with 1 degree of freedom. This is larger than 6.635, the value we would expect to reach only 1% of the time. So if there were no relationship between sex and class attendance, we would expect a chi-square of this magnitude (size) less than 1% (.01) of the time. We report this finding by saying "the observed relationship between sex and class attendance is statistically significant at the .01 level."
We reject the null hypothesis that there is no relationship in the population, because the probability that the observed relationship resulted from chance or sampling error is so low.
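For readers who want to check the probabilities without a printed table, the sketch below computes the upper-tail probability for 1 degree of freedom using only Python's standard library. It relies on an assumption not stated in the text above: with 1 degree of freedom, chi-square is the square of a standard normal variable, so the tail probability equals erfc(√(χ²/2)). This shortcut applies only when there is 1 degree of freedom, and the function name is hypothetical.

```python
import math

def chi_square_p_value_df1(chi_square):
    """Probability of a chi-square at least this large, for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi_square / 2))

# The three critical values quoted for 1 degree of freedom.
for critical_value in (2.706, 3.841, 6.635):
    print(f"{critical_value:.3f} -> {chi_square_p_value_df1(critical_value):.3f}")

# The chi-square from the class attendance example.
p_value = chi_square_p_value_df1(12.70)
print(p_value < 0.01)  # True
```

For tables with more than 1 degree of freedom, a statistics library or a full printed distribution would be needed instead.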
Answer 1: This means that there must be 30 Males who answered Yes, because we have a total of 50 who said Yes overall. Now, working down the column, if 20 females of the 50 females in the survey answered Yes, then 30 must have answered No. Therefore, we have only one degree of freedom. Once we specify one cell, the remaining cells have no freedom to vary.
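The reasoning in Answer 1 can be sketched as a small function that fills in every cell from one freely chosen cell plus the totals. Answer 1 gives 50 Yes answers overall and 50 females; the grand total of 100 used here is an assumption made only so the sketch can run, and the function name is hypothetical.

```python
def fill_two_by_two(females_yes, yes_total, females_total, grand_total):
    """Fill a 2 x 2 table from one chosen cell and the known totals."""
    males_yes = yes_total - females_yes        # working across the Yes row
    females_no = females_total - females_yes   # working down the Females column
    males_total = grand_total - females_total
    males_no = males_total - males_yes         # the last cell is forced as well
    return {
        ("Yes", "Males"): males_yes,
        ("Yes", "Females"): females_yes,
        ("No", "Males"): males_no,
        ("No", "Females"): females_no,
    }

# grand_total=100 is an assumed value; the other totals come from Answer 1.
table = fill_two_by_two(females_yes=20, yes_total=50, females_total=50, grand_total=100)
for cell, count in table.items():
    print(cell, count)
```

Only `females_yes` was chosen freely; every other cell follows from subtraction, which is what one degree of freedom means here.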
Answer 2: d.f. = (3-1)(2-1) = (2)(1) = 2
Reference: Babbie, Earl. 1995. The Practice of Social Research, Seventh Edition. Belmont: Wadsworth Publishing Company.