Chi-Square

          Chi-Square is a test of significance used frequently in the social sciences. It starts from the assumption that there is no relationship between two variables in the population: the null hypothesis (H0). Given the observed distribution of each of two separate variables (for example, sex and class attendance), we compute the distribution we would expect on the two variables together if H0 were true. We then compare this set of expected frequencies to the frequencies actually observed in our sample. Finally, we determine the probability that the observed difference (if any) could be a result of chance or sampling error.

          Suppose we select 100 students at random. Forty (40) of these are male and sixty (60) are female. We ask each of them whether they attended class today or skipped. Let's say 70% of our respondents report that they attended class and 30% report that they skipped.

          Knowing nothing else, we would expect class attendance to be split the same way among males and females. In other words, if there is no relationship between sex and class attendance, then we expect 70% of the 40 males and 70% of the 60 females in our sample to have attended class. Likewise, we expect that 30% of the males and 30% of the females skipped class.

          These figures are recorded in Table 1 below.
           

          Table 1: Expected Frequencies

                           Males             Females           Total
          Attended Class   70% of 40 = 28    70% of 60 = 42      70
          Skipped Class    30% of 40 = 12    30% of 60 = 18      30
          Total            40                60                 100
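The expected counts in Table 1 depend only on the margins: each cell's expected frequency is its row total times its column total, divided by the total N (for example, 70 x 40 / 100 = 28, which is the same as 70% of 40). Here is a minimal Python sketch of that calculation; the function name `expected_frequencies` and the list-of-rows layout are our own illustrative choices, not part of any library.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts if there is no relationship between the variables.

    Each cell is (row total * column total) / N, i.e. the row's share of N
    applied to the column total (70% of 40 = 28, and so on).
    """
    n = sum(row_totals)
    # Both sets of margins must describe the same sample of N cases.
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same N")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: attended, skipped; columns: males, females (the margins of Table 1).
expected = expected_frequencies([70, 30], [40, 60])
print(expected)  # [[28.0, 42.0], [12.0, 18.0]]
```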

          However, what we actually observed in our sample is not the same as what we would expect if there were no relationship. The actual responses for our hypothetical survey are shown in Table 2.
           

          Table 2: Observed Frequencies

                           Males    Females    Total
          Attended Class     20        50        70
          Skipped Class      20        10        30
          Total              40        60       100

          In looking at the two tables, we can see that fewer men attended class (n = 20) than we expected (n = 28). On the other hand, more women attended class (n = 50) than we expected (n = 42).

          Next we find the overall discrepancy (difference) between the distribution we observed and the distribution we expected. This measure is called chi-square.

          There are four steps in the computation of chi-square.

          • For each cell, we will subtract the expected frequency (Table 1) from the observed frequency (Table 2).
          • For each cell, we will square the quantity we obtained in step 1.
          • Again, for each cell, we divide the squared difference (from step 2) by the expected frequency in Table 1.
          • Finally, we sum the results of step 3 across all cells.
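Written as a single formula, the four steps compute the quantity below, where O is a cell's observed frequency and E is its expected frequency (O and E are our own shorthand; the tutorial does not name them). The second line plugs in the values from Tables 1 and 2.

```latex
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}

\chi^2 = \frac{(20-28)^2}{28} + \frac{(50-42)^2}{42} + \frac{(20-12)^2}{12} + \frac{(10-18)^2}{18} \approx 12.70
```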
          This entire process is shown in Table 3.
           
          Table 3: Computation of Chi-Square

                           Males                   Females                   Total
          Attended Class   Step 1: 20 - 28 = -8    Step 1: 50 - 42 = 8         70
                           Step 2: (-8)² = 64      Step 2: 8² = 64
                           Step 3: 64/28 = 2.29    Step 3: 64/42 = 1.52
          Skipped Class    Step 1: 20 - 12 = 8     Step 1: 10 - 18 = -8        30
                           Step 2: 8² = 64         Step 2: (-8)² = 64
                           Step 3: 64/12 = 5.33    Step 3: 64/18 = 3.56
          Total N          40                      60                         100

          Chi-Square = 2.29 + 1.52 + 5.33 + 3.56 = 12.70 Tada!!
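The four steps translate directly into a loop over the cells. This is a minimal Python sketch that assumes the observed and expected tables are stored as lists of rows in the same order; `chi_square` is a hypothetical helper name, not a library function.

```python
def chi_square(observed, expected):
    """Pearson's chi-square: the sum over all cells of (O - E)^2 / E."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = o - e            # Step 1: observed minus expected
            squared = diff ** 2     # Step 2: square the difference
            total += squared / e    # Step 3: divide by expected; Step 4: add it up
    return total


# Rows: attended, skipped; columns: males, females.
observed = [[20, 50], [20, 10]]   # Table 2
expected = [[28, 42], [12, 18]]   # Table 1
print(round(chi_square(observed, expected), 2))  # 12.7
```

Without rounding each cell first, the sum is about 12.698, which rounds to the same 12.70 shown in Table 3.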
          But . . . we're not done yet.

          Now we have to figure out whether the difference between our observed frequencies and our expected frequencies is statistically significant.

          Could these differences have occurred due to chance or sampling error?

          Degrees of Freedom

          This is where the concept of degrees of freedom becomes important. The degrees of freedom are the number of values that remain free to vary once the constraints on the data (such as a fixed mean or fixed totals) are in place.

          For example, select three numbers that have a mean of 10. There is a vast number of combinations that work.

          Now, let's require that one of those numbers be 10. The other two must then sum to 20, which limits the possibilities, but there are still a great many.

          Okay, so let's require that one number be 10 and the second be 15. Now we have no choice for the third number: it HAS to be 5, because three numbers with a mean of 10 must sum to 30.

          So there are 2 degrees of freedom. Two numbers can be anything we choose (they have freedom to vary); after that, we have no choice.
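The same reasoning fits in a few lines of Python: fixing the mean fixes the sum, so once count - 1 values are chosen, the last one is forced. `forced_last_value` is an illustrative name of our own.

```python
def forced_last_value(mean, free_values, count):
    """Return the one value that is no longer free to vary.

    Fixing the mean of `count` numbers fixes their sum (count * mean),
    so after choosing count - 1 values freely, the last one is determined.
    """
    if len(free_values) != count - 1:
        raise ValueError("choose exactly count - 1 values freely")
    return count * mean - sum(free_values)


print(forced_last_value(10, [10, 15], 3))  # 5
```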

          In a 2 x 2 bivariate table with fixed row and column totals, we can choose only one (1) number before we are forced into choices for the remaining cells. Try it! Copy Table 4 below. Let's say 20 females answered Yes to our question. Fill in the rest of the table. (See Answer 1 at the end of this page.)
           

          Table 4: Visualizing Degrees of Freedom

                   Males    Females    Total
          Yes                  20         50
          No                              50
          Total      50        50        100

          There is a formula for this in crosstabulation tables: the degrees of freedom equal the number of rows minus one, times the number of columns minus one. (Count only the categories, not the Total row or column.)

             d.f. = (rows - 1)(columns - 1) 

          Suppose we had a 3 x 2 table (3 rows, 2 columns).

          What would the degrees of freedom be? (See Answer 2 at the end of this page.)

          In our example, we have 2 rows and 2 columns. Thus,

          d.f. = (2-1)(2-1) = (1)(1) = 1.

          We have 1 degree of freedom.
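The formula is simple enough to write as a one-line function. This Python sketch takes the table's dimensions from the cell counts only (no Total row or column); `degrees_of_freedom` is a hypothetical helper name.

```python
def degrees_of_freedom(rows, columns):
    """d.f. = (rows - 1)(columns - 1) for a crosstabulation table."""
    if rows < 2 or columns < 2:
        raise ValueError("a crosstabulation needs at least 2 rows and 2 columns")
    return (rows - 1) * (columns - 1)


observed = [[20, 50], [20, 10]]   # Table 2, cells only (totals left out)
print(degrees_of_freedom(len(observed), len(observed[0])))  # 1
```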

          We now look at a table of critical values from the probability distribution of chi-square, organized by degrees of freedom. A truncated (shortened) version (Table 5) is shown below for purposes of this example; see your textbook for the full distribution.
           

          Table 5: Probability Values of Chi-Square

          d.f.    p = .10    p = .05    p = .01
          1        2.706      3.841      6.635
          2        4.605      5.991      9.210
          3        6.251      7.815     11.341
          4        7.779      9.488     13.277
          First go to the row for the degrees of freedom you have calculated; in our example this is 1. Then read across that row to find the largest tabled value that is still less than the chi-square value you calculated (12.70 in our example). In this truncated table, 6.635 is the largest chi-square value less than our calculated value.
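The table lookup can be sketched in Python as well. The dictionary below simply copies the values printed in Table 5; `CRITICAL_VALUES` and `table_lookup` are illustrative names of our own, and a real analysis would use a full table or a statistics library rather than this truncated copy.

```python
# Values copied from Table 5: {d.f.: {significance level: critical value}}.
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.341},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
}


def table_lookup(statistic, df):
    """Return (level, critical value) for the largest tabled value below
    `statistic` in the row for `df`, or None if it exceeds none of them."""
    row = CRITICAL_VALUES[df]
    exceeded = [(value, level) for level, value in row.items() if value < statistic]
    if not exceeded:
        return None  # not significant even at the .10 level
    value, level = max(exceeded)  # largest critical value = smallest level
    return level, value


print(table_lookup(12.70, 1))  # (0.01, 6.635)
```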

          With 1 degree of freedom, and random sampling from a population in which there is no relationship between two variables, 10% (.10) of the time we should expect a chi-square of at least 2.706. Five percent (.05) of the time we should expect a chi-square of at least 3.841. One percent (.01) of the time we should expect a chi-square value of at least 6.635.
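These "X% of the time" statements can be checked by simulation: draw many random samples from a population in which sex and attendance are independent, compute chi-square for each sample, and count how often it reaches 3.841. This sketch assumes the population proportions match our example (40% male, 70% attended); the helper names, the number of trials, and the seed are arbitrary choices for illustration. The exact share depends on the seed, but it should land close to .05.

```python
import random


def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table, expected counts from its margins."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - e) ** 2 / e
    return stat


def simulate_null(trials=5000, n=100, p_male=0.4, p_attend=0.7, seed=1):
    """Share of random samples, drawn with NO relationship between sex and
    attendance, whose chi-square is at least 3.841 (the .05 value for 1 d.f.)."""
    rng = random.Random(seed)
    hits = 0
    counted = 0
    for _ in range(trials):
        table = [[0, 0], [0, 0]]  # rows: attended, skipped; cols: male, female
        for _ in range(n):
            # Sex and attendance are drawn independently (the null hypothesis).
            col = 0 if rng.random() < p_male else 1
            row = 0 if rng.random() < p_attend else 1
            table[row][col] += 1
        margins = [sum(r) for r in table] + [sum(c) for c in zip(*table)]
        if 0 in margins:
            continue  # an empty margin gives a zero expected count; skip it
        counted += 1
        if chi_square_2x2(table) >= 3.841:
            hits += 1
    return hits / counted


print(simulate_null())
```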

          The higher the chi-square value, the less likely that the observed difference is due to chance or sampling error.

          In our example we have a chi-square value of 12.70, with 1 degree of freedom. If there were no relationship between sex and class attendance, then we would expect a chi-square of this magnitude (size) less than 1% (.01) of the time. We report this finding by saying "the observed relationship between sex and class attendance is statistically significant at the .01 level."
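For 1 degree of freedom only, the exact probability can be computed with the standard library: a chi-square with 1 d.f. is the square of a standard normal variable Z, so P(chi-square >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)). The sketch below relies on that identity; `p_value_1df` is our own name, and for other degrees of freedom a statistics library (for example SciPy's `scipy.stats.chi2.sf`) would be needed instead.

```python
import math


def p_value_1df(statistic):
    """Upper-tail probability of a chi-square value with 1 degree of freedom.

    Uses P(chi-square >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)),
    which holds only when d.f. = 1.
    """
    return math.erfc(math.sqrt(statistic / 2))


print(round(p_value_1df(3.841), 3))   # 0.05
print(p_value_1df(12.70) < 0.01)      # True
```

For our value of 12.70 this gives a probability of about .0004, consistent with the "less than .01" reading from Table 5.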

          We reject the null hypothesis that there is no relationship in the population, because the probability that the observed relationship resulted from chance or sampling error is so low.


          That's it! You have now mastered the computations of chi-square and degrees of freedom.

          Further questions can be directed to: Linda C. Lambert.

          Answer 1: If 20 females answered Yes, then there must be 30 males who answered Yes, because we have a total of 50 who said Yes overall. Now, working down the Females column, if 20 of the 50 females in the survey answered Yes, then 30 must have answered No. That leaves 20 males who answered No (50 - 30), which also makes the No row total 50. Once we specify one cell, the remaining cells have no freedom to vary, so we have only one degree of freedom.
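The same fill-in can be written as a short Python sketch, assuming the layout of Table 4 (rows Yes/No, columns Males/Females) and fixed margins. `complete_2x2` is a hypothetical name; the point is that every cell after the first is computed, never chosen.

```python
def complete_2x2(female_yes, yes_total, male_total, female_total):
    """Fill a 2 x 2 table (rows: Yes, No; cols: Males, Females) from one
    chosen cell plus the margins. Every other cell is forced: 1 d.f."""
    male_yes = yes_total - female_yes        # across the Yes row
    female_no = female_total - female_yes    # down the Females column
    male_no = male_total - male_yes          # down the Males column
    return {
        "Yes": {"Males": male_yes, "Females": female_yes},
        "No": {"Males": male_no, "Females": female_no},
    }


table = complete_2x2(female_yes=20, yes_total=50, male_total=50, female_total=50)
print(table)
# {'Yes': {'Males': 30, 'Females': 20}, 'No': {'Males': 20, 'Females': 30}}
```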

          Answer 2: d.f. = (3-1)(2-1) = (2)(1) = 2

          Reference: Babbie, Earl. 1995. The Practice of Social Research, Seventh Edition. Belmont: Wadsworth Publishing Company.
