Chapter 10 Hypothesis Tests

A statistical hypothesis is an assertion or conjecture about a parameter (or parameters) of a population.

An example is the assertion that the mean body temperature of a healthy adult is 98.4 degrees Fahrenheit.

Procedure

To verify such an assertion statistically, one has to

  1. Conduct a study in which a simple random sample is selected and the value in question is recorded for every subject in the sample (in this case, the body temperature of healthy adults).
  2. Set the null hypothesis H0 and the alternative hypothesis HA.
  3. Find the sample statistics (in this case, the sample mean body temperature, standard deviation, and sample size).
  4. Fix a level of significance (for example $\alpha$ = 0.10, 0.05 or 0.02, which correspond to confidence levels of 90%, 95% or 98%).
  5. Determine the corresponding critical value.
  6. Calculate the test statistic (see table).
  7. Decide whether the null hypothesis is rejected or not by comparing the test statistic with the critical value.
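The steps above can be sketched in R as follows. The temperature readings are made-up values for illustration only, not data from a study, and the two-sided alternative and the significance level of 0.05 are assumptions as well.

```r
# Step 1: hypothetical body temperatures (degrees Fahrenheit); illustrative values only
temps <- c(98.2, 98.6, 97.9, 98.4, 98.8, 98.1, 98.3, 98.7, 98.0, 98.5)

# Step 2: H0: mu = 98.4 versus HA: mu != 98.4
mu0 <- 98.4

# Step 3: sample statistics
n    <- length(temps)
xbar <- mean(temps)
s    <- sd(temps)

# Step 4: level of significance (assumed)
alpha <- 0.05

# Step 5: two-sided critical value from the t distribution with n - 1 df
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Step 6: test statistic for a mean with sigma unknown and n < 30
t_stat <- (xbar - mu0) / (s / sqrt(n))

# Step 7: reject H0 only if the statistic falls in the rejection region
reject <- abs(t_stat) > t_crit
cat("t =", round(t_stat, 3), " critical value =", round(t_crit, 3),
    " reject H0:", reject, "\n")
```

The built-in call `t.test(temps, mu = mu0)` computes the same statistic and also reports a p-value.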

See the attached file for an example of calculating a hypothesis test for a difference of means.

Table for Hypothesis Tests

| Hypothesis for | Parameter | Conditions | Test statistic | Critical values |
|---|---|---|---|---|
| Proportion | $p$ | $np\ge 5$, $nq\ge 5$ | $z=\frac{\hat{p}-p}{\sqrt{pq/n}}$ | z-score table |
| Mean ($n\ge 30$) | $\mu$ | $\sigma$ known, $n\ge 30$ | $z=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$ | z-score table |
| Mean ($n<30$) | $\mu$ | $\sigma$ unknown, $n< 30$ | $t=\frac{\bar{x}-\mu}{s/\sqrt{n}}$ | t-table, $df=n-1$ |
| Standard deviation | $\sigma$ | normally distributed population | $\chi^2=\frac{(n-1)s^2}{\sigma^2}$ | chi-square table, $df=n-1$ |
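As an illustration of the first row of the table, the following R sketch computes the z statistic for a proportion. The counts, the hypothesized value `p0` and the significance level are invented for the example.

```r
# Hypothetical example: 58 successes in n = 100 trials, testing H0: p = 0.5
x  <- 58
n  <- 100
p0 <- 0.5
q0 <- 1 - p0

# Conditions from the table: n*p >= 5 and n*q >= 5
stopifnot(n * p0 >= 5, n * q0 >= 5)

# Test statistic z = (p_hat - p) / sqrt(p*q/n)
p_hat  <- x / n
z_stat <- (p_hat - p0) / sqrt(p0 * q0 / n)

# Two-sided critical value at alpha = 0.05 (assumed), from the standard normal
z_crit <- qnorm(1 - 0.05 / 2)

cat("z =", z_stat, " critical value =", round(z_crit, 3), "\n")
```

Here `qnorm` plays the role of the z-score table; for the other rows, `qt` and `qchisq` replace the t-table and the chi-square table.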

Comparing two means

Depending on the two sample sizes and the scenario, there are three different tests for comparing two means:

| Hypothesis for | Parameter | Conditions | Test statistic | Critical values |
|---|---|---|---|---|
| Difference of Means ($n\ge 30$) | $\mu_1-\mu_2$ | $\sigma_1,\sigma_2$ known, $n\ge 30$ | $z=\frac{\bar{x}_1-\bar{x}_2-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+ \frac{\sigma_2^2}{n_2}}}$ | z-score table |
| Difference of Means ($n<30$) | $\mu_1-\mu_2$ | $n< 30$ | $t=\frac{\bar{x}_1-\bar{x}_2-(\mu_1-\mu_2)}{s_p\sqrt{\frac{1}{n_1}+ \frac{1}{n_2}}}$, where $s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}$ | t-table, $df=n_1+n_2-2$ |
| Difference of Means (paired data) | $\mu$, the mean of the paired differences ($H_0$: $\mu=0$) | $n< 30$ | $t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$ with $\mu_0=0$, where $\bar{x}$ and $s$ are the mean and standard deviation of the differences | t-table, $df=n-1$ |
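The pooled test in the second row can be sketched in R as follows. The two samples are hypothetical values chosen only to show the arithmetic, and the null hypothesis $\mu_1-\mu_2=0$ is an assumption of the example.

```r
# Hypothetical samples; illustrative values only
x1 <- c(12.1, 11.4, 13.0, 12.6, 11.9, 12.8)
x2 <- c(10.9, 11.7, 10.4, 11.2, 11.8)

n1 <- length(x1)
n2 <- length(x2)

# Pooled standard deviation s_p
sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))

# Test statistic for H0: mu1 - mu2 = 0, with n1 + n2 - 2 degrees of freedom
t_stat <- (mean(x1) - mean(x2) - 0) / (sp * sqrt(1 / n1 + 1 / n2))
dof    <- n1 + n2 - 2

# Cross-check against the built-in pooled-variance t-test
builtin <- t.test(x1, x2, var.equal = TRUE)
cat("by hand:", t_stat, " t.test:", builtin$statistic, " df:", dof, "\n")
```

For paired data, `t.test(x1, x2, paired = TRUE)` applies the third row instead, provided both samples have the same length.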

An example following the book for a difference of means is given here.

F-distribution

Some statistical tests use the F-distribution, which requires two parameters: the numerator and the denominator degrees of freedom. A table is available here. For example, an F-distribution is required when comparing two population variances. In that case, the test statistic is

(1)
\begin{align} F = \frac{s_1^2}{s_2^2} \end{align}

and the F-critical value depends on two parameters, which are

(2)
\begin{equation} df_1 = n_1-1 \end{equation}

and

(3)
\begin{equation} df_2 = n_2-1. \end{equation}
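A minimal R sketch of this variance comparison is given below. The two samples are hypothetical, and the right-tailed alternative and the significance level of 0.05 are assumptions made for the example.

```r
# Hypothetical samples; illustrative values only
x1 <- c(21.3, 19.8, 24.1, 22.7, 18.5, 23.9, 20.2)
x2 <- c(20.9, 21.4, 20.1, 21.8, 20.6, 21.1)

# Test statistic F = s1^2 / s2^2 and its two degrees of freedom
F_stat <- var(x1) / var(x2)
df1 <- length(x1) - 1
df2 <- length(x2) - 1

# Right-tail critical value at alpha = 0.05 (assumed)
F_crit <- qf(0.05, df1 = df1, df2 = df2, lower.tail = FALSE)

# var.test() from the stats package computes the same ratio
builtin <- var.test(x1, x2, alternative = "greater")
cat("F =", F_stat, " critical value =", F_crit, "\n")
```

Here `qf` replaces the lookup in the F table, with `df1` and `df2` passed in the same order as in equations (2) and (3).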

Below is the image of an F-distribution, obtained in R with the following code:

df1 <- 3
df2 <- 15
x <- seq(0, 8, length.out = 401)
# Critical value that cuts off the upper 5% tail
u <- qf(0.05, df1 = df1, df2 = df2, lower.tail = FALSE)
# This is the graph of the density:
plot(x, df(x, df1 = df1, df2 = df2), type = 'l', ann = FALSE)
# Now the code for filling the right-hand tail area
xx <- seq(u, 8, length.out = 401)
# Close the polygon along the x-axis at both ends so the whole tail is filled
polygon(x = c(xx[1], xx, xx[length(xx)]), y = c(0, df(xx, df1, df2), 0), col = 'red')
# The figure looks better if the empty part of the x-axis is filled:
lines(c(0, u), c(0, 0))

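which was found here.

[Figure: Fdist.png, the density of the F-distribution with df1 = 3 and df2 = 15, with the upper 5% tail shaded in red]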

Analysis of Variance (ANOVA) for comparing multiple means

In order to compare the means of more than two samples coming from different treatment groups that are normally distributed with a common variance, an analysis of variance is often used.

The following table summarizes the calculations that need to be done, which are explained below:

| Source | df | SS (sum of squares) | MS (mean squares) | F |
|---|---|---|---|---|
| Treatments | $k-1$ | SST (sum of squares of treatments) | MST = SST/($k-1$) | MST/MSE |
| Error | $n-k$ | SSE (sum of squares of error) | MSE = SSE/($n-k$) | |
| Total | $n-1$ | Total SS | | |

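Let $x_{ij}$ be the $j$th measurement in the $i$th sample, where $i=1, 2, \dots, k$ and $j=1, 2, \dots, n_i$, let $n$ be the total number of observations, and let $\bar{x}$ be the mean of all $n$ observations. Then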

(4)
\begin{align} Total\ SS = \sum\limits_{i,j} (x_{ij} - \bar{x})^2 = \sum x_{ij}^2 - \frac{(\sum x_{ij})^2}{n} = \sum x_{ij}^2 -CM \end{align}

and the sum of the squares of the treatments is

(5)
\begin{align} SST = \sum n_i(\bar{x}_i-\bar{x})^2 = \sum\frac{T_i^2}{n_i} - CM \end{align}

where $\bar{x}_i$ is the mean of sample $i$, $T_i$ is the total of the observations in treatment $i$, $n_i$ is the number of observations in sample $i$, and CM is the correction for the mean:

(6)
\begin{align} T_i= \sum\limits_j x_{ij} \end{align}
(7)
\begin{align} CM= \frac{(\sum x_{ij})^2}{n}. \end{align}

The sum of squares of the error SSE is given by

(8)
\begin{align} SSE= Total\ SS - SST \end{align}

and

(9)
\begin{align} F = \frac{MST}{MSE}. \end{align}

Example: Breakfast and children's attention span

An example for the effect of breakfast on attention span (in minutes) for small children is summarized in the table below:

| No Breakfast | Light Breakfast | Full Breakfast |
|---|---|---|
| 8 | 14 | 10 |
| 7 | 16 | 12 |
| 9 | 12 | 16 |
| 13 | 17 | 15 |
| 10 | 11 | 12 |

The corresponding calculation is given in Excel here.
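The sums of squares defined in equations (4) to (9) can also be checked on this table with a few lines of R. This is only a sketch of the by-hand calculation; the variable names are chosen here for illustration.

```r
# Attention spans (minutes) from the table above
no_breakfast    <- c(8, 7, 9, 13, 10)
light_breakfast <- c(14, 16, 12, 17, 11)
full_breakfast  <- c(10, 12, 16, 15, 12)
groups <- list(no_breakfast, light_breakfast, full_breakfast)

x   <- unlist(groups)
n   <- length(x)               # total number of observations
k   <- length(groups)          # number of treatments
n_i <- sapply(groups, length)  # observations per treatment
T_i <- sapply(groups, sum)     # treatment totals, equation (6)

CM       <- sum(x)^2 / n           # correction for the mean, equation (7)
total_ss <- sum(x^2) - CM          # equation (4)
SST      <- sum(T_i^2 / n_i) - CM  # equation (5)
SSE      <- total_ss - SST         # equation (8)

MST    <- SST / (k - 1)
MSE    <- SSE / (n - k)
F_stat <- MST / MSE                # equation (9)
cat("SST =", SST, " SSE =", SSE, " F =", F_stat, "\n")
```

This should give SST of about 58.53, SSE equal to 71.2 and F of about 4.93, with 2 and 12 degrees of freedom.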

The hypothesis test would be

(10)
\begin{align} H_0: \mu_1 = \mu_2 = \mu_3 \end{align}

versus

(11)
\begin{align} H_a: \mu_1 \neq \mu_2\ \text{or}\ \mu_2\neq \mu_3\ \text{or}\ \mu_1\neq \mu_3 \end{align}

An alternative version of the calculation is given here in R:

Table <- read.table("TableAnova2.txt", header = TRUE); Table
r <- c(t(as.matrix(Table)))     # response data, read row by row
f <- c("No Breakfast", "Light Breakfast", "Full Breakfast")    # treatment levels
k <- 3                # number of treatment levels
n <- 5                # observations per treatment
tm <- gl(k, 1, n*k, labels = f)    # treatment matching each response
av <- aov(r ~ tm)
summary(av)

which reads the table TableAnova2.txt from the same directory and gives the following result:

            Df Sum Sq Mean Sq F value  Pr(>F)  
tm           2 58.533 29.2667  4.9326 0.02733 *
Residuals   12 71.200  5.9333                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

which indicates that the test statistic F is equal to 4.9326, as we obtained in Excel. The corresponding right-tail probability is 0.027, which is smaller than 0.05. Hence, at a significance level of 0.05, the test statistic lies in the rejection region and the null hypothesis is rejected.
This indicates that the means are not all equal, or in other words, that the sample values give sufficient evidence that at least one mean differs from the others. In terms of the example, this means that breakfast (and its size) does have an effect on children's attention span.
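The critical value and the right-tail probability used in this decision can be reproduced directly from the F-distribution. The sketch below takes the test statistic and degrees of freedom from the output above and assumes a significance level of 0.05.

```r
F_stat <- 4.9326   # test statistic reported in the ANOVA output
df1    <- 2        # k - 1
df2    <- 12       # n - k
alpha  <- 0.05     # significance level (assumed)

# Right-tail critical value and right-tail probability of the observed F
F_crit  <- qf(alpha, df1, df2, lower.tail = FALSE)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)

# H0 is rejected when the statistic exceeds the critical value
reject <- F_stat > F_crit
cat("critical value =", round(F_crit, 3), " p-value =", round(p_value, 5),
    " reject H0:", reject, "\n")
```

The critical value is about 3.885, so the statistic 4.9326 lies in the rejection region, in agreement with the p-value of about 0.027.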
