Hypothesis Testing - Tests of Significance

Hypothesis testing uses sample data to determine whether there is evidence that a population parameter differs from an assumed value. We already know that for a result to be considered statistically significant, it should be unlikely to happen due to chance alone.

How unlikely is unlikely? That depends on how sure you want to be that chance is not responsible for the result.

Statisticians recommend using a probability of .05 as a good starting place for thinking about unlikely events.

So we shall say that an observed outcome is statistically significant if it has less than a .05 probability of happening by chance alone.

We will apply this thinking in the example below.

Do ACT test preparation classes really help?

What we have done in our example is called a hypothesis test. Formal hypothesis testing requires the formulation of null and alternative hypotheses involving a population parameter. In our case we were wondering whether the classes increase the population mean on the ACT or have no effect. These two hypotheses are called the alternative hypothesis and the null hypothesis, respectively. The notation for these is below:

     Ho: mu = 20     (null hypothesis: the classes have no effect)
     Ha: mu > 20     (alternative hypothesis: the classes increase the mean)

Notice that we have explicitly stated the two situations in terms of the population parameter mu.

We then identify a test statistic, which in our case was the sample mean.

By examining the sampling distribution of the test statistic when there is no treatment effect, we can find those values of the test statistic which indicate the alternative could be true and which are also unlikely to happen if there is no treatment effect.

Values unlikely to happen by chance, which favor the alternative, cause us to reject the null hypothesis.

Values which do not strongly favor the alternative cause us to fail to reject the null hypothesis.

Thus there are two possible outcomes of the test:

  • reject the null hypothesis
  • fail to reject the null hypothesis.

In our example we would reject the null hypothesis if the sample mean exceeds 21.28 and fail to reject if the sample mean is less than 21.28.

This type of hypothesis testing is called testing with a fixed significance level alpha = .05. The significance level alpha = .05 is the probability we have chosen to indicate unlikely events.
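To see what a fixed significance level of .05 means in practice, the following is a minimal simulation sketch. It assumes the sample mean is normally distributed with mean 20 and standard error .7826 when the classes have no effect (values taken from the running example), and it counts how often chance alone pushes the sample mean above the cutoff of 21.28. The function name and the number of trials are illustrative choices, not part of the original example.

```python
import random

# Values assumed from the running ACT example in the text.
NULL_MEAN = 20.0     # population mean if the classes have no effect
STD_ERROR = 0.7826   # standard deviation of the sample mean
CUTOFF = 21.28       # reject the null hypothesis above this sample mean


def simulated_rejection_rate(trials=200_000, seed=1):
    """Fraction of simulated sample means exceeding CUTOFF when the null is true."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    rejections = sum(
        1 for _ in range(trials)
        if rng.gauss(NULL_MEAN, STD_ERROR) > CUTOFF
    )
    return rejections / trials


rate = simulated_rejection_rate()
print(f"rejection rate under the null: {rate:.3f}")
```

The printed rate should land close to .05: when there is no treatment effect, roughly one sample in twenty still crosses the cutoff by chance.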

Changing the level of significance of a test

Different choices of significance level can result in different outcomes in hypothesis testing. If we had used a significance level of alpha = .01, then we would only reject the null hypothesis if the sample mean exceeded (see tables)

               20 + 2.326 * .7826 = 21.82

And so, we would fail to reject the null hypothesis in our example (since the sample mean was only 21.5) if we had used this smaller significance level.
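The cutoff calculation can be repeated for any significance level. The sketch below uses Python's standard library normal distribution in place of the printed tables; the null mean 20, standard error .7826, and sample mean 21.5 come from the example, while the function name critical_value is an illustrative choice.

```python
from statistics import NormalDist

# Values assumed from the running ACT example in the text.
NULL_MEAN = 20.0
STD_ERROR = 0.7826
SAMPLE_MEAN = 21.5


def critical_value(alpha):
    """Sample mean above which the null is rejected (alternative: increase)."""
    z_cutoff = NormalDist().inv_cdf(1 - alpha)  # replaces the table lookup
    return NULL_MEAN + z_cutoff * STD_ERROR


for alpha in (0.05, 0.01):
    cutoff = critical_value(alpha)
    decision = "reject" if SAMPLE_MEAN > cutoff else "fail to reject"
    print(f"alpha = {alpha}: cutoff = {cutoff:.2f}, {decision} the null hypothesis")
```

Because the code uses unrounded normal quantiles, the .05 cutoff comes out near 21.29 rather than 21.28; the small gap is presumably table rounding. The decisions match the text: reject at .05, fail to reject at .01.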


A way around this apparent difficulty is to report the p-value, which is the smallest level of significance at which the null hypothesis would be rejected. To compute the p-value we need to find a probability associated with the observed value of the test statistic, assuming the null hypothesis is true. The probability in question is:

p-value = the probability, when the null hypothesis is true, of observing values that favor the alternative at least as much as the observed value.

In our example the actual p-value is somewhere between .05 and .01.

This p-value is found by first

     computing z = (21.5 - 20) / .7826 = 1.9167

then finding the probability from the tables (using z rounded to 1.92)

      1 - .9726 = .0274
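The same two steps can be sketched in code, with the standard library normal distribution standing in for the tables. The numbers 21.5, 20, and .7826 are from the example; the variable names are illustrative.

```python
from statistics import NormalDist

# Values assumed from the running ACT example in the text.
NULL_MEAN = 20.0
STD_ERROR = 0.7826
SAMPLE_MEAN = 21.5

# Step 1: standardize the observed sample mean.
z = (SAMPLE_MEAN - NULL_MEAN) / STD_ERROR

# Step 2: upper-tail probability, since the alternative is an increase.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

The computed p-value differs slightly from the table value of .0274 because the table rounds z to two decimals; either way it falls between .01 and .05.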

Small p-values indicate strong evidence against the null hypothesis, because random variation alone is then an unlikely explanation for the observed value of the test statistic. Large p-values indicate little or no evidence against the null hypothesis.

The language of p-values provides a way of interpreting the outcome of any type of hypothesis test, even if you do not know the details of the procedure itself.

The hypothesis testing framework

We will adopt p-values as the standard way of reporting the results of any hypothesis test. Thus hypothesis testing will consist of the following steps:

  1) Statement of the null and alternative hypotheses
  2) Identification of a test statistic
  3) Computation of the observed value of the statistic
  4) Determining the p-value associated with the observed value
  5) Stating our conclusions

Which alternative? Hypothesis tests for the population mean - the three types

    What kind of change are you expecting to see?

    • increase
    • decrease
    • some change

    Examples of p-values in each of the three cases
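As a rough illustration of the three cases, the sketch below computes the p-value for a given z under each kind of alternative, using the usual normal-distribution tail areas. The function name p_value and the string labels are illustrative choices, and z = 1.92 is the rounded value from the ACT example.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value for an observed z under one of the three alternatives."""
    cdf = NormalDist().cdf
    if alternative == "increase":      # upper tail
        return 1 - cdf(z)
    if alternative == "decrease":      # lower tail
        return cdf(z)
    if alternative == "some change":   # both tails
        return 2 * (1 - cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.92  # rounded z from the ACT example
for alternative in ("increase", "decrease", "some change"):
    print(f"{alternative:12s} p-value = {p_value(z, alternative):.4f}")
```

The same observed z gives very different p-values depending on the alternative, which is why the expected direction of change should be stated before looking at the data.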

    Rules of thumb:

    • if the p-value is greater than .1 do not reject Ho
    • if the p-value is between .1 and .05 there is some evidence against Ho, but the results are not considered statistically significant
    • if the p-value is between .05 and .01 reject Ho and declare the results significant
    • if the p-value is less than .01 reject Ho and declare the results highly significant.
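These rules of thumb translate directly into a small helper. This is only a sketch: the function name interpret is illustrative, and how to treat a p-value exactly equal to .1, .05, or .01 is an assumption, since the rules above do not say.

```python
def interpret(p_value):
    """Map a p-value to the rule-of-thumb conclusion about Ho."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value > 0.10:
        return "do not reject Ho"
    if p_value > 0.05:  # boundary handling is an assumption
        return "some evidence against Ho, but not statistically significant"
    if p_value > 0.01:
        return "reject Ho: significant"
    return "reject Ho: highly significant"


print(interpret(0.0274))  # p-value from the ACT example
```

For the ACT example, with a p-value of about .0274, the helper reports a significant result but not a highly significant one.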

    Tests with rejection regions based on a p-value of alpha are said to have level of significance alpha.

    Using z as the test statistic instead of the sample mean - the z-test

    Since we need to compute z anyway to get the p-value, why not just compute it up front?
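A minimal sketch of this idea for the "increase" alternative: compute z first, then compare it with the cutoff on the z scale rather than converting the cutoff back to a sample mean. The function name z_test and its return layout are illustrative assumptions; the inputs are the values from the ACT example.

```python
from statistics import NormalDist


def z_test(sample_mean, null_mean, std_error, alpha=0.05):
    """Upper-tailed z-test; returns (z, z_cutoff, reject)."""
    z = (sample_mean - null_mean) / std_error    # observed test statistic
    z_cutoff = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = .05
    return z, z_cutoff, z > z_cutoff


for alpha in (0.05, 0.01):
    z, z_cutoff, reject = z_test(21.5, 20.0, 0.7826, alpha)
    decision = "reject" if reject else "fail to reject"
    print(f"alpha = {alpha}: z = {z:.3f}, cutoff = {z_cutoff:.3f}, {decision} Ho")
```

Working on the z scale means the cutoffs (about 1.645 and 2.326) no longer depend on the null mean or the standard error of a particular problem.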