Hypothesis Testing - Tests of Significance
Hypothesis testing involves using sampled data to determine if there is evidence that a population parameter is different from an assumed value. We already know that for a result to be considered statistically significant, it should be unlikely to happen due to chance alone.
How unlikely is unlikely? That depends on how sure you want to be that chance is not responsible for the result.
Statisticians recommend using a probability of .05 as a good starting place for thinking about unlikely events.
So, we shall say that if an observed outcome has less than a .05 probability of happening randomly, then it can be called statistically significant.
We will apply this thinking in the example below.
What we have done in our example is called a hypothesis test. Formal hypothesis testing requires the formulation of null and alternative hypotheses involving a population parameter. In our case we were wondering whether the classes could increase the population mean on the ACT, or whether they have no effect. These two hypotheses are called the alternative hypothesis and the null hypothesis. The notation for these is below:

Ho: mu = 20 (the classes have no effect)
Ha: mu > 20 (the classes increase the population mean)
Notice that we have explicitly stated the two situations in terms of the population parameter mu.
We then identify a test statistic, which in our case was the sample mean.
By examining the sampling distribution of the test statistic when there is no treatment effect, we can find those values of the test statistic which indicate the alternative could be true and which are also unlikely to happen if there is no treatment effect.
Values unlikely to happen by chance, which favor the alternative, cause us to reject the null hypothesis.
Values which do not strongly favor the alternative cause us to fail to reject the null hypothesis.
These are the two possible outcomes of the test.
In our example we would reject the null hypothesis if the sample mean exceeds 21.28, and fail to reject it if the sample mean is less than 21.28.
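As a quick check on this cutoff, here is a minimal sketch using Python's standard-library normal distribution; the hypothesized mean 20, standard error .7826, and sample mean 21.5 are the numbers from the example:

```python
from statistics import NormalDist

mu0 = 20        # hypothesized population mean (no treatment effect)
se = 0.7826     # standard error of the sample mean, from the example
x_bar = 21.5    # observed sample mean, from the example

# Right-tailed cutoff at alpha = .05: reject if the sample mean exceeds it.
z_crit = NormalDist().inv_cdf(1 - 0.05)   # about 1.645
cutoff = mu0 + z_crit * se                # matches the 21.28 in the text, up to rounding

print(f"cutoff = {cutoff:.2f}, reject = {x_bar > cutoff}")
```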
This type of hypothesis testing is called testing with a fixed significance level alpha = .05. The significance level alpha = .05 is the probability we have chosen to indicate unlikely events.
Changing the level of significance of a test
Different choices of significance level can result in different outcomes in hypothesis testing. If we had used a significance level of alpha = .01, then we would only reject the null hypothesis if the mean exceeded (see tables)
20 + 2.326 * .7826 = 21.82
And so, we would fail to reject the null hypothesis in our example (since the sample mean was only 21.5) if we had used this smaller significance level.
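The effect of shrinking the significance level can be seen directly in a short sketch (same numbers as the example) that compares alpha = .05 with alpha = .01:

```python
from statistics import NormalDist

mu0, se, x_bar = 20, 0.7826, 21.5   # numbers from the example

for alpha in (0.05, 0.01):
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    decision = "reject" if x_bar > cutoff else "fail to reject"
    print(f"alpha = {alpha}: cutoff = {cutoff:.2f}, {decision} the null")
```

The same sample mean of 21.5 leads to opposite decisions at the two levels, which is exactly the difficulty that reporting a p-value avoids.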
A way around this apparent difficulty is to report the p-value, which is the smallest level of significance at which the null hypothesis would be rejected. To compute the p-value we need to find a probability associated with the observed value of the test statistic, assuming the null hypothesis is true. The probability in question is the

p-value = the chance of observing values favoring the alternative at least as much as the observed value, due to chance alone.
In our example the actual p-value is somewhere between .01 and .05.
This p-value is found by first computing

z = (21.5 - 20) / .7826 = 1.9166

and then finding the right-tail probability from the tables:

1 - .9726 = .0274
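The same computation can be sketched with the standard library's normal CDF instead of tables; the small difference from .0274 arises because the table lookup rounds z to 1.92 first:

```python
from statistics import NormalDist

z = (21.5 - 20) / 0.7826      # observed test statistic, about 1.9167
p = 1 - NormalDist().cdf(z)   # right-tail area, about .0276
print(f"z = {z:.4f}, p-value = {p:.4f}")
```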
Small p-values indicate very strong evidence against the null hypothesis, because you can be quite sure that random variation was not responsible for the observed value of the test statistic. Large p-values indicate little or no evidence against the null hypothesis.
The language of p-values provides a way of interpreting the outcome of any type of hypothesis test, even if you do not know the details of the procedure itself.
The hypothesis testing framework
We will adopt p-values as the standard way of reporting the results of any hypothesis test. Thus hypothesis testing will consist of the following steps:

1. State the null and alternative hypotheses in terms of a population parameter.
2. Choose a test statistic and compute its value from the sample.
3. Find the p-value of the observed value of the test statistic, assuming the null hypothesis is true.
4. Reject the null hypothesis if the p-value is small; otherwise fail to reject it.
Which alternative? Hypothesis tests for the population mean - the three types
What kind of change are you expecting to see?
Rules of thumb: if you expect the treatment to increase the mean, use the one-sided alternative that mu is greater than the hypothesized value; if you expect a decrease, use the one-sided alternative that mu is less than the hypothesized value; if you expect a change but cannot say in which direction, use the two-sided alternative that mu is not equal to the hypothesized value.
Tests whose rejection region corresponds to a p-value cutoff of alpha are said to have level of significance alpha.
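How the p-value depends on which of the three alternatives is chosen can be sketched as below; the function name and the "greater"/"less"/"two-sided" labels are my own, not the text's:

```python
from statistics import NormalDist

def p_value(z: float, alternative: str) -> float:
    """p-value of an observed z statistic for each of the three alternatives."""
    nd = NormalDist()
    if alternative == "greater":      # expecting an increase in the mean
        return 1 - nd.cdf(z)
    if alternative == "less":         # expecting a decrease in the mean
        return nd.cdf(z)
    if alternative == "two-sided":    # expecting a change in either direction
        return 2 * (1 - nd.cdf(abs(z)))
    raise ValueError(alternative)

z = (21.5 - 20) / 0.7826   # the example's test statistic
print(round(p_value(z, "greater"), 4))
print(round(p_value(z, "two-sided"), 4))
```

With the example's z of about 1.92, the right-tailed p-value is about .028 and the two-sided p-value is roughly twice that.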
Using z as the test statistic instead of the sample mean - the z-test
Since we need to compute z anyway to get the p-value, why not just compute it up front?
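Computing z up front reduces the whole test to a couple of lines; a minimal sketch (the z_test name is hypothetical) using the example's numbers:

```python
from statistics import NormalDist

def z_test(x_bar: float, mu0: float, se: float):
    """Right-tailed z-test: return the z statistic and its p-value."""
    z = (x_bar - mu0) / se
    return z, 1 - NormalDist().cdf(z)

z, p = z_test(21.5, 20, 0.7826)   # numbers from the example
print(f"z = {z:.2f}, p-value = {p:.4f}")
```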