Hypothesis Testing

Tests of Significance

Hypothesis testing involves using sampled data to determine whether there is evidence that a population parameter differs from an assumed value. We already know that for a result to be considered statistically significant, it should be unlikely to happen due to chance alone. How unlikely is unlikely? That depends on how sure you want to be that chance is not responsible for the result. Statisticians recommend using a probability of .05 as a good starting place for thinking about unlikely events. So we shall say that if an observed outcome has less than a .05 probability of happening randomly, then it can be called statistically significant. We will apply this thinking in the example below.

Do ACT test preparation classes really help?

What we have done in our example is called a hypothesis test. Formal hypothesis testing requires the formulation of null and alternative hypotheses involving a population parameter. In our case we were wondering whether the classes could increase the population mean on the ACT, or whether they have no effect. These two hypotheses are called the alternative hypothesis and the null hypothesis, respectively. The notation for these is below:
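As a sketch of that notation, assuming the null value of 20 for the population mean that appears in the calculations later in this section, and an alternative under which the classes raise the mean, the two hypotheses could be written as:

```latex
\begin{align*}
H_0 &: \mu = 20 && \text{(null hypothesis: the classes have no effect)} \\
H_a &: \mu > 20 && \text{(alternative hypothesis: the classes increase the mean ACT score)}
\end{align*}
```

The symbols H_0 and H_a are the conventional labels for the null and alternative hypotheses; the value 20 is inferred from the later arithmetic rather than stated at this point in the text.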
Notice that we have explicitly stated the two situations in terms of the population parameter mu. We then identify a test statistic, which in our case was the sample mean. By examining the sampling distribution of the test statistic when there is no treatment effect, we can find those values of the test statistic which indicate the alternative could be true and which are also unlikely to happen if there is no treatment effect. Values unlikely to happen by chance, which favor the alternative, cause us to reject the null hypothesis. Values which do not strongly favor the alternative cause us to fail to reject the null hypothesis. Thus the two possible outcomes of the test are to reject the null hypothesis or to fail to reject it.
In our example we would reject the null hypothesis if the sample mean exceeds 21.28 and fail to reject if the sample mean is less than 21.28. This type of hypothesis testing is called testing with a fixed significance level, here alpha = .05. The significance level alpha = .05 is the probability we have chosen to indicate unlikely events.

Changing the level of significance of a test

Different choices of significance level can result in different outcomes in hypothesis testing. If we had used a significance level of alpha = .01, then we would only reject the null hypothesis if the sample mean exceeded (see tables)

20 + 2.326 * .7826 = 21.82

And so we would fail to reject the null hypothesis in our example (since the sample mean was only 21.5) if we had used this smaller significance level.

p-values

A way around this apparent difficulty is to report the p-value, which is the smallest level of significance at which the null hypothesis would be rejected. To actually compute the p-value we need to find a probability associated with the observed value of the test statistic, if the null hypothesis were true. The probability in question is

p-value = the chance of observing values favoring the alternative at least as much as the observed value, due to chance alone.

In our example the actual p-value is somewhere between .05 and .01. This p-value is found by first computing

z = (21.5 - 20) / .7826 = 1.9166

then finding the probability from the tables:

1 - .9726 = .0274

Small p-values indicate very strong evidence against the null hypothesis, because you can be quite sure random variation was not responsible for the observed value of the test statistic. Large p-values indicate no evidence against the null hypothesis. The language of p-values provides a way of interpreting the outcome of any type of hypothesis test, even if you do not know the details of the procedure itself.

The hypothesis testing framework

We will adopt p-values as the standard way of reporting the results of any hypothesis test.
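The cutoffs and the p-value computed above for the ACT example can be checked with a few lines of Python. This is only a sketch: it assumes the sample mean is normally distributed with the null mean 20 and standard deviation .7826 used in the text, and the function names are illustrative, not part of the original notes. Because software does not round z to two decimals the way a printed table does, the results may differ from the table-based figures in the third or fourth decimal place.

```python
from statistics import NormalDist

# Values taken from the ACT example in the text.
NULL_MEAN = 20.0     # population mean if the classes have no effect
STD_ERROR = 0.7826   # standard deviation of the sample mean
SAMPLE_MEAN = 21.5   # observed sample mean

standard_normal = NormalDist()  # mean 0, standard deviation 1


def critical_mean(alpha):
    """Sample mean above which an upper-tailed test rejects at level alpha."""
    z_alpha = standard_normal.inv_cdf(1 - alpha)
    return NULL_MEAN + z_alpha * STD_ERROR


def upper_tail_p_value(sample_mean):
    """Chance of a sample mean at least this large if the null is true."""
    z = (sample_mean - NULL_MEAN) / STD_ERROR
    return 1 - standard_normal.cdf(z)


for alpha in (0.05, 0.01):
    cutoff = critical_mean(alpha)
    decision = "reject" if SAMPLE_MEAN > cutoff else "fail to reject"
    print(f"alpha = {alpha}: cutoff = {cutoff:.2f}, {decision} the null hypothesis")

p_value = upper_tail_p_value(SAMPLE_MEAN)
print(f"p-value = {p_value:.4f}")
```

The single p-value summarizes both fixed-level decisions: it is smaller than .05, so the test rejects at that level, and larger than .01, so it fails to reject there.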
Thus hypothesis testing will consist of the following steps:

Which alternative?

Hypothesis tests for the population mean: the three types

What kind of change are you expecting to see?
Examples of p-values in each of the three cases

Rules of thumb:
Tests with rejection regions based on a p-value of alpha are said to have level of significance alpha.

Using z as the test statistic instead of the sample mean: the z-test

Since we need to compute z anyway to get the p-value, why not just compute it up front?
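Computing z up front might look like the following sketch. It assumes the three types of alternative are the usual ones (the mean is greater than, less than, or simply different from the null value), and the function name `z_test` and the labels "greater", "less", and "two-sided" are hypothetical choices made for this illustration.

```python
from statistics import NormalDist


def z_test(sample_mean, null_mean, std_error, alternative="greater"):
    """Return (z, p_value) for a z-test of a population mean.

    alternative is "greater", "less", or "two-sided"; these labels are
    illustrative names for the three types of alternative hypothesis.
    """
    z = (sample_mean - null_mean) / std_error
    upper_tail = 1 - NormalDist().cdf(z)  # chance of a z at least this large
    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = 1 - upper_tail
    elif alternative == "two-sided":
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


# ACT example from the text: do the classes increase the mean score?
alpha = 0.05
z, p_value = z_test(21.5, 20, 0.7826, alternative="greater")
decision = "reject" if p_value < alpha else "fail to reject"
print(f"z = {z:.4f}, p-value = {p_value:.4f}: {decision} the null hypothesis")
```

Working with z rather than the sample mean means the same standard normal table, or the same function, serves every problem regardless of the null mean or the standard error.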
