In statistical hypothesis testing, the null hypothesis is a general statement that there is no relationship between two measured phenomena, or no association among groups. It states that there is no difference or effect in the population between the variables or groups being studied. Researchers apply statistical tests to sample data to decide whether to reject or fail to reject the null hypothesis. Rejecting the null hypothesis means the sample data provide sufficient evidence against it, and therefore in favor of the alternative hypothesis.

## Formulating the Null Hypothesis

The first step in hypothesis testing is stating the null and alternative hypotheses. The null hypothesis is denoted H0, while the alternative is denoted H1 or Ha. The null hypothesis represents the status quo: no change from normal conditions, or no difference between groups on the variable of interest. It is the default position that there is no association or causal relationship between variables.

For example, if comparing exam performance between two teaching methods, the null hypothesis would state that the mean exam scores are equal for the two methods. When testing a new drug, the null hypothesis would be that there is no difference in effectiveness between the new drug and a placebo. By analogy, in a criminal trial the defendant is presumed innocent (the null hypothesis) until the evidence proves guilt beyond a reasonable doubt.

The null hypothesis frames the research question and guides the choice of sample size and statistical test. It is presumed true unless the sample provides enough evidence to reject it in favor of the alternative. Failing to reject the null hypothesis means the data do not support concluding that there is a significant effect or difference.

## Setting Significance Level

The significance level, denoted by α, is the probability of rejecting the null hypothesis when it is actually true. It is the maximum risk the researcher is willing to take of making a Type I error, which is incorrectly rejecting a true null hypothesis. Conventional levels are 5% (0.05), 1% (0.01), and 0.1% (0.001). A lower significance level requires stronger evidence to reject the null hypothesis.

For example, with α = 0.05, the researcher accepts a 5% chance of rejecting the null hypothesis when it is actually true. This does not mean there is a 95% probability that any particular rejection is correct: α describes how often the procedure produces false positives when the null holds, not the probability that the null is false. More stringent levels like 0.01 or 0.001 reduce the risk of a Type I error, but for a given effect size they require larger samples to achieve the same power.
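To see how a lower α demands stronger evidence, the sketch below computes two-sided critical z-values for the conventional levels using Python's standard-library `statistics.NormalDist`. It assumes a z-test, where the null hypothesis is rejected when the test statistic exceeds the critical value in absolute value.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided critical values: reject H0 when |z| exceeds z_crit,
# which leaves alpha/2 of the null distribution in each tail.
critical_values = {
    alpha: standard_normal.inv_cdf(1 - alpha / 2)
    for alpha in (0.05, 0.01, 0.001)
}

for alpha, z_crit in critical_values.items():
    print(f"alpha = {alpha}: |z| must exceed {z_crit:.3f}")
```

The critical value grows from about 1.96 at α = 0.05 to about 3.29 at α = 0.001, which is the sense in which a lower significance level requires stronger evidence.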

## Collecting Sample Data

Based on the hypotheses, an appropriate sample is selected from the population of interest. Data are collected from the sample, and sample statistics, such as the sample mean, are calculated. The sample should be representative of the population and large enough to provide sufficient statistical power to detect meaningful effects.
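How large is "large enough"? For a one-sample z-test with known σ, a common planning formula is n = ((z(1 − α) + z(1 − β)) · σ / δ)² for a one-sided test, with z(1 − α/2) in place of z(1 − α) for a two-sided test. Here z(p) is the standard normal quantile, δ is the smallest true difference worth detecting, and 1 − β is the desired power. The sketch below applies it; the inputs (σ = 2, δ = 1, 80% power) and the function name `z_test_sample_size` are hypothetical, and the two-sided version ignores the negligible opposite tail, so it is a close approximation rather than exact.

```python
import math
from statistics import NormalDist

def z_test_sample_size(sigma, delta, alpha=0.05, power=0.80, two_sided=True):
    """Smallest n giving roughly the requested power for a one-sample z-test.

    sigma: known population standard deviation
    delta: smallest true difference from the null mean worth detecting
    """
    standard_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = standard_normal.inv_cdf(1 - tail)  # critical value under H0
    z_beta = standard_normal.inv_cdf(power)      # quantile for the desired power
    n = ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up: sample sizes are whole numbers

print(z_test_sample_size(sigma=2, delta=1))                   # 32 (two-sided)
print(z_test_sample_size(sigma=2, delta=1, two_sided=False))  # 25 (one-sided)
```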

In a drug trial, for example, the sample could be randomly selected patients who have the condition the drug treats. The data consist of measurements such as blood pressure, symptom surveys, or lab results for patients receiving the drug versus a placebo. Larger samples provide more precise estimates of the true population parameters.

## Choosing the Statistical Test

An appropriate statistical test is chosen based on the hypotheses, type of variables, data distribution, and sample size. Common tests include:

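- Z-test: Compares a sample mean to a hypothesized population mean when the population standard deviation is known. With large samples it is often used as an approximation even when the standard deviation must be estimated.
- T-test: Compares a sample mean to a hypothesized value, or the means of two groups, when the population standard deviation is unknown and estimated from the sample. Assumes approximately normal data, an assumption that matters most for small samples.
- Chi-square test: Compares observed and expected counts for categorical data, for example to test whether two categorical variables are associated.
- Analysis of variance (ANOVA): Compares means across multiple (typically three or more) groups. Assumes approximately normal data with similar variances across groups.
- Nonparametric tests: Compare medians, ranks, or whole distributions without assuming normality. Used for non-normal or ordinal data.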

For example, a z-test could check whether the sample exam mean differs significantly from a historical population mean of 70, provided the population standard deviation is known. A two-sample t-test could compare mean exam scores between two teaching methods. A chi-square test could check whether the distribution of letter grades differs between methods.

## Calculating the Test Statistic

Using the sample data, the chosen test calculates a test statistic that measures how far the data diverge from what would be expected under the null hypothesis. For example:

- Z-test: z = (x̄ – μ) / (σ / √n), where x̄ is the sample mean, μ is the population mean under the null hypothesis, σ is the known population standard deviation, and n is the sample size.
- Two-sample t-test: t = (x̄₁ – x̄₂) / SE, where x̄₁ and x̄₂ are the sample means and SE = sₚ √(1/n₁ + 1/n₂) is the standard error of the difference, with sₚ the pooled standard deviation and n₁, n₂ the group sizes.
- Chi-square test: χ² = Σ (O – E)² / E, summed over all categories (or table cells), where O and E are the observed and expected frequency counts.

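If the null hypothesis is true, the test statistic follows a known probability distribution: the standard normal distribution for z, a t-distribution with n₁ + n₂ – 2 degrees of freedom for the pooled two-sample t, and approximately a chi-square distribution for χ², with degrees of freedom that depend on the number of categories.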
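The two-sample t and chi-square formulas can be sketched with Python's standard library alone. The exam scores and counts are made up purely to show the arithmetic, and the function names are illustrative. The chi-square function assumes the counts form a contingency table (rows = groups, columns = categories) and derives expected counts from the row and column totals, which is the form used for a test of association. Turning these statistics into p-values needs the t and chi-square distributions, which the standard library does not provide.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of mean1 - mean2
    return (mean(group1) - mean(group2)) / se_diff

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent (H0)
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical exam scores for two teaching methods
method_a = [72, 75, 78, 80, 85]
method_b = [68, 70, 74, 75, 73]
print(f"t = {pooled_t(method_a, method_b):.3f}")

# Hypothetical counts: rows = methods, columns = pass / fail
print(f"chi-square = {chi_square_statistic([[10, 20], [20, 10]]):.3f}")
```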

### Example Z-Test

Researchers want to test whether the mean weight lost on a new diet exceeds the population mean of 5 pounds, with a known population standard deviation of 2 pounds. The hypotheses are H0: μ = 5 and H1: μ > 5. From a random sample of 25 participants, the sample mean weight lost was 6.5 pounds. The z-test statistic is:

z = (x̄ – μ) / (σ / √n)

= (6.5 – 5) / (2 / √25) = 1.5 / 0.4 = 3.75

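This positive z-score indicates the sample mean lies above the hypothesized population mean, and its magnitude gives that distance in standard errors, suggesting the diet results in greater than average weight loss. Whether a deviation this large is surprising under the null hypothesis is judged by the p-value.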
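The diet calculation can be checked with a few lines of Python. The function name `z_statistic` is illustrative, not from any library.

```python
import math

def z_statistic(sample_mean, null_mean, sigma, n):
    """One-sample z statistic with a known population standard deviation."""
    standard_error = sigma / math.sqrt(n)  # 2 / 5 = 0.4 for the diet example
    return (sample_mean - null_mean) / standard_error

z = z_statistic(sample_mean=6.5, null_mean=5, sigma=2, n=25)
print(f"z = {z:.2f}")  # z = 3.75
```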

## Determining the p-Value

The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. A low p-value casts doubt on the null hypothesis, because the observed data would be unusual if it held.

Using the known distribution of the test statistic under the null hypothesis, the p-value is calculated as:

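- Z-test: The area under the standard normal curve beyond the observed z, in the direction specified by the alternative hypothesis: one tail for a one-sided test, or both tails (double the one-tail area) for a two-sided test.
- T-test: The corresponding area under the t-distribution with the appropriate degrees of freedom.
- Chi-square: The area under the chi-square distribution to the right of the observed χ² value.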

Smaller p-values provide more evidence against the null hypothesis. The p-value is compared to the significance level α to determine whether to reject or fail to reject the null.
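For the z-test, these tail areas can be computed from the standard normal CDF in Python's `statistics` module. The `alternative` labels (`"greater"`, `"less"`, `"two-sided"`) and the helper names are assumptions following common convention. For t and chi-square p-values, a library such as SciPy offers the survival functions `scipy.stats.t.sf` and `scipy.stats.chi2.sf`.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_p_value(z, alternative="two-sided"):
    """P(statistic at least as extreme as z, assuming H0) for a z-test."""
    if alternative == "greater":    # H1: parameter > null value
        return 1 - STANDARD_NORMAL.cdf(z)
    if alternative == "less":       # H1: parameter < null value
        return STANDARD_NORMAL.cdf(z)
    if alternative == "two-sided":  # H1: parameter != null value
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(z_p_value(1.96))                    # about 0.05
print(decide(z_p_value(1.5, "greater")))  # p is about 0.067 -> fail to reject H0
```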

### Example P-Value Calculation

For the one-sided z-test example above, the p-value is the area under the standard normal curve above z = 3.75, which is approximately 0.00009. (A two-sided test of whether the mean differs in either direction would double this to about 0.00018.) This p-value is far below the α = 0.05 significance level, so there is strong statistical evidence to reject the null hypothesis.
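An end-to-end check of the diet example: compute z from the summary numbers, take the upper-tail area for the one-sided alternative μ > 5, and compare it with α. The two-sided value is printed alongside for comparison.

```python
import math
from statistics import NormalDist

sample_mean, null_mean, sigma, n = 6.5, 5.0, 2.0, 25
alpha = 0.05

z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
p_one_sided = 1 - NormalDist().cdf(z)             # H1: mu > 5 (upper tail only)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # H1: mu != 5 (both tails)

print(f"z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.5f}")
print(f"two-sided p = {p_two_sided:.5f}")
print("reject H0" if p_one_sided < alpha else "fail to reject H0")
```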

## Reject or Fail to Reject the Null Hypothesis

The null hypothesis is rejected if the p-value is less than the predetermined significance level α. A small p-value indicates that results as extreme as the sample's would be unlikely if the null hypothesis were true. Rejecting the null means concluding that the data provide sufficient evidence in favor of the alternative hypothesis.

If the p-value is greater than the significance level, the result is not statistically significant and the null hypothesis is not rejected. Failing to reject the null means the data do not provide compelling evidence for the alternative hypothesis. If a real effect exists, a larger sample may be needed to detect it.

Using the diet example above, since the p-value of about 0.00009 is below 0.05, the null hypothesis is rejected. The sample provides convincing evidence that the new diet results in a mean weight loss greater than the population average of 5 pounds.

| Outcome | Interpretation |
|---|---|
| Reject null hypothesis | The alternative hypothesis is supported. There is a statistically significant effect or difference. |
| Fail to reject null hypothesis | The data do not provide sufficient evidence against the null. The effect or difference is not statistically significant. |

Failing to reject the null does not mean the null hypothesis is true, only that the evidence is insufficient to reject it in favor of the alternative. A hypothesis test can never prove the null hypothesis true; the null can only be rejected or not rejected.

## Types of Errors

There are two types of errors possible in significance testing:

### Type I Error

A Type I error occurs when the null hypothesis is wrongly rejected when it is actually true. The significance level α is the probability of making a Type I error. In the diet example, a Type I error would be concluding the new diet results in higher weight loss when it truly does not.
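A Monte Carlo sketch makes the link between α and the Type I error rate concrete: simulate many samples from a population where the null hypothesis is true and count how often the test rejects. The setup reuses the diet example's numbers (μ = 5, σ = 2, n = 25) with a one-sided test at α = 0.05, and assumes weight loss is normally distributed; the function name, seed, and trial count are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_type_i_rate(mu0=5.0, sigma=2.0, n=25, alpha=0.05, trials=20_000, seed=1):
    """Fraction of simulated samples (drawn with H0 true) in which H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the population described by the null hypothesis
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if z > z_crit:
            rejections += 1  # a Type I error: H0 is true but rejected
    return rejections / trials

print(f"estimated Type I error rate: {simulate_type_i_rate():.3f}")
```

The estimate should land close to 0.05; any gap is simulation noise, which shrinks as `trials` grows.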

### Type II Error

A Type II error occurs when the null hypothesis is not rejected when the alternative hypothesis is true. The probability of a Type II error occurring is denoted by β. In the diet example, failing to reject the null when the new diet does cause greater weight loss would be a Type II error.

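Increasing the sample size raises statistical power (1 – β), reducing the chance of a Type II error without changing α. Power can also be raised by choosing a larger α, but only at the cost of a higher Type I error risk; for a fixed sample size, lowering α increases β. Researchers balance Type I and Type II errors based on the consequences of each in the context of the hypotheses.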
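For a one-sided (upper-tail) z-test, power has a closed form: power = 1 − Φ(z(1 − α) − δ√n / σ), where Φ is the standard normal CDF, z(1 − α) is the critical value, and δ is the true mean minus the null mean. The sketch below evaluates it for the diet setup with H1: μ > 5 under a hypothetical true mean loss of 6 pounds (δ = 1); the function name is illustrative. It shows that a larger n raises power while a smaller α lowers it.

```python
import math
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided (upper-tail) z-test when the true shift is delta > 0."""
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha)  # reject H0 when z > z_crit
    shift = delta * math.sqrt(n) / sigma         # mean of z when H1 is true
    return 1 - standard_normal.cdf(z_crit - shift)

for n, alpha in [(25, 0.05), (50, 0.05), (25, 0.01)]:
    power = z_test_power(delta=1, sigma=2, n=n, alpha=alpha)
    print(f"n = {n}, alpha = {alpha}: power = {power:.2f}, beta = {1 - power:.2f}")
```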

| Decision | Null true | Null false |
|---|---|---|
| Fail to reject null | Correct | Type II error (β) |
| Reject null | Type I error (α) | Correct |

## Interpreting Results

If the null hypothesis is rejected, the researcher concludes that the data provide statistically significant evidence for the effect or difference stated by the alternative hypothesis. Further analysis investigates the magnitude and implications of that effect.

Failing to reject the null indicates that no effect was detected in the sample. Researchers should check whether the test had sufficient power to detect a meaningful effect; a larger sample may be required, or the true effect may be smaller than hypothesized.

Practical significance should also be considered. A result may be statistically significant but have little scientific or real-world meaning. Evaluating the size of differences or effect magnitudes, not just rejecting the null, is vital.

### Example Interpretation

Rejecting the null hypothesis suggests the new diet leads to a mean weight loss greater than 5 pounds, the population average. With a z-score of 3.75 and a p-value of about 0.00009, well under 0.05, the sample provides strong evidence for this conclusion. Further analysis, such as a confidence interval for the mean, estimates the size of the additional average weight loss.

Practical importance should also be assessed. The sample mean exceeds the population average by 1.5 pounds; although this difference is statistically significant, an additional 1.5 pounds may lack practical relevance. Evaluating substantive impact alongside statistical significance aids meaningful interpretation.
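One way to put numbers on practical importance is a 95% confidence interval for the mean weight loss, x̄ ± z(0.975) · σ/√n, together with a standardized effect size such as Cohen's d = (x̄ − μ) / σ. The sketch below computes both from the diet example's summary numbers; deciding which effect sizes are practically meaningful remains a subject-matter judgment the code cannot make.

```python
import math
from statistics import NormalDist

sample_mean, null_mean, sigma, n = 6.5, 5.0, 2.0, 25

standard_error = sigma / math.sqrt(n)
z_975 = NormalDist().inv_cdf(0.975)  # about 1.96
ci_low = sample_mean - z_975 * standard_error
ci_high = sample_mean + z_975 * standard_error

# Standardized effect size: difference from the null mean in units of sigma
cohens_d = (sample_mean - null_mean) / sigma

print(f"95% CI for mean loss: ({ci_low:.2f}, {ci_high:.2f}) pounds")
print(f"extra loss vs. 5 lb: {ci_low - null_mean:.2f} to {ci_high - null_mean:.2f} pounds")
print(f"Cohen's d = {cohens_d:.2f}")
```

With these numbers the interval suggests the diet adds roughly 0.7 to 2.3 pounds over the population average, which frames the practical-relevance question more concretely than the p-value alone.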

## Conclusion

Hypothesis testing provides a formal process for evaluating statistical evidence. The null hypothesis represents the default position of no effect or difference. Statistical analysis of sample data produces test statistics and p-values that measure divergence from what the null hypothesis predicts. By comparing the p-value to a predetermined significance level, the researcher either rejects or fails to reject the null hypothesis.

Rejecting the null supports the alternative hypothesis. Failing to reject it means the data lack sufficient evidence against the null. Type I and Type II errors are the risks of false positives and false negatives, respectively. Proper interpretation considers both statistical and practical significance, whether or not the null hypothesis is rejected.