Hypothesis testing
Understand the steps for hypothesis testing.
Parametric and non-parametric tests
T-tests
Null hypothesis: there is no difference between the values of the means in the populations from which the samples were drawn (the two samples belong to the same population)
\(H_0 : \mu = \mu_0\)
Alternative hypothesis: there is a difference between the values of the means in the populations from which the samples were drawn (the two samples belong to two populations)
\(H_A : \mu \neq \mu_0\)
Such a hypothesis test is called a two-sided (or two-tailed) test.
If the hypothesis test is about deciding whether a population mean, \(\mu\), is higher than a specified value \(\mu_0\), the alternative hypothesis is expressed as
\(H_A : \mu > \mu_0\)
If the hypothesis test is about deciding whether a population mean, \(\mu\), is lower than a specified value \(\mu_0\), the alternative hypothesis is expressed as
\(H_A : \mu < \mu_0\)
Important: when we perform hypothesis testing, we usually aim to reject the null hypothesis.
Calculate a test statistic to find the probability of the observed results under the assumption that the null hypothesis is true. The test statistic is a number calculated from the data set, which is obtained by measurements and observations or, more generally, by sampling.
\(H_0 : \mu = \mu_0\)
\(H_A : \mu < \mu_0\)
We have to accept that there is always a chance that the differences we observe are due to chance (sampling differences) and not to a true difference caused by the independent variable.
We set a probability for our observed results to occur under the null hypothesis.
A significance level of 5% means that there is a 5% probability that our observed difference is a result of chance (different sampling). Researchers in the social sciences are normally comfortable with a 5% probability of having found their observed results by chance.
\(p < 0.05\)
A test statistic is a value describing the extent to which the research results differ from the null hypothesis. It is used to calculate the p-value of your results, which in turn determines whether you reject or fail to reject the null hypothesis.
Two-tailed test
\(H_0 : \mu = \mu_0\)
\(H_A : \mu \neq \mu_0\)
We set the confidence level to 95%
\(\alpha\), the significance level, is 0.05 (5%) and cuts off the two tails of the distribution, because the test statistic could have either positive or negative values. Each critical region cuts off an area of \(\alpha/2\).
1.96 and -1.96 are our critical values. If our test statistic is higher than 1.96 or lower than -1.96, we REJECT the null hypothesis with 95% confidence: the difference between the two means is not likely to be due to chance at a significance level of 5%.
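These critical values come from the standard normal distribution and can be checked in R with qnorm():
qnorm(c(0.025, 0.975)) # returns -1.96 and 1.96, the cut-offs that leave 2.5% in each tail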
One-tailed test
The number of syllables produced by children enrolled with atypical language development will be lower when compared to the number of syllables produced by children with typical language development.
\(H_0 : \mu = \mu_0\)
\(H_A : \mu < \mu_0\)
We set the confidence level to 95%.
\(\alpha\), the significance level, is 0.05 (5%) and here cuts off only one tail of the distribution, because the alternative hypothesis specifies the direction of the difference. The critical region cuts off an area of \(\alpha\).
In this example, the null hypothesis is rejected if the test statistic is too small, i.e. falls in the left tail.
A left-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is less than the null hypothesis claims.
A right-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is greater than the null hypothesis claims.
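For a one-tailed test at the 5% level, the whole critical region sits in one tail, so the normal critical values can be obtained in R as:
qnorm(0.95) # right-tailed critical value, about 1.645
qnorm(0.05) # left-tailed critical value, about -1.645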
When we reject or fail to reject the null hypothesis, we do so using a significance level (5%). This means there is still some chance of rejecting the null hypothesis when in reality it is true, or of failing to reject it when in reality it is false.
Type I error: The null hypothesis is rejected when it is actually true (false positive)
Type II error: The null hypothesis is not rejected when it is actually false (false negative)
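As an illustration (a sketch with made-up sample sizes), we can simulate the Type I error rate in R: when both samples really come from the same population, a t-test at the 5% level still rejects the null hypothesis about 5% of the time.
set.seed(1) # for reproducibility
p_values <- replicate(10000, {
  x <- rnorm(20) # both samples drawn from the same population, so H0 is true
  y <- rnorm(20)
  t.test(x, y)$p.value
})
mean(p_values < 0.05) # proportion of false positives, close to 0.05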
Parametric tests: A parametric test is a statistical test that makes certain assumptions about the distribution of the unknown parameter of interest, and the test statistic is thus valid under these assumptions. Parametric tests have the benefit of being precise in their assumptions, which leads to more precise inferences.
Data are normally distributed.
For some parametric tests, the populations must have equal variances.
Example: Difference between the mean scores of a group of students that learned statistics for 10 hours a week during 2 weeks, and a group of students that learned statistics for 2 hours a week during 20 weeks.
Non-parametric tests: Non-parametric tests are methods of statistical analysis that do not require the data to follow a particular distribution (they are especially useful when the data are not normally distributed). For this reason, they are sometimes referred to as distribution-free tests.
Factors to decide which test to use:
Non-parametric tests: for ranking, ordinal variables, and numeric variables that are not normally distributed.
Non-parametric tests are less powerful than parametric tests.
Example: Participants decide whether speech produced while wearing a mask is intelligible or not (Likert scale from 1 to 7), and this is compared to speech produced without a mask. We compare the means of the intelligibility judgements.
When you conduct a hypothesis test using two random samples, you must choose the type of test based on whether the samples are dependent or independent.
Correlated (dependent) samples: for repeated-measures designs
Example: one group of speakers exposed to both speech produced with masks and speech produced without masks.
Independent samples: two unrelated populations
Example: one group of speakers exposed to speech produced with masks and another group exposed to speech produced without masks.
When the population variances are unknown and must be estimated from the samples, the ratio \(\frac{\bar{X}_1 - \bar{X}_2}{\text{standard error of the difference between means}}\) is not normally distributed but follows the t distribution.
The t-distribution is similar in shape to the normal distribution, but has heavier tails, meaning that it gives more probability to extreme values than the normal distribution.
The t-distribution is defined by a single parameter, the degrees of freedom (df). The degrees of freedom are a measure of the sample size, and determine the shape of the t-distribution. As the degrees of freedom increase, the t-distribution approaches the normal distribution. When the sample size is large (typically, greater than 30), the t-distribution is very similar to the normal distribution.
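We can see this convergence in R by comparing t quantiles with the corresponding normal quantile:
qt(0.975, df = c(5, 10, 30, 100, 1000)) # 2.57, 2.23, 2.04, 1.98, 1.96
qnorm(0.975) # 1.96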
The degrees of freedom in a t-test are calculated from the sample sizes and sample statistics. We will not worry about calculating degrees of freedom because R does it for us when we run a t-test.
The t-test makes two assumptions:
The distributions of the populations from which samples are drawn are approximately normal.
The distributions of the populations from which samples are drawn have equal variances.
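Both assumptions can be checked in R. As a sketch, assuming the two samples are stored in hypothetical vectors g1 and g2:
shapiro.test(g1) # normality check for each sample (see the Shapiro-Wilk test below)
shapiro.test(g2)
var.test(g1, g2) # F test comparing the two variances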
We have two sample means. These differ to a greater or lesser extent.
We have some idea of what sort of difference we believe exists between the means of the two populations from which we think these samples have come. Under the null hypothesis (that our experimental manipulation has had no effect on our subjects), we would expect the two population means to be identical (i.e., to show no difference).
We compare the difference we have actually obtained to the difference (no difference) that we would expect under the null hypothesis. If we have found a very big difference between our two sample means, there are two possibilities: either the null hypothesis is true and we have obtained an unusually large difference by chance, or the null hypothesis is false.
Calculate t: we measure how many standard errors the observed difference is away from the expected difference under the null hypothesis. If the difference is large relative to the standard error, it suggests that the means of the two groups are significantly different.
\(t = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{s^{2}_{p}\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}}\), where \(s^{2}_{p}\) is the pooled variance of the two samples.
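As a sketch with made-up scores, the formula can be computed by hand in R and checked against t.test() with var.equal = TRUE:
g1 <- c(17, 19, 18, 16, 20, 18) # hypothetical scores for group 1
g2 <- c(14, 15, 13, 16, 14, 15) # hypothetical scores for group 2
n1 <- length(g1); n2 <- length(g2)
s2_p <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2) # pooled variance
(mean(g1) - mean(g2)) / sqrt(s2_p * (1 / n1 + 1 / n2)) # t statistic by hand
t.test(g1, g2, var.equal = TRUE)$statistic # should give the same value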
We have two groups of bilingual speakers: low proficiency and medium proficiency. We want to calculate whether their proficiency scores are, indeed, different.
In a non-directional t-test, a large t-score (t-value) indicates that the groups are different, while a small t-score indicates that the groups are similar.
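The output below can be reproduced with R's t.test() function, which by default runs the Welch two-sample t-test (no equal-variance assumption) with a two-sided alternative:
t.test(med_prof, low_prof)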
##
## Welch Two Sample t-test
##
## data: med_prof and low_prof
## t = 13.316, df = 37.082, p-value = 1.058e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.227856 4.386382
## sample estimates:
## mean of x mean of y
## 17.94874 14.14162
The t value is 13.32 with 37 degrees of freedom. The critical value of the t distribution for a two-tailed test at the 5% significance level with 37 degrees of freedom is approximately 2.03. A larger t value means stronger evidence that the group means differ. We can reject the null hypothesis because 13.32 is larger than 2.03.
You can find the t-table here.
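Instead of looking the value up in a table, R's qt() function gives it directly:
qt(0.975, df = 37) # two-tailed critical value at the 5% level, about 2.03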
For a one-directional t-test, we can hypothesize that the students with medium proficiency will have greater mean scores than the students with low proficiency.
In a right-tailed directional t-test, a large t-score indicates that sample mean 1 is greater than sample mean 2.
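The directional version of the test is obtained by passing alternative = "greater" to t.test():
t.test(med_prof, low_prof, alternative = "greater")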
##
## Welch Two Sample t-test
##
## data: med_prof and low_prof
## t = 13.316, df = 37.082, p-value = 5.288e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 3.324792 Inf
## sample estimates:
## mean of x mean of y
## 17.94874 14.14162
In this case, the critical value of t is 1.68.
In a left-tailed directional t-test, a large negative t-score indicates that sample mean 1 is smaller than sample mean 2.
For a one-directional t-test, we can hypothesize that the students with low proficiency will have lower mean scores than the students with medium proficiency.
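Swapping the order of the samples and setting alternative = "less" gives the mirror-image test:
t.test(low_prof, med_prof, alternative = "less")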
##
## Welch Two Sample t-test
##
## data: low_prof and med_prof
## t = -13.316, df = 37.082, p-value = 5.288e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -3.324792
## sample estimates:
## mean of x mean of y
## 14.14162 17.94874
To check the normality assumption, we can use the Shapiro-Wilk test. Its null hypothesis is that the distribution IS normal. The test computes the W statistic, which measures whether the distribution of the observed data points across quantiles is similar to that of a normal distribution.
The W statistic comes with a probability (p-value): the probability that the W statistic takes this value by chance/fluke/accident. If the p-value is below 0.05, we reject the null hypothesis at the 5% significance threshold; that is, we say that the distribution is not normal, and the probability that we are making a Type I (alpha) error is below 5%. If the p-value is above 0.05, we fail to reject the null hypothesis, i.e. we treat the distribution as normal.
shapiro.test(pizza_time) # p > 0.05, so we fail to reject the null hypothesis that the data are normally distributed
##
## Shapiro-Wilk normality test
##
## data: pizza_time
## W = 0.99388, p-value = 0.9349