Calculate the standard error.
Determine confidence intervals.
Understand the difference between dependent and independent variables.
Understand the steps for hypothesis testing.
A population is the entire group of items, people, or events of interest. Due to practical limitations, it’s often impossible to study the entire population. A sample, which is a subset of the population, is used to make inferences about the population.
A sample needs to be of a sufficient size and randomly selected to accurately represent the population (per the central limit theorem).
The central limit theorem states that, under certain conditions, the sum (or mean) of a large number of random variables is approximately normally distributed, and the spread of the distribution of the mean decreases as the sample size grows.
In this example, we know what the original distribution is, but when we do research (and we sample from the population), we don’t know the true mean of the distribution. In this example, we know that \(\mu\) equals 3.5.
However, we can estimate by how much our observed mean will differ from the population mean (\(\mu\)). To do this, we take the average of our sample (the sum of the observations divided by the sample size) repeatedly, which results in a new distribution of means with its own \(\mu\) and standard deviation. The standard deviation of the distribution of the mean is calculated as \(\frac{\sigma}{\sqrt n}\), where \(\sigma\) is the standard deviation of rolling one die. Note that the larger the sample size, the smaller this standard deviation is going to be.
If this was not clear, you can watch: https://www.youtube.com/watch?v=zeJD6dqJ5lo&ab_channel=3Blue1Brown
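As a sketch (not from the original notes), we can simulate the dice example in R; the sample size of 30 rolls and the number of repetitions are assumptions chosen for illustration:

```r
# Sketch: simulate the dice example. One die is uniform on 1..6 with mu = 3.5;
# sigma is the standard deviation of a single roll (~1.71).
set.seed(42)                          # for reproducibility
mu    <- 3.5
sigma <- sqrt(mean((1:6 - mu)^2))     # population sd of one die
n     <- 30                           # rolls averaged per sample (assumed)
reps  <- 10000                        # number of samples drawn (assumed)

sample_means <- replicate(reps, mean(sample(1:6, n, replace = TRUE)))

mean(sample_means)                    # close to mu = 3.5
sd(sample_means)                      # close to sigma / sqrt(n), ~0.31
```

The mean of the simulated sampling distribution sits near \(\mu = 3.5\), and its spread shrinks toward \(\frac{\sigma}{\sqrt n}\), as the theorem predicts.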
Implications of the central limit theorem:
The mean of the distribution of sample means is identical to the mean of the “parent population,” the population from which the samples are drawn.
The variance of the sampling distribution is equal to the population variance divided by the sample size.
The distribution of the sample mean is approximately normal. This is the basis for statistical inference for means.
Recall that the variance of the distribution of the sample mean is equal to the population variance divided by the sample size.
Standard error: The standard error of the mean, or simply standard error, indicates how different the population mean is likely to be from a sample mean.
Formula:
\(\text{standard error} = \frac{\sigma}{ \sqrt n}\)
\(\sigma\): the population standard deviation. If the population standard deviation is not known, you can substitute the sample standard deviation, s, in the numerator to approximate the standard error.
\(\sqrt n\): the square root of the sample size
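The formula translates directly into a small helper function; this is a minimal sketch, and the grades vector is a made-up example:

```r
# Sketch: standard error of the mean from a sample, substituting the sample
# sd for sigma (sigma is usually unknown in practice).
standard_error <- function(x) sd(x) / sqrt(length(x))

grades <- c(6, 7, 5, 8, 6, 7, 9, 5, 6, 7)   # hypothetical grades (1-10)
standard_error(grades)                       # 0.4
```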
Imagine that we are sampling grades (1 to 10) from a class with 100 students.
## [1] 3.162278
## [1] 0.6324555
Therefore, the standard error for our grades is 0.63.
Now, notice that the larger the sample size, the more accurately we are estimating our population parameter.
## [1] 4.472136
## [1] 0.4474273
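The code that produced the numbers above is not shown; as a hedged sketch, here is one way the comparison might be run (the class of 100 students and the particular sample sizes are assumptions for illustration):

```r
# Sketch: draw grade samples of increasing size from a hypothetical class
# of 100 and watch the standard error sd(sample) / sqrt(n) shrink.
set.seed(1)
class_grades <- sample(1:10, 100, replace = TRUE)  # hypothetical class

ns  <- c(10, 25, 50, 100)
ses <- sapply(ns, function(n) {
  s <- sample(class_grades, n)
  sd(s) / sqrt(n)                  # standard error for this sample size
})
ses                                 # decreases as n grows
```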
Remember that the standard error is calculated from the standard deviation of our sample and the sample size (the more observations, the closer the mean of our sampling distribution will be to the \(\mu\) of the population).
A confidence interval, in statistics, refers to a range of values that is likely to contain a population parameter with a certain level of confidence. In other words, a confidence interval is the mean of your estimate plus and minus the variation in that estimate.
For example, if you construct a confidence interval with a 95% confidence level, you are confident that 95 out of 100 times the estimate will fall between the upper and lower values specified by the confidence interval.
To calculate a confidence interval you need to know: the point estimate, the standard error, the confidence level, and the critical value.
Imagine that you are a speech-language pathologist and you are assessing a 2-year-old child who produces words with a mean of 3.2 syllables per word. You want to know the percentage of children who do better than the child you are assessing.
This is a vector with the means of 30 children:
means_syllables_child = c(3.2,3,4.5,3,2,2,2,2.8,3.2,2.5,2,3,4.5,3,2,2,2,2.8,3.2,2.5,2,3,4.5,3,2,2,2,2.8,3.2,2.5)
What is the mean of the samples means?
## [1] 2.74
And the standard deviation?
## [1] 0.7600363
Point estimate: The point estimate of your confidence interval is the statistic you are computing. In this case, the point estimate will be the mean.
From our previous example: since the mean is 2.74 syllables, our point estimate is 2.74.
Standard error: The standard error helps us identify how much variation there is in our sampling distribution.
## [1] 0.1387564
## [1] 0.138763
From our previous example, the standard error is 0.14.
Our mean estimate plus and minus the standard error is 2.74 ± 0.14.
Confidence level: The probability that the confidence interval includes the true mean value within a population is called the confidence level of the confidence interval. You can decide your confidence level.
In short, setting your confidence level to 95% indicates that you are creating a range of values that you can be 95% confident contains the true mean of the population.
Critical value: Cut-off values from the normal distribution that define regions where the mean is unlikely to lie.
If we have a normal distribution, we can use z-scores. If our confidence level is 95%, we need to cut off 5% of the distribution (2.5% in each side).
We take the z-score of 1.96, which encompasses 95% of all sample means: there is a 95% probability of observing a z-score between −1.96 and 1.96.
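The 1.96 cut-off can be recovered from the normal distribution directly with `qnorm()`:

```r
# For a 95% confidence level we cut 2.5% from each tail of the
# standard normal distribution.
alpha  <- 0.05
z_crit <- qnorm(1 - alpha / 2)   # upper critical value
z_crit                            # 1.959964
```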
CI_low <- 2.74 - (1.96*0.14)
CI_high <- 2.74 + (1.96*0.14)
print(paste('95% CI :', CI_low, '-', CI_high, 'syllables'))
## [1] "95% CI : 2.4656 - 3.0144 syllables"
Note: Be careful because confidence levels are not the same as confidence intervals
The confidence level is the percentage of times you expect to get close to the same estimate if you run your experiment again or resample the population in the same way.
The confidence interval consists of the upper and lower bounds of the estimate you expect to find at a given level of confidence.
Null hypothesis: there is no difference between the values of the means in the populations from which the samples were drawn (the two samples belong to the same population)
\(H_0 : \mu = \mu_0\)
Alternative hypothesis: there is a difference between the values of the means in the populations from which the samples were drawn (the two samples belong to two populations)
\(H_A : \mu \neq \mu_0\)
Such a hypothesis test is called a two-sided test.
If the hypothesis test is about deciding whether a population mean, \(\mu\), is less than a specified value \(\mu_0\), the alternative hypothesis is expressed as
\(H_A : \mu < \mu_0\)
If the hypothesis test is about deciding whether a population mean, \(\mu\), is greater than a specified value \(\mu_0\), the alternative hypothesis is expressed as
\(H_A : \mu > \mu_0\)
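The three forms of the alternative hypothesis map directly onto the `alternative` argument of R's `t.test()`. As a sketch, with a made-up sample and a made-up reference value \(\mu_0 = 5\):

```r
# Sketch: one-sample t-tests for the three alternative hypotheses.
set.seed(7)
x   <- rnorm(25, mean = 5.4, sd = 1)   # hypothetical sample
mu0 <- 5                               # hypothetical reference value

t.test(x, mu = mu0, alternative = "two.sided")$p.value  # H_A: mu != mu0
t.test(x, mu = mu0, alternative = "less")$p.value       # H_A: mu <  mu0
t.test(x, mu = mu0, alternative = "greater")$p.value    # H_A: mu >  mu0
```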
Important: when we perform hypothesis testing, we are looking for evidence to reject the null hypothesis.
Calculate a test statistic to find the probability of the observed results under the assumption that the null hypothesis is true. The test statistic is a number calculated from the data set, which is obtained by measurements and observations or, more generally, by sampling.
\(H_0 : \mu = \mu_0\)
\(H_A : \mu < \mu_0\)
We have to accept that there will always be a chance that the differences we are observing are due to chance (sampling differences) and not to a true difference brought about by the independent variable.
We set a probability for our observed results to occur under the null hypothesis.
A significance level of 5% means that there is a 5% probability that our observed difference is a result of chance (different sampling). Researchers in the social sciences are normally comfortable with a 5% probability of having found their observed results by chance.
\(p < 0.05\)
A test statistic is a value describing the extent to which the research results differ from what is expected under the null hypothesis. The test statistic helps you determine whether to support or reject the null hypothesis in your study: you use it to calculate the p-value of your results.
Two-tailed test
Learning in an immersion setting will result in different scores than learning in a classroom setting (it could be higher or lower).
\(H_A : \mu \neq \mu_0\)
We set the confidence level to 95% (learning in an immersion setting will result in DIFFERENT scores than learning in a classroom setting 95% of the time)
\(\alpha\) (p-value) is the significant level 0.05 (5%) and cuts off the two tails of the distribution, because the test statistic could have either positive or negative values. The critical region cuts an area of \(\alpha\)/2
1.96 and −1.96 are our critical values. If our test statistic is below −1.96 or above 1.96, we REJECT the null hypothesis with 95% confidence: the difference between the two means is not likely to be due to chance at a significance level of 5%.
One-tailed test
Learning French in a classroom setting will result in lower scores than learning French in an immersion setting.
\(H_A : \mu < \mu_0\)
We set the confidence level to 95% (learning in a classroom setting will result in lower scores than learning in an immersion setting 95% of the time)
\(\alpha\) (the significance level, 0.05 or 5%) cuts off only one tail of the distribution, because the test statistic is expected to fall on one particular side. The critical region cuts an area of \(\alpha\)
The null hypothesis is rejected if the test statistic is too small.
When we reject or fail to reject the null hypothesis, we do so using a significance level (5%), which means that there is still some chance of rejecting the null hypothesis when it is actually true, or of accepting the null hypothesis when it is actually false.
Type I error: The null hypothesis is rejected when it is actually true (false positive)
Type II error: The null hypothesis is not rejected when it is actually false (false negative)
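The Type I error rate can be made concrete with a small simulation (a sketch; sample sizes and number of repetitions are arbitrary choices): when the null hypothesis is true, about 5% of tests will still come out "significant" at \(\alpha = 0.05\).

```r
# Sketch: simulate the Type I error rate. Both samples come from the SAME
# population, so H0 is true, yet ~5% of t-tests reject it at alpha = .05.
set.seed(123)
alpha <- 0.05
p_values <- replicate(5000, {
  a <- rnorm(20)                 # group 1, same population as group 2
  b <- rnorm(20)
  t.test(a, b)$p.value
})
mean(p_values < alpha)           # close to 0.05
```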
Factors to decide which test to use:
Level of measurement concerned
Characteristics of the frequency distribution
Type of design used for the study
Parametric tests: for ratio or interval levels of measurement.
Assumptions of parametric tests:
Data are normally distributed.
For some parametric tests, the populations must also have equal variances.
Example: Difference between the mean scores of a group of students that learned statistics for 10 hours a week during 2 weeks, and a group of students that learned statistics for 2 hours a week during 20 weeks.
Non-parametric tests: for ranking, ordinal variables, and numeric variables that are not normally distributed.
Non-parametric tests are less powerful than parametric tests.
Example: Participants rate whether speech produced with a mask is intelligible (Likert scale from 1 to 7), and this is compared to speech produced without a mask. Difference between the means of the intelligibility judgment task.
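For ordinal data like Likert ratings, a non-parametric test such as the Wilcoxon rank-sum test can be used instead of a t-test. A sketch with made-up ratings:

```r
# Sketch: hypothetical intelligibility ratings (Likert 1-7).
mask    <- c(3, 4, 2, 5, 3, 4, 3, 2, 4, 3)
no_mask <- c(5, 6, 5, 7, 6, 5, 6, 4, 6, 5)

# Wilcoxon rank-sum test; exact = FALSE because Likert data contain ties.
wilcox.test(mask, no_mask, exact = FALSE)$p.value   # small p: ratings differ
```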
Correlated samples: for repeated measures designs
Example: one group of speakers exposed to both speech produced with masks and speech produced without masks.
Independent samples: two unrelated populations
Example: one group of speakers exposed to speech produced with masks and another group of speakers exposed to speech produced without masks.
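The two designs map onto the `paired` argument of `t.test()`. A sketch with hypothetical scores (the paired test is typically more sensitive because it removes between-listener variation):

```r
# Correlated samples: the SAME listeners rate both conditions -> paired = TRUE.
listener_mask    <- c(62, 70, 58, 66, 71, 60, 64, 69)   # hypothetical scores
listener_no_mask <- c(68, 75, 61, 70, 78, 66, 70, 74)
t.test(listener_mask, listener_no_mask, paired = TRUE)

# Independent samples: two DIFFERENT groups -> paired = FALSE (the default).
group_a <- c(62, 70, 58, 66, 71, 60, 64, 69)
group_b <- c(68, 75, 61, 70, 78, 66, 70, 74)
t.test(group_a, group_b)
```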