
Further statistics for Edexcel A-level Maths

Further statistics

This page covers the following topics:

1. Bivariate data
2. Regression lines
3. Simple, discrete probability distributions
4. Normal distribution
5. Statistical hypothesis testing

Bivariate data is data that records two variables for each observation. It is used to compare two sets of data and to find a relationship between the two variables.

Bivariate data

A regression line is the line that best fits the data. If the scatterplot of bivariate data shows a linear pattern and the correlation between the variables is strong, a regression line can be drawn through the scatter points. The method of least squares can be used to find the gradient and the y-intercept of a regression line y = mx + c. This method makes the sum of the squared errors (the vertical distances between the points and the line) as small as possible, and thus returns the line of best fit. The gradient and the y-intercept are given by the following formulas: m = (NΣ(xy) − ΣxΣy)/(NΣ(x²) − (Σx)²) and c = (Σy − mΣx)/N, where N is the total number of points.
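The least squares formulas for the gradient and the y-intercept can be checked numerically. The following is a minimal sketch in Python; the function name least_squares_line and the four data points are assumptions made for illustration and are not part of the original notes.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for the line y = m*x + c using the least squares formulas."""
    n = len(xs)                                   # N, the total number of points
    sum_x = sum(xs)                               # Σx
    sum_y = sum(ys)                               # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σ(xy)
    sum_x2 = sum(x * x for x in xs)               # Σ(x²)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = (sum_y - m * sum_x) / n
    return m, c


# Hypothetical data chosen to lie exactly on the line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, c = least_squares_line(xs, ys)
print("gradient:", m, "y-intercept:", c)
```

Because the illustrative points lie exactly on a straight line, the sketch should recover that line's gradient and intercept. With real data the points scatter around the line, and m and c describe the line that minimises the sum of squared errors.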

Regression lines

The discrete probability distribution of a random variable is the set of all possible values it can take together with their corresponding probabilities. For a discrete probability distribution, the probabilities must sum to one. One discrete probability distribution is the binomial distribution. The binomial distribution gives the probability of getting a given number of successes in an experiment which is repeated for a fixed number of independent trials. The probability for a binomial distribution is found using the following formula: P(X = x) = (nCx)pˣ(1 − p)^(n − x), where p is the probability of success on each trial, n is the number of trials and x is the total number of successes. The binomial distribution applies when each trial has only 2 possible outcomes, those being a success or a failure.
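As an illustration of the binomial formula, here is a short Python sketch. The function name binomial_pmf and the example of 10 tosses of a fair coin are assumptions chosen for illustration only.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Hypothetical example: probability of exactly 3 heads in 10 tosses of a fair coin
prob = binomial_pmf(3, 10, 0.5)
print("P(X = 3) =", prob)

# The probabilities over all possible values 0, 1, ..., n should sum to one
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print("sum of probabilities =", total)
```

The second calculation checks the property stated above that the probabilities of a discrete distribution sum to one.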

Simple, discrete probability distributions

A normal distribution is a symmetrical bell-shaped curve. It is described by two parameters: the mean, which is where the central maximum of the curve lies, and the standard deviation, which measures the spread of the curve. The probability that a random variable following a normal distribution lies in a given range is found by calculating the area under the curve over that range. For a continuous random variable that follows a normal distribution, approximately 68% of all values lie within 1 standard deviation of the mean, approximately 95% lie within 2 standard deviations of the mean and approximately 99.7% (nearly all of them) lie within 3 standard deviations of the mean. When the parameters of the normal distribution are known, the calculator function for the normal distribution can be used to calculate probabilities.
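The percentages for 1, 2 and 3 standard deviations can be reproduced as areas under the curve. The sketch below uses NormalDist from Python's standard statistics module in place of a calculator; the helper name prob_within and the default mean of 0 and standard deviation of 1 are assumptions for illustration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    # Area under the curve between mu - k*sigma and mu + k*sigma
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The result does not depend on the particular mean and standard deviation chosen, because the range is measured in standard deviations from the mean.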

Normal distribution

A hypothesis test is a process that examines a population parameter value proposed by the null hypothesis against the alternative hypothesis at a given significance level. The null hypothesis, H₀, is what is assumed to be true unless the evidence suggests otherwise. If the test statistic calculated from the given data provides significant enough evidence, H₀ is rejected in favour of the alternative hypothesis, H₁. The critical region is the set of values of the test statistic for which the null hypothesis would be rejected in favour of the alternative hypothesis. A hypothesis test may be either one-tailed, which looks for either an increase or a decrease in the parameter, or two-tailed, which looks for any change in the parameter. To carry out a hypothesis test, the probability of obtaining a value at least as extreme as the observed test statistic, assuming H₀ is true, is found. If this is smaller than the significance level (for a two-tailed test, smaller than half the significance level in the relevant tail), the null hypothesis is rejected.
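To make the idea of a critical region concrete, the following Python sketch finds the upper-tail critical region for a one-tailed binomial test. The function names and the example values (20 trials, p = 0.5 under the null hypothesis, 5% significance level) are hypothetical and chosen only for illustration.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))


def upper_critical_value(n, p, alpha):
    """Smallest k with P(X >= k) <= alpha, or None if no such k exists."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return None


# Hypothetical one-tailed test: H0: p = 0.5 against H1: p > 0.5, n = 20, 5% level
k = upper_critical_value(20, 0.5, 0.05)
print("critical region: X >=", k)
print("P(X >= k) =", round(upper_tail(k, 20, 0.5), 4))
```

Any observed value of the test statistic inside the critical region would lead to the null hypothesis being rejected at the chosen significance level.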

Statistical hypothesis testing

1

Draw a distribution table for a random variable X which follows a discrete probability distribution, given that X takes a value of 2 with a probability of 0.14 and a value of 4 with a probability of 0.53 and the values of 1, 3 and 5 with the same probability.

The probabilities must sum to one, so the probability left for the values 1, 3 and 5 is 1 − 0.14 − 0.53 = 0.33. This is shared equally between the three values, giving 0.33 ÷ 3 = 0.11 each. The distribution table is therefore:

x          1     2     3     4     5
P(X = x)   0.11  0.14  0.11  0.53  0.11
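The arithmetic for this question can be sketched in Python. The variable names are assumptions for illustration, and exact fractions are used so that no rounding error appears when the probabilities are checked against a total of one.

```python
from fractions import Fraction

# Probabilities given in the question
known = {2: Fraction("0.14"), 4: Fraction("0.53")}
# Values stated to share the same (unknown) probability
unknown_values = [1, 3, 5]

# The probabilities must sum to one, so the remainder is shared equally
remaining = 1 - sum(known.values())
shared = remaining / len(unknown_values)

table = dict(known)
for value in unknown_values:
    table[value] = shared

for x in sorted(table):
    print(f"P(X = {x}) = {float(table[x])}")
```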

2

A coin is tossed 50 times and 30 heads are achieved. Carry out a hypothesis test at the significance level 2.5% to check whether the coin is biased.

Let X be the number of heads in 50 tosses. Under the null hypothesis, X is modelled by the following binomial distribution: X ~ B(50, 0.5). H₀: p = 0.5, H₁: p ≠ 0.5. Since this is a two-tailed test, the significance level in each tail is 0.0125. 25 heads are expected and 30 are observed, so the upper tail is tested. P(X ≥ 30) = 1 − P(X ≤ 29) = 0.101 (to 3 significant figures). This is greater than the significance level in the tail, 0.101 > 0.0125, therefore there is insufficient evidence to reject H₀ and no significant evidence that the coin is biased.
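The tail probability used in this answer can be checked without a calculator. This is a minimal Python sketch of the same test; the function name binomial_cdf is an assumption for illustration.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


n, p0 = 50, 0.5          # X ~ B(50, 0.5) under the null hypothesis
observed = 30            # number of heads observed
tail_level = 0.025 / 2   # two-tailed test, so 0.0125 in each tail

# P(X >= 30) = 1 - P(X <= 29)
p_upper = 1 - binomial_cdf(observed - 1, n, p0)
print("P(X >= 30) =", round(p_upper, 3))

if p_upper < tail_level:
    print("Reject H0: significant evidence that the coin is biased")
else:
    print("Insufficient evidence to reject H0")
```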


3

A random variable can take 3 possible values. Could this random variable follow a binomial distribution?

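Each trial in a binomial experiment can only have 2 possible outcomes, i.e. success or failure. If the 3 values are the possible outcomes of a single trial, the random variable cannot be from a binomial distribution. Note that the number of successes itself takes the values 0, 1, ..., n, so a count of successes over n = 2 trials, which takes the values 0, 1 and 2, would be binomial.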


4

Define the null and alternative hypotheses.

The null hypothesis, H₀, is what is assumed to be true unless the evidence suggests otherwise. The alternative hypothesis, H₁, is the hypothesis that is accepted in place of the null hypothesis if the test statistic provides significant enough evidence.


5

A scatter plot for two variables is plotted and it is found that the relationship between them follows a positive quadratic. Explain whether a regression line for the two variables should be drawn.

Since the scatterplot for the two variables does not follow a linear pattern, a regression line should not be plotted.

