The Normal Distribution

Repeated measurements in biology are rarely identical, due to random errors and natural variation. If enough repeat measurements are taken, they can be plotted on a histogram, like the one on the right. This usually shows a __normal distribution__, with most of the repeats close to some central value. Many biological phenomena follow this pattern, e.g. people's heights, the number of peas in a pod, and the breathing rate of insects.
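The way repeats pile up around a central value can be illustrated with a small simulation. This is only a sketch: the true value of 50, the spread of 5, the bin width and the function names are all assumptions chosen for illustration, not data from a real experiment.

```python
import random
from collections import Counter

def simulate_measurements(n, true_value=50.0, spread=5.0, seed=1):
    """Return n simulated repeat measurements scattered around true_value."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(true_value, spread) for _ in range(n)]

def bin_counts(values, bin_width=2.5):
    """Count the values in each bin; keys are the lower edge of each bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

measurements = simulate_measurements(1000)
histogram = bin_counts(measurements)

# Print a sideways histogram: one '#' for every 5 measurements in the bin.
for lower_edge, count in histogram.items():
    print(f"{lower_edge:6.1f} | {'#' * (count // 5)}")
```

The tallest bars should sit near the assumed true value, with the bars getting shorter on either side.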

The central value of the normal distribution curve is the __mean__ (also known as the __arithmetic mean__ or __average__). But how reliable is this mean? If the data are all close together, then the mean is probably good, but if they are scattered widely, then the calculated mean may not be very reliable. The reliability of the mean is given by the __95% confidence interval__ (also known as the __confidence limit__). This is derived from the __standard deviation__ and the number of measurements, and is the range above and below the calculated mean within which you can be 95% confident that the real mean lies (marked on the histogram above). It should not be confused with the range that contains 95% of the individual measurements, which is wider. Whenever you calculate a mean you should also calculate a confidence limit to indicate the quality of your data.

In Excel the mean is calculated using the formula =AVERAGE(range), and the 95% confidence interval is calculated using =CONFIDENCE(0.05, STDEV(range), COUNT(range)). These are both shown in the spreadsheet below.
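For readers working outside Excel, the same two calculations can be sketched in Python using only the standard library. The `confidence` function below is a hypothetical helper written to mirror the arguments of Excel's CONFIDENCE, and the data values are invented for illustration.

```python
from statistics import NormalDist, mean, stdev

def confidence(alpha, standard_dev, size):
    """Half-width of the confidence interval, mirroring Excel's CONFIDENCE."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * standard_dev / size ** 0.5

data = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]  # hypothetical measurements

data_mean = mean(data)                               # =AVERAGE(range)
data_ci = confidence(0.05, stdev(data), len(data))   # =CONFIDENCE(0.05, STDEV(range), COUNT(range))

print(f"mean = {data_mean:.2f} ± {data_ci:.2f}")
```

Like Excel's STDEV, `statistics.stdev` is the sample standard deviation (dividing by n − 1), so the two approaches should agree.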

This spreadsheet shows two sets of data with the same mean. In group A the confidence limit is small compared to the mean, so the data are reliable and you can be confident that the real mean is close to your calculated mean. But in group B the confidence limit is large compared to the mean, so the data are unreliable, as the real mean could be quite far away from your calculated mean.
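The same comparison can be sketched in Python. The two groups below are invented numbers, chosen so that both have a mean of 20; they are not the values from the spreadsheet. The helper name `summarise` and the multiplier 1.96 (the normal-distribution value for a 95% interval) are likewise assumptions for this illustration.

```python
from statistics import mean, stdev

Z_95 = 1.96  # normal-distribution multiplier for a 95% interval

def summarise(values):
    """Return (mean, 95% confidence limit, limit as a percentage of the mean)."""
    m = mean(values)
    limit = Z_95 * stdev(values) / len(values) ** 0.5
    return m, limit, 100 * limit / m

group_a = [19, 20, 21, 20, 20, 19, 21, 20]  # tightly clustered
group_b = [5, 35, 12, 28, 20, 3, 37, 20]    # widely scattered

for name, values in (("A", group_a), ("B", group_b)):
    m, limit, percent = summarise(values)
    print(f"Group {name}: mean = {m:.1f} ± {limit:.1f} ({percent:.0f}% of the mean)")
```

Expressing the confidence limit as a percentage of the mean makes "small compared to the mean" easy to judge at a glance.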

The Equations

Mean

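$$\bar{x} = \frac{\sum x}{n}$$

where $x$ is each individual measurement, $n$ is the number of measurements, and $\sum$ means "sum of".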

95% Confidence Interval

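$$\text{95\% CI} = \bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$$

where

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

is the standard deviation, $\bar{x}$ is the mean, and $n$ is the number of measurements.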
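As a worked illustration, take four hypothetical measurements: 4, 5, 6 and 5. The calculation below assumes the sample standard deviation (dividing by $n - 1$) and the multiplier 1.96, which is what Excel's STDEV and CONFIDENCE use.

$$\bar{x} = \frac{4 + 5 + 6 + 5}{4} = 5$$

$$s = \sqrt{\frac{(-1)^2 + 0^2 + 1^2 + 0^2}{4 - 1}} = \sqrt{\frac{2}{3}} \approx 0.82$$

$$1.96 \times \frac{s}{\sqrt{n}} \approx 1.96 \times \frac{0.82}{2} \approx 0.80$$

So the result would be reported as 5.0 ± 0.8.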