RaceRocks.com
BIOSTATISTICS LAB:
STANDARD DEVIATION and T- TEST
BACKGROUND:

For this field lab we will go to Race Rocks (weather permitting) at a low tide and choose a population of molluscs or arthropods (barnacles) that we can measure without damaging the animals, in order to gather appropriate information for the following exercise. After observing the populations, you will be able to make relevant hypotheses about the members of the two populations. We will decide on appropriate measurements that can be made, and pairs of students will then gather the data. Submit your data and the results of the analysis in an Excel table saved as HTML so that it can be added to the published information on the website.

For a very good reference on Ecological Sampling and Statistics, you can start by reviewing the information presented on the following website:

http://www.countrysideinfo.co.uk/biol_sampl_cont.htm

RESULTS:

RAW DATA TABLES THAT YOU MAY USE FOR STATISTICAL MANIPULATIONS. YOU CAN COPY AND PASTE THESE EXCEL TABLES FROM THE HTML PAGE ON THE INTERNET RIGHT TO YOUR OWN DESKTOP INTO AN EXCEL WORKBOOK.
April 2002 -- Californianus mussel diameters from high and low tide areas at Swordfish Island.
April 2002 -- Gooseneck barnacle clump diameter from high and low tide areas at Swordfish Island.
April 2002 -- Thatched barnacle height from the protected versus the exposed shore at Race Rocks.
SOME QUESTIONS and POINTS TO GUIDE FURTHER INVESTIGATION:

1. State that the term standard deviation is used to summarize the spread of values around the mean, and that for normally distributed data about 68% of the values fall within one standard deviation of the mean (plus or minus).

2. Calculate the means and standard deviations of two different frequency distributions. Describe how standard deviation is useful in comparing the means and the spread of ecological data between two or more sites.

3. For normally distributed data, about 68% of all values lie within the range of the mean plus or minus one standard deviation (s.d., s or σ). This rises to about 95% for plus or minus two standard deviations. A small s.d. indicates that the data are clustered closely around the mean value. Conversely, a large s.d. indicates a wider spread around the mean. The size of the s.d. might be the result of genetic or environmental factors (or both). When comparing two samples from two different populations, the closer the means and standard deviations, the more likely it is that the samples are drawn from similar populations (or the same one). The bigger the difference, the less likely this is. This also depends on sample size: larger samples give more reliable results.

4. Compare two samples of fictitious populations: population A has three plants 10 cm, 20 cm and 30 cm high respectively, and population B has three plants all 20 cm high. The means are identical but the standard deviations are very different. This reinforces the point that means "do not tell the whole story" and stresses the importance of standard deviation. It can also be used to show that very small samples are unreliable!

5. Describe how standard deviation is useful in comparing the means and the spread of ecological data between two or more sites.
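
The comparison in point 4 can be checked directly. The following is a minimal Python sketch using only the standard library; the variable names are illustrative, and the heights are the fictitious ones from point 4.

```python
import statistics

# Fictitious plant heights (cm) from point 4.
population_a = [10, 20, 30]
population_b = [20, 20, 20]

mean_a = statistics.mean(population_a)
mean_b = statistics.mean(population_b)

# statistics.stdev is the sample standard deviation (it divides by N-1).
sd_a = statistics.stdev(population_a)
sd_b = statistics.stdev(population_b)

print(f"A: mean = {mean_a} cm, s.d. = {sd_a} cm")
print(f"B: mean = {mean_b} cm, s.d. = {sd_b} cm")
```

Both populations have the same mean, but only population A shows any spread.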

THE SIMPSON'S DIVERSITY INDEX
Simpson's Diversity Index is a measure of diversity. In ecology, it is often used to quantify the biodiversity of a habitat. It takes into account the number of species present, as well as the abundance of each species.
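
As an illustration only, here is a minimal Python sketch of one common form of the index, D = N(N-1) / (sum of n(n-1)), where n is the number of individuals of each species and N is the total number of individuals. This page does not say which form to use, so the choice of formula is an assumption (some texts use 1 - (sum of n(n-1)) / N(N-1) instead), and the species counts are made up.

```python
def simpson_reciprocal_index(counts):
    """Reciprocal form of Simpson's index: N(N-1) / sum(n(n-1))."""
    total = sum(counts)
    denominator = sum(n * (n - 1) for n in counts)
    if denominator == 0:
        raise ValueError("need at least one species with two or more individuals")
    return total * (total - 1) / denominator

# Hypothetical counts of individuals per species (not real Race Rocks data).
even_community = [10, 10]    # two species, equally abundant
uneven_community = [19, 1]   # two species, one dominant

d_even = simpson_reciprocal_index(even_community)
d_uneven = simpson_reciprocal_index(uneven_community)
print(f"even: D = {d_even:.3f}, uneven: D = {d_uneven:.3f}")
```

With this form, the same number of species gives a higher value when the individuals are shared out more evenly, which is how the index takes abundance into account.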

THE T- TEST

6. Analyze the significance of the difference between two sets of data using Student's t-test, given the appropriate formula and tables.

7. A large value of t indicates little overlap between the two sets of data and almost certainly a real difference between them. In contrast, a small value of t indicates a lot of overlap and probably no difference. A probability of 0.05 is usually taken as the level of significance, and the corresponding critical value of t is read from a table.

8. The t-test should only be used on normally distributed data, ideally with large samples (>30 measurements per set of data), and the value of t should be compared with the critical value for the appropriate number of degrees of freedom, which is n1 + n2 - 2. For sample sizes <30 the value of t is only approximate. If t > critical value then it is possible to reject the null hypothesis.
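
The t-test formula itself is not given on this page. As a hedged sketch, the following Python code assumes the equal-variance (pooled) two-sample form, which is the version that goes with n1 + n2 - 2 degrees of freedom. The function name `pooled_t` and the measurements are made up for illustration.

```python
import math
import statistics

def pooled_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for a two-sample t-test with pooled variance."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / standard_error, df

# Made-up measurements (not real Race Rocks data).
site_1 = [1.0, 2.0, 3.0]
site_2 = [4.0, 5.0, 6.0]
t, df = pooled_t(site_1, site_2)

# Two-tailed critical value for 4 degrees of freedom at p = 0.05,
# read from a standard t table.
critical_value = 2.776
reject_null = abs(t) > critical_value
print(f"t = {t:.3f}, df = {df}, reject null hypothesis: {reject_null}")
```

The sign of t only shows which sample has the larger mean, so the absolute value is compared with the critical value.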

*Descriptive Statistics

If you were to measure the size of 10 mollusc shells you would find that mollusc shells in fact come in different sizes. Thus it is impossible to report the size of mollusc shells, instead the best you can do is to report a typical size and give some estimate of the range of variation above and below that typical size. The attempt to capture the full meaning of "the size of mollusc shells" in a few numbers is bound to fail -- Nature really is more complex than our descriptions of it. Nevertheless if our choice is to be silent on "the size of mollusc shells" or to provide a list of the size of every mollusc shell in the world (on this day) or to provide a few summarizing numbers, the latter is the option selected by science. This page introduces a handful of statistics which are commonly used to describe the distribution of data.

Typical Values

There are several common methods of selecting a "typical" value for data. The most common method is the average or mean. To obtain an average value, add up all your data values and divide by the number of data items. If X01 is the length of your first mollusc shell, X02 the length of your second mollusc shell, etc., then the average mollusc shell length is:

(X01+X02+X03+ X04+X05+X06+ X07+X08+X09+ X10)/10 = Xavg

The mean is one of several indices of central tendency that statisticians use to indicate the point on the scale of measures where the population is centered.

To obtain the median value, first sort your list of shell-lengths from lowest to highest:

{5.1, 7.2, 4.1, 9.5, 6.7, 7.8, 8.5, 7.0, 7.3, 9.0} becomes:

{4.1, 5.1, 6.7, 7.0, 7.2, 7.3, 7.8, 8.5, 9.0, 9.5}

and then select the value in the exact middle as the median. (It turns out that if the number of items is even, as in this example, there is no exact middle. 7.2 is 5 places from the front and 6 places from the back; 7.3 is 6 places from the front and 5 places from the back. So with even-numbered data sets, average the two near-middle values, producing Xmed=7.25 in this example.)

The median is one of several indices of central tendency that statisticians use to indicate the point on the scale of measures where the population is centered. The median of a population is the point that divides the distribution of scores in half. Numerically, half of the scores in a population will have values that are equal to or larger than the median and half will have values that are equal to or smaller than the median.

The mode, a third kind of "typical" value, will be of less use to us: it is the most repeated value in the data set. In the above example, no value is repeated (each value occurs exactly once). This is commonly the case with so few data items; hence its limited utility for us.
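
The three "typical" values can be computed for the ten shell lengths in the example with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# Shell lengths from the example above.
shell_lengths = [5.1, 7.2, 4.1, 9.5, 6.7, 7.8, 8.5, 7.0, 7.3, 9.0]

x_avg = statistics.mean(shell_lengths)
# With an even number of items, median() averages the two middle values.
x_med = statistics.median(shell_lengths)
# multimode() returns every value that ties for most frequent.
modes = statistics.multimode(shell_lengths)

print(f"mean = {x_avg:.2f}, median = {x_med:.2f}")
print(f"number of values tied for mode = {len(modes)}")
```

Because no length is repeated, every one of the ten values ties for the mode, which shows why the mode is of little use for a sample this small.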

Estimates of the Range of Variation

In some sense, the range of variation is limited only by your willingness to search through ever larger groups of shells. Generally, the more data you record the more extreme your highs and lows will be. Nevertheless, you should find that the range of shell lengths, that includes say 50% of your sample, remains about the same even if you look through ever larger groups of shells. That is to say, there is a common range of variation even as larger data sets produce rare "outliers" with ever more extreme deviation. Estimates of the range of variation seek to put a number to this common range of variation that doesn't depend on sample size.

The most common way to describe the range of variation is standard deviation (usually denoted by the Greek letter sigma: σ). The standard deviation is simply the square root of the variance, so let's start by describing the variance. To obtain the variance, start by subtracting the average from each data item. Since there will be about as many items above average as below average, the resulting list of numbers will have about as many positive values as negative values. (In fact this list of deviations-from-average must itself average to zero!) Square each deviation, and proceed to find the average of the squared-deviations. However, in finding the average squared-deviation, divide by N-1 rather than N. The result is the variance; take its square root to get the standard deviation.

variance = ( (X01-Xavg)^2 + (X02-Xavg)^2 + (X03-Xavg)^2 + ··· + (X10-Xavg)^2 )/9
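
The same steps can be followed one at a time in code. This is a minimal Python sketch applied to the ten shell lengths from the median example; the variable names are illustrative.

```python
import math

# Shell lengths from the median example.
shell_lengths = [5.1, 7.2, 4.1, 9.5, 6.7, 7.8, 8.5, 7.0, 7.3, 9.0]

n = len(shell_lengths)
x_avg = sum(shell_lengths) / n

# Step 1: subtract the average from each data item.
deviations = [x - x_avg for x in shell_lengths]

# The deviations average to zero (up to floating-point rounding).
mean_deviation = sum(deviations) / n

# Step 2: square each deviation and divide the total by N-1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Step 3: the standard deviation is the square root of the variance.
sdev = math.sqrt(variance)

print(f"variance = {variance:.3f}, standard deviation = {sdev:.3f}")
```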

For data that is "normally distributed" we expect that about 68.3% of the data will be within 1 standard deviation of the mean (i.e., in the range Xavg ± sdev ). In general there is a relationship between the fraction of the included data and the deviation from the mean in terms of standard deviations.

Fraction    Number of Standard
of Data     Deviations from Mean

 50.0%           0.674
 68.3%           1.000
 90.0%           1.645
 95.0%           1.960
 95.4%           2.000
 98.0%           2.326
 99.0%           2.576
 99.7%           3.000

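Thus we should expect that 95% of the data would be within 1.96 standard deviations of the mean (i.e., in the range Xavg ± 1.96 sdev). This range is sometimes loosely called a 95% confidence interval for the sample; note that it describes the spread of individual measurements, not the uncertainty in the mean.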
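
The entries in the table follow from the normal distribution and can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a sketch for checking the numbers; the two helper function names are made up here.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def sds_for_fraction(fraction):
    """Standard deviations either side of the mean that enclose `fraction` of the data."""
    # Half of the excluded data lies in each tail.
    return standard_normal.inv_cdf((1 + fraction) / 2)

def fraction_within(num_sds):
    """Fraction of normally distributed data within `num_sds` standard deviations of the mean."""
    return standard_normal.cdf(num_sds) - standard_normal.cdf(-num_sds)

for fraction in (0.50, 0.90, 0.95, 0.98, 0.99):
    print(f"{fraction:.1%} of data -> {sds_for_fraction(fraction):.3f} standard deviations")

for num_sds in (1, 2, 3):
    print(f"{num_sds} standard deviations -> {fraction_within(num_sds):.1%} of data")
```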

Standard Deviation of the Estimated Means

The above procedure describes how to define a "typical" shell using 10 sample shells. Clearly if another group uses the same procedure on its own sample of 10 shells, it is unlikely to come up with exactly the same value for a "typical" shell. How much variation is there in the estimates of "typical" described above? Clearly if we expand the sample beyond 10 (to 100, or 1000, ...) we would expect to come closer to the actual "typical" shell (i.e., that determined by looking at all the shells in the world). Thus the larger the sample you average over, the smaller is your expected deviation from the exact result. But how much variation should you expect in a calculated average shell? The standard deviation expected in a calculated average is:

sdev / N^(1/2)

Thus the deviation expected equals the standard deviation of the length of shells if you "average" over just one shell, and decreases in proportion to one over the square root of N as N increases. Thus one can expect to get quite close to the exact mean if the sample size N gets very big.
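
As a short illustration, the following Python sketch computes this quantity (the standard deviation divided by the square root of N) for the ten shell lengths used earlier, and then shows how it would shrink for larger samples. The larger sample sizes are hypothetical and simply reuse the same standard deviation.

```python
import math
import statistics

# Shell lengths from the median example.
shell_lengths = [5.1, 7.2, 4.1, 9.5, 6.7, 7.8, 8.5, 7.0, 7.3, 9.0]

sdev = statistics.stdev(shell_lengths)  # divides by N-1
standard_error = sdev / math.sqrt(len(shell_lengths))
print(f"sdev = {sdev:.3f}, expected deviation of the average (N=10) = {standard_error:.3f}")

# Hypothetical sample sizes, assuming the standard deviation stays the same.
for n in (1, 10, 100, 1000):
    print(f"N = {n:5d}: expected deviation of the average = {sdev / math.sqrt(n):.3f}")
```

Each hundredfold increase in sample size cuts the expected deviation of the average by a factor of ten.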

The standard deviation is one of several indices of variability that statisticians use to characterize the dispersion among the measures in a given population.

To calculate the standard deviation of a population it is first necessary to calculate that population's variance. Numerically, the standard deviation is the square root of the variance. Unlike the variance, which is a somewhat abstract measure of variability, the standard deviation can be readily conceptualized as a distance along the scale of measurement.

"Normal" and other Distributions

Many pages have been written by others on this topic. To be brief, a common assumption of statistics-users is that data is "normally" distributed. Occasionally the folks making this assumption know what it means and even test to see if it's a valid assumption. I'm going to leave you in the dark (like many statistics-users) about what this assumption means and how you test it. There are several good courses and books that would include these topics. I will give you two (not very helpful) hints.

  1. (Bad News) Many things in nature are not "normally" distributed. (Good News) Much of what is not "normally" distributed in biology would be "normally" distributed if you took the logarithm of each data item. Thus there is a button on the descriptive statistics calculation page to do this conversion for you. The result is that the geometric mean is calculated for you and a different kind of standard deviation is produced. With the usual standard deviation you add or subtract the standard deviation from the mean in order to test for fractions of included data; with the log standard deviation, you multiply or divide. Thus you would expect 68.3% of your data to be between Xgeo × sdev and Xgeo ÷ sdev; 95.4% of your data would be between Xgeo × sdev^2 and Xgeo ÷ sdev^2.
  2. (Bad News) Much of what's in books about statistics has to do with "normally" distributed data. Statistics that provide useful information even if applied to not-"normally" distributed data are called robust statistics. Median and average deviation are considered robust statistics. (Good News) The program always calculates them for you.
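
The logarithm conversion in hint 1 can be sketched in Python as follows. This is only an illustration of the idea, not a description of how the linked calculator works: it assumes natural logarithms, the function name `geometric_summary` is made up, and so is the data.

```python
import math
import statistics

def geometric_summary(values):
    """Return (geometric mean, multiplicative standard deviation) of positive values."""
    logs = [math.log(v) for v in values]
    # Average and standard deviation are taken on the log scale,
    # then converted back with exp().
    geo_mean = math.exp(statistics.mean(logs))
    geo_sdev = math.exp(statistics.stdev(logs))
    return geo_mean, geo_sdev

# Made-up measurements spread over two orders of magnitude.
data = [1.0, 10.0, 100.0]
x_geo, log_sdev = geometric_summary(data)

# Multiply and divide (rather than add and subtract) to get the range.
low, high = x_geo / log_sdev, x_geo * log_sdev
print(f"geometric mean = {x_geo:.2f}, range = {low:.2f} to {high:.2f}")
```

For this made-up data the geometric mean is 10, whereas the ordinary mean (37) is pulled upward by the largest value.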
Click here for other statistical definitions
http://www.animatedsoftware.com/statglos/statglos.htm

Click here to calculate mean, standard deviation, etc
http://www.physics.csbsju.edu/stats/descriptive2.html

Click here to calculate using copy & paste data entry

(adapted from IB Biology Syllabus *and http://www.physics.csbsju.edu/stats/Index.html)

webmaster:
Garry Fletcher
Copyright