Statistics::Gtest version 0.03
==============================
NAME
Statistics::Gtest - calculate G-statistic for tabular data
SYNOPSIS
    use Statistics::Gtest;
    $gt = Statistics::Gtest->new($data);
    $degreesOfFreedom = $gt->getDF();
    $gstat = $gt->getG();
    $gt->setExpected($expectedvalues);
    $uncorrectedG = $gt->getRawG();
DESCRIPTION
"Statistics::Gtest" is a class that calculates the G-statistic for
goodness of fit for frequency data. It can be used on simple frequency
distributions (1-way tables) or for analyses of independence (2-way
tables).
Note that "Statistics::Gtest" will not, by itself, perform the
significance test for you -- it just provides the G-statistic that can
then be compared with the chi-square distribution to determine
significance.
OVERVIEW and EXAMPLES
A goodness of fit test attempts to determine if an observed frequency
distribution differs significantly from a hypothesized frequency
distribution. From "Statistics::Gtest"'s point of view, these tests come
in two flavors: 1-way tests (where a single frequency distribution is
tested against an expected distribution) and 2-way tests (where a matrix
of observed values is tested for independence -- that is, the lack of
interaction effects between the two axes being measured).
A simple example might help here. You've grown 160 plants from seed
produced by a single parent plant. You observe that among the offspring
plants, some have spiny leaves, some have hairy leaves, and some have
smooth leaves. What is the likelihood that the distribution of this
trait follows the expected values for simple Mendelian inheritance?
Observed values:
    Spiny  Hairy  Smooth
       95     53      12
Expected values (for a 9:6:1 ratio):
       90     60      10
If the observed and expected values are put into two files,
"Statistics::Gtest" can create a G-statistic object that calculates G
for the comparison; that value is then used to judge whether the
observed distribution differs significantly from the distribution
expected under simple inheritance. (The value of G for this comparison
is approximately 1.495, with 2 degrees of freedom; the observed results
are not significantly different from expected at the .05 -- or even the
.1 -- level.)
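As a rough sketch of the arithmetic behind that figure, the following
standalone Perl computes the uncorrected G for the leaf example by hand,
using the standard formula G = 2 * sum(O * ln(O/E)). It does not call
"Statistics::Gtest" and is not taken from the module's source; the
variable names are illustrative only.

```perl
use strict;
use warnings;

# Observed and expected counts from the leaf-trait example above.
my @observed = (95, 53, 12);
my @expected = (90, 60, 10);

# G = 2 * sum( O * ln(O / E) ); cells with O == 0 contribute nothing.
my $sum = 0;
for my $i (0 .. $#observed) {
    next if $observed[$i] == 0;
    $sum += $observed[$i] * log($observed[$i] / $expected[$i]);
}
my $raw_g = 2 * $sum;

# With expected values supplied from outside the data, df = classes - 1.
my $df = @observed - 1;

printf "raw G = %.4f, df = %d\n", $raw_g, $df;
```

This is the raw statistic with no correction applied, so it should land
close to, but not exactly on, the corrected value quoted above.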
2-way tests will usually not need a table of expected values, as the
expected values are generated from the observed value sums. However, one
can be loaded for 2-way tables as well.
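For a 2-way table, the expected count in each cell under independence is
conventionally (row total * column total) / grand total. The sketch
below works this out for a hypothetical 2x2 table; the counts are
invented for illustration, and the code is an assumption about the
general approach rather than the module's own implementation.

```perl
use strict;
use warnings;

# Hypothetical 2x2 table of observed counts (rows x columns).
my @obs = ([30, 10], [20, 40]);

# Accumulate row totals, column totals, and the grand total.
my (@row_sum, @col_sum);
my $n = 0;
for my $r (0 .. $#obs) {
    for my $c (0 .. $#{ $obs[$r] }) {
        $row_sum[$r] += $obs[$r][$c];
        $col_sum[$c] += $obs[$r][$c];
        $n           += $obs[$r][$c];
    }
}

# Expected count under independence: row total * column total / grand total.
my @exp;
for my $r (0 .. $#row_sum) {
    for my $c (0 .. $#col_sum) {
        $exp[$r][$c] = $row_sum[$r] * $col_sum[$c] / $n;
    }
}

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1).
my $df_2way = (@row_sum - 1) * (@col_sum - 1);

print join("\t", @$_), "\n" for @exp;
print "df = $df_2way\n";
```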
To determine if the calculated G statistic indicates a statistically
significant result, you will need to look up the values in a chi-square
distribution on your own, or make use of the "Statistics::Distributions"
module:
    use Statistics::Gtest;
    use Statistics::Distributions;
    ...
    my $gt = Statistics::Gtest->new($data);
    my $df = $gt->getDF();
    my $g = $gt->getG();
    my $sv = '.05';    # significance level
    # critical chi-square value for $df degrees of freedom at level $sv
    my $chis = Statistics::Distributions::chisqrdistr($df, $sv);
    if ($g > $chis) {
        print "$g: Sig. at the $sv level. ($chis cutoff)\n";
    }
By default, "Statistics::Gtest" returns a G statistic that has been
modified by Williams' correction (Williams 1976). This correction
reduces the value of G for smaller sample sizes, and has progressively
less effect as the sample size increases. The raw, uncorrected G
statistic is also available.
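To illustrate how the correction fades as the sample grows, here is a
minimal sketch that assumes the form given in Sokal and Rohlf for a
1-way table with a classes and n observations: q = 1 + (a^2 - 1) /
(6 * n * (a - 1)), with the adjusted statistic being G / q. The helper
name williams_q is hypothetical, and the module may compute the
correction differently.

```perl
use strict;
use warnings;

# Assumed correction factor for a 1-way table:
#   q = 1 + (a**2 - 1) / (6 * n * (a - 1))
# where a is the number of classes and n the total sample size.
# The adjusted statistic would then be G / q.
sub williams_q {
    my ($classes, $n) = @_;
    return 1 + ($classes**2 - 1) / (6 * $n * ($classes - 1));
}

# Three classes, as in the leaf example, at increasing sample sizes.
for my $n (16, 160, 1600) {
    printf "n = %4d  q = %.5f\n", $n, williams_q(3, $n);
}
```

Because q approaches 1 as n increases, dividing G by q changes the
statistic less and less for larger samples.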
Calculation methods are based on:
Sokal, R.R., and F.J. Rohlf. 1981. Biometry. W.H. Freeman and Company,
San Francisco.
Williams, D.A. 1976. Improved likelihood ratio test for complete
contingency tables. Biometrika 63:33-37.
INSTALLATION
To install this module, type the following:
    perl Makefile.PL ARGS
    make
    make test
    make install
(See the ExtUtils::MakeMaker documentation for possible values of ARGS.)
DEPENDENCIES
IO::File
COPYRIGHT AND LICENCE
Copyright (C) 2007 by David Fleck
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.