NAME
Set::Similarity - similarity measures for sets
SYNOPSIS
use Set::Similarity::Dice;
# object method
my $dice = Set::Similarity::Dice->new;
my $similarity = $dice->similarity('Photographer','Fotograf');
# class method
my $dice = 'Set::Similarity::Dice';
my $similarity = $dice->similarity('Photographer','Fotograf');
# from 2-grams
my $width = 2;
my $similarity = $dice->similarity('Photographer','Fotograf',$width);
# from arrayref of tokens
my $similarity = $dice->similarity(['a','b'],['b']);
# from hashref of features
# from hashref sets
DESCRIPTION
Overlap coefficient
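The overlap coefficient is the size of the intersection divided by the
size of the smaller of the two sets, i.e.
( A intersect B ) / min(A,B)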
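The formula can be sketched in plain Perl as follows. This is only an illustration and not the module's own code: the helper name overlap_coefficient is hypothetical, sets are assumed to be hashrefs whose keys are the elements, and returning 0 for an empty set is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Set::Similarity.
# Sets are hashrefs whose keys are the set elements.
sub overlap_coefficient {
    my ($set1, $set2) = @_;
    my $size1 = scalar keys %$set1;
    my $size2 = scalar keys %$set2;
    my $min   = $size1 < $size2 ? $size1 : $size2;
    return 0 unless $min;    # assumption: an empty set yields 0

    # grep in scalar context counts the elements present in both sets
    my $common = grep { exists $set2->{$_} } keys %$set1;
    return $common / $min;
}

my %first  = map { $_ => undef } qw(a b c);
my %second = map { $_ => undef } qw(b c);
print overlap_coefficient(\%first, \%second), "\n";    # 2 / min(3, 2) = 1
```

Because the denominator is the smaller set, a set that is fully contained in the other scores 1 even when the sets differ in size.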
Jaccard Index
The Jaccard coefficient measures the similarity between sample sets and
is defined as the size of the intersection divided by the size of the
union of the sample sets, i.e.
( A intersect B ) / (A union B)
The Tanimoto coefficient is the ratio of the number of features common
to both molecules to the total number of features, i.e.
( A intersect B ) / ( A + B - ( A intersect B ) ) # the same as Jaccard
The range is 0 to 1 inclusive.
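As a rough check that the two formulas agree, here is a plain-Perl sketch that computes both. The helper names intersection_size, jaccard and tanimoto are hypothetical and not the module's API; sets are assumed to be hashrefs keyed by element, and returning 0 for two empty sets is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Set::Similarity.
sub intersection_size {
    my ($set1, $set2) = @_;
    return scalar grep { exists $set2->{$_} } keys %$set1;
}

# Jaccard form: intersection size divided by union size.
sub jaccard {
    my ($set1, $set2) = @_;
    my %union      = (%$set1, %$set2);    # merging the hashes removes duplicates
    my $union_size = scalar keys %union;
    return 0 unless $union_size;          # assumption: two empty sets yield 0
    return intersection_size($set1, $set2) / $union_size;
}

# Tanimoto form: union size computed as A + B - ( A intersect B ).
sub tanimoto {
    my ($set1, $set2) = @_;
    my $common = intersection_size($set1, $set2);
    my $total  = scalar(keys %$set1) + scalar(keys %$set2) - $common;
    return 0 unless $total;
    return $common / $total;
}

my %first  = map { $_ => undef } qw(a b c);
my %second = map { $_ => undef } qw(b c d);
print jaccard(\%first, \%second), " ", tanimoto(\%first, \%second), "\n";    # 0.5 0.5
```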
Dice coefficient
The Dice coefficient is the number of features common to both molecules
relative to the average number of features present in the two, i.e.
( A intersect B ) / ( 0.5 * ( A + B ) ) # the same as Sorensen
The weighting factor comes from the 0.5 in the denominator. The range is
0 to 1.
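A minimal plain-Perl sketch of this formula, for illustration only. The helper name dice_coefficient is hypothetical and not the module's API; sets are assumed to be hashrefs keyed by element, and returning 0 for two empty sets is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Set::Similarity.
sub dice_coefficient {
    my ($set1, $set2) = @_;
    my $combined = scalar(keys %$set1) + scalar(keys %$set2);
    return 0 unless $combined;    # assumption: two empty sets yield 0

    # grep in scalar context counts the elements present in both sets
    my $common = grep { exists $set2->{$_} } keys %$set1;
    return $common / (0.5 * $combined);
}

my %first  = map { $_ => undef } qw(a b);
my %second = map { $_ => undef } qw(b);
printf "%.3f\n", dice_coefficient(\%first, \%second);    # 1 / (0.5 * 3) = 0.667
```

For the same pair of sets the Jaccard index would be 1 / 2 = 0.5, so Dice gives shared elements more weight.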
METHODS
new
my $object = Set::Similarity->new();
similarity
my $similarity = $object->similarity('a','b');
from_tokens
my $similarity = $object->from_tokens(['a'],['b']);
from_sets
my $similarity = $object->from_sets({'a' => undef},{'b' => undef});
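The from_sets example represents a set as a hashref whose keys are the elements and whose values are undef. As a sketch, a token list could be converted into that representation as shown here. The helper name tokens_to_set is hypothetical, and whether the module performs this exact conversion internally is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Set::Similarity.
# Turns an arrayref of tokens into a hashref set; duplicates collapse.
sub tokens_to_set {
    my ($tokens) = @_;
    my %set;
    @set{@$tokens} = ();    # hash slice: each token becomes a key with value undef
    return \%set;
}

my $set = tokens_to_set(['a', 'b', 'a']);
print join(',', sort keys %$set), "\n";    # a,b
```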
intersection
my $intersection_size = $object->intersection({'a' => undef},{'b' => undef});
combined_length
my $set_size_sum = $object->combined_length({'a' => undef},{'b' => undef});
min
my $min_set_size = $object->min({'a' => undef},{'b' => undef});
ngrams
my $bigrams = $object->ngrams('abc',2);
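To illustrate what splitting a string into n-grams means, here is a plain-Perl sliding-window sketch. The helper name ngrams_sketch is hypothetical; the fallback to width 1 for an invalid width and the handling of strings shorter than the width are assumptions, not documented behaviour of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Set::Similarity.
# Returns an arrayref of all substrings of length $width, in order.
sub ngrams_sketch {
    my ($word, $width) = @_;
    # assumption: fall back to single characters when the width is invalid
    $width = 1 unless defined $width && $width =~ /^[1-9][0-9]*$/;
    # assumption: a string shorter than the width is returned as one token
    return [$word] if $width > length($word);
    return [ map { substr($word, $_, $width) } 0 .. length($word) - $width ];
}

print join(',', @{ ngrams_sketch('abc', 2) }), "\n";    # ab,bc
```

A string of length n yields n - width + 1 n-grams, so 'Fotograf' with a width of 2 gives 7 bigrams.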
SOURCE REPOSITORY
AUTHOR
Helmut Wollmersdorfer
COPYRIGHT AND LICENSE
Copyright (C) 2013 by Helmut Wollmersdorfer
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.