[Bioperl-l] Interpretation of percentage_identity
Jason Stajich
jason at bioperl.org
Fri Apr 7 13:50:30 UTC 2006
These methods are really more for multiple sequence alignment than
pairwise identities. Although I guess we don't have anywhere else
that calculates percent ID for a pair of sequences in an alignment -
would be nice for someone to add that.
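A minimal sketch of what such a pairwise calculation might look like, written in Python for illustration and not part of BioPerl. The function name and the gap handling are assumptions: columns where either sequence has a gap are skipped, so the denominator is the number of columns in which both sequences have a residue. Other conventions (dividing by the full alignment length, or by the shorter sequence) are equally defensible.

```python
def pairwise_percent_identity(a: str, b: str, gap_chars: str = "-.") -> float:
    """Percent identity between two rows taken from one alignment.

    Assumption: columns where either row has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # gap in at least one row: column not compared
        compared += 1
        if x == y:
            matches += 1
    # Avoid dividing by zero when no column could be compared.
    return 100.0 * matches / compared if compared else 0.0


print(pairwise_percent_identity("ACGT-A", "ACCTTA"))  # prints 80.0
```

In the example, five columns are compared (the gapped one is skipped) and four of them match.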
First off, percentage_identity is an alias for
average_percentage_identity - this has to do with preserving the
function names that existed before there were two methods.
There are only really two implementations to concentrate on.
average_percentage_identity
overall_percentage_identity
The documentation for Bio::SimpleAlign gives you some hints about how
each works.
overall_percentage_identity is just the percentage of columns that are
identical across all sequences, so it is very conservative.
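To make the "identical columns" idea concrete, here is a small Python sketch. It is an illustration only, not the BioPerl code. The function name is hypothetical, and two details are assumptions: the denominator is the full alignment length, and a column made up entirely of gaps does not count as identical.

```python
from typing import Sequence


def overall_percent_identity(rows: Sequence[str], gap_chars: str = "-.") -> float:
    """Percentage of alignment columns in which every row has the same residue."""
    if not rows or not rows[0]:
        return 0.0
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all aligned sequences must have the same length")
    identical = 0
    for column in zip(*rows):
        residues = {c.upper() for c in column}
        # One distinct symbol in the column, and that symbol is not a gap.
        if len(residues) == 1 and not (residues & set(gap_chars)):
            identical += 1
    return 100.0 * identical / length


print(overall_percent_identity(["ACGTA", "ACGTT", "ACCTA"]))  # prints 60.0
```

Only three of the five columns are identical in all three rows, even though every pair of rows agrees at three or four positions. A single differing sequence is enough to disqualify a column, which is why the figure is conservative.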
Here is the pertinent documentation for average_percentage_identity:
Function: The function uses a fast method to calculate the average
          percentage identity of the alignment
Notes   : This method implemented by Kevin Howe calculates a figure
          that is designed to be similar to the average pairwise
          identity of the alignment (identical in the absence of
          gaps), without having to explicitly calculate pairwise
          identities proposed by Richard Durbin.
          Validated by Ewan Birney and Alex Bateman.
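The note's claim that the figure is identical to the average pairwise identity when there are no gaps can be checked with a short Python sketch. This is one reading of the idea, not a copy of the BioPerl code. The function names are hypothetical, input is assumed to be uppercase, and the shortcut is assumed to ignore any non-letter character such as a gap. The shortcut relies on a column holding k copies of a residue contributing k*(k-1) ordered matching pairs, so no pair of sequences has to be compared directly.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def average_pairwise_identity(rows: Sequence[str]) -> float:
    """Explicit version: mean over all row pairs of (matching columns / length)."""
    if len(rows) < 2 or not rows[0]:
        raise ValueError("need at least two non-empty aligned sequences")
    length = len(rows[0])
    pairs = list(combinations(rows, 2))
    total = sum(sum(x == y for x, y in zip(a, b)) / length for a, b in pairs)
    return 100.0 * total / len(pairs)


def column_count_identity(rows: Sequence[str]) -> float:
    """Shortcut: derive the number of matching pairs from per-column counts."""
    matching = 0
    possible = 0
    for column in zip(*rows):
        # Assumption: only letters are counted, so gaps drop out of the column.
        counts = Counter(c for c in column if c.isalpha())
        n = sum(counts.values())
        matching += sum(k * (k - 1) for k in counts.values())
        possible += n * (n - 1)
    return 100.0 * matching / possible if possible else 0.0


rows = ["ACGTA", "ACGTT", "ACCTA"]
print(round(average_pairwise_identity(rows), 2))
print(round(column_count_identity(rows), 2))
```

On this gap-free example both functions give about 73.3 (11/15 and 22/30 respectively). Once gaps are present, the two can diverge, because the shortcut removes gapped positions from both the numerator and the denominator.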
If someone wants to put some excerpt of this on the SimpleAlign wiki
page that would be awesome.
-jason
On Apr 7, 2006, at 10:04 AM, Armin Schmitt wrote:
> Dear Jason,
>
> I need some help with the interpretation of
> the results from all three percentage_identity
> variants offered in the Bioperl module AlignI.pm
>
> - percentage_identity
> - average_percentage_identity
> - overall_percentage_identity
>
> Please understand that I am not a Perl expert,
> so I am not able to get the meaning from the
> source code.
>
> By percentage identity for a 2 sequence alignment
> I understand the proportion of matching amino acids
> of the total length.
>
> But I suspect that this is different now?
>
> Thank you very much
>
> Armin Schmitt
>
> --
> Dr. Armin Schmitt
> Züchtungsbiologie und molekulare Genetik
> Institut für Nutztierwissenschaften
> Humboldt-Universität zu Berlin
> Invalidenstraße 42
> 10115 Berlin
>
> Breeding Biology and Molecular Genetics
> Institute for Animal Sciences
> Humboldt-Universität zu Berlin
> Invalidenstraße 42
> 10115 Berlin
> Germany
>
> Tel: +49 30 2093 9074
> Fax: +49 30 2093 6397
> http://www.agrar.hu-berlin.de/nutztier/zb/
>
>
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12