[Bioperl-l] BPlite percent_id bug
Jerm
jerm@fugu-sg.org
Wed, 16 Jan 2002 11:51:50 +0800
I've noticed a bug in BPlite (the branch I'm using is Bioperl-072) when
the blast output file is parsed.
The percentage_ids are calculated by dividing the number of identical
matches by the query sequence length.
So for example,
----------------------------------------------------------------------------------
# Plus Strand HSPs:
#
# Score = 247 (92.0 bits), Expect = 8.6e-88, Sum P(10) = 8.6e-88
# Identities = 48/64 (75%), Positives = 57/64 (89%), Frame = +3 / +1
#
#Query: 37125 LQTVICSYVFFQGFLNLKWSRFARVVLTRSIAIIPTLLVAVFQDVEHLTGMNDFLNVLQS 37304
#             L+ ++C     QGFLNL+WSRFARV+LTRS+AI PTLLVA+FQD++HLTGMNDFLNVLQS
#Sbjct: 3520  LKVLVC----LQGFLNLRWSRFARVLLTRSLAITPTLLVAIFQDIQHLTGMNDFLNVLQS 3687
#
#Query: 37305 LQVR 37316
#             LQVR
#Sbjct: 3688  LQVR 3699
----------------------------------------------------------------------------------
The $match (48) is parsed out from the file and divided by the
$qlength (37316 - 37125 + 1 = 192), so the perc_id for this HSP is then
48/192 = 25%
But this blast output is from a tblastx run; that is to say, the qlength is
in nucleotides, while the number of matches is in amino acids. The perc_id
is therefore incorrect.
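
To make the unit mismatch concrete, here is a small sketch of the two calculations, written in Python for illustration only. It is not BPlite's code: the variable and function names are made up, and the numbers are copied from the HSP quoted above (query coordinates 37125 to 37316, Identities = 48/64).

```python
# Values taken from the HSP quoted above.
identical = 48                          # numerator of "Identities = 48/64"
aligned_columns = 64                    # denominator of "Identities = 48/64"
query_start, query_end = 37125, 37316   # query coordinates, in nucleotides


def percent_identity(matches, length):
    """Return matches as a percentage of length."""
    return 100.0 * matches / length


query_span_nt = abs(query_end - query_start) + 1   # span in nucleotides
query_span_aa = query_span_nt // 3                 # same span in codons

# Amino-acid matches divided by a nucleotide length: mixed units.
wrong = percent_identity(identical, query_span_nt)
# Amino-acid matches divided by the amino-acid alignment length.
right = percent_identity(identical, aligned_columns)

print(f"nucleotide span: {query_span_nt}, codons: {query_span_aa}")
print(f"mixed units: {wrong:.0f}%  consistent units: {right:.0f}%")
```

The nucleotide span is three times the number of translated residues, which is why the mixed-unit figure comes out at roughly a third of the value printed in the report.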
Is there a reason why the perc_id is not parsed out from the file directly
(75%) instead?
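
As a rough illustration of what parsing the value directly could look like, the sketch below pulls the match count, alignment length and reported percentage out of the Identities line. It is in Python, not Perl, and is not taken from BPlite; the pattern and the parse_identities helper are hypothetical and only checked against the line format shown in this report.

```python
import re

# Hypothetical pattern for the "Identities = 48/64 (75%)" part of the report.
IDENTITIES_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_identities(line):
    """Return (matches, alignment_length, reported_percent), or None."""
    m = IDENTITIES_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


line = "# Identities = 48/64 (75%), Positives = 57/64 (89%), Frame = +3 / +1"
matches, length, reported = parse_identities(line)
print(matches, length, reported)
```

Taking the denominator from the same line also keeps both numbers in the same units, whatever the blast program.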
Jer-Ming Chia
Fugu Informatics
Singapore