[Bioperl-l] interesting blastxml issue
Jason Stajich
jason@cgt.mc.duke.edu
Thu, 13 Dec 2001 14:06:03 -0500 (EST)
When parsing NCBI BLAST output from blastx, the query coordinates are
reported slightly differently in the XML than in the plain text output.
Normally we infer the strand of the query sequence in a blastx run by
looking at the start/end positions: we normalize them so that start is
always less than end, and set the strand to -1 if start was > end.
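As a rough sketch of that inference (normalize_coords is a hypothetical
helper written just for this illustration, not an actual BioPerl method):

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given raw start/end
# coordinates, return (start, end, strand) with start <= end.
sub normalize_coords {
    my ($start, $end) = @_;
    my $strand = 1;
    if ( $start > $end ) {
        # reversed coordinates imply the minus strand
        ($start, $end) = ($end, $start);
        $strand = -1;
    }
    return ($start, $end, $strand);
}

my @hsp = normalize_coords(621, 400);
print join(',', @hsp), "\n";   # prints 400,621,-1
```

Note that this only works when the report itself lists a minus-strand hit
with start > end; if start < end already, the strand comes out as 1.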
However, in the NCBI XML output from blastall (2.1.3) we get the following
[only the relevant parts of the actual blastx run are shown]:
<Hsp_query-from>400</Hsp_query-from>
<Hsp_query-to>621</Hsp_query-to>
<Hsp_query-frame>-3</Hsp_query-frame>
while in plain text we get (from bl2seq):
Score = 53.5 bits (127), Expect(2) = 3e-12
Identities = 27/74 (36%), Positives = 40/74 (53%)
Frame = -3
Yet if I look at the blast output in original blast text mode I get this:
Query: 621 YVVDSYANVAASAISAKNMTRSLIGASVPLWITQLFHNLGFQYGGLLLALVSVVXXXXXX 442
Y+++SY +AASA++A RS GA PL+ +F +G + GLLL L +
Sbjct: 508 YIIESYLLLAASAVAANTFMRSAFGACFPLFAGYMFRGMGIGWAGLLLGLFAAAMIPVPL 567
Query: 441 XXXYKGASVRKRSK 400
G S+RK+SK
Sbjct: 568 LFLKYGESIRKKSK 581
So... I've dealt with it with the following bit of logic at the top of
the 'end_hsp' method in Bio::SearchIO::SearchEventResultBuilder:
if( defined $data->{'queryframe'} && # protect against an undef frame
    ( ( $data->{'queryframe'} < 0 &&
        $data->{'querystart'} < $data->{'queryend'} ) ||
      ( $data->{'queryframe'} > 0 &&
        $data->{'querystart'} > $data->{'queryend'} ) )
  )
{
    # swap so the coordinate order agrees with the sign of the frame
    ($data->{'querystart'},
     $data->{'queryend'}) = ($data->{'queryend'},
                             $data->{'querystart'});
}
if( defined $data->{'subjectframe'} && # protect against an undef frame
    ( ( $data->{'subjectframe'} < 0 &&
        $data->{'subjectstart'} < $data->{'subjectend'} ) ||
      ( $data->{'subjectframe'} > 0 &&
        $data->{'subjectstart'} > $data->{'subjectend'} ) )
  )
{
    # swap so the coordinate order agrees with the sign of the frame
    ($data->{'subjectstart'},
     $data->{'subjectend'}) = ($data->{'subjectend'},
                               $data->{'subjectstart'});
}
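To make the effect concrete, here is the same check as a standalone
sketch run against the XML values above. The function name
orient_by_frame and its $prefix argument are made up for this example
and are not part of SearchIO; only the hash keys match the code above.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the check; the function name and
# the $prefix argument are illustrative and not in SearchIO.
sub orient_by_frame {
    my ($data, $prefix) = @_;
    my $frame = $data->{"${prefix}frame"};
    return unless defined $frame;    # no frame reported, leave as-is
    my ($s, $e) = @{$data}{"${prefix}start", "${prefix}end"};
    # swap when the coordinate order disagrees with the sign of the frame
    if ( ( $frame < 0 && $s < $e ) || ( $frame > 0 && $s > $e ) ) {
        @{$data}{"${prefix}start", "${prefix}end"} = ($e, $s);
    }
    return;
}

# values taken from the XML HSP shown earlier
my %hsp = ( queryframe => -3, querystart => 400, queryend => 621 );
orient_by_frame(\%hsp, 'query');
print "$hsp{querystart}..$hsp{queryend}\n";   # prints 621..400
```

After the swap the query runs 621..400, which matches the plain text
alignment, so the usual start > end test yields a strand of -1.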
I'm going to commit it - but I wanted to throw it out there and explain
where this ugliness came from and see if anyone has issues with it.
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu