[Bioperl-l] SearchIO-BLAST
Chris Fields
cjfields at uiuc.edu
Tue Aug 21 21:53:44 UTC 2007
I can confirm this (I'm using bioperl-live). The output I get is:
Num of hits: 9
ref|NM_000039.1|
ref|NT_113960.1|Hs22_111679
ref|NT_033899.7|Hs11_34054
ref|NW_925173.1|HsCraAADB02_444
ref|NM_000039.1|
ref|NT_113960.1|Hs22_111679
ref|NT_033899.7|Hs11_34054
ref|NW_925173.1|HsCraAADB02_444
ref|NW_925173.1|HsCraAADB02_444
The extra '>' is definitely throwing the event calls for a loop. The
hit count roughly doubles because an extra iteration is started when
'>' is encountered (changing the event handler reduces the number to 5).
The extra hit comes from the '>' at the beginning.
I hate to say it, but this is a case where we can't be more flexible,
primarily because '>' is a legitimate token the parser looks for (it
marks the beginning of a hit block in reports). Finding it as the
initial token in a report is also legitimate for some older BLAST
output, so we can't simply bypass it either. Unfortunately, you'll have
to preparse the reports to remove those lines before feeding them to
the BLAST text report parser.
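A minimal sketch of that preparsing step, assuming each input holds a single report and that the first real line of the report starts with the BLAST program name (BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX). The subroutine name strip_leading_junk is hypothetical and not part of BioPerl; junk appearing between concatenated reports is not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the text starting at
# the first line that begins with a BLAST program name. Anything before
# it, such as a stray FASTA-like ">aaa" line, is discarded.
# ASSUMPTION: junk only appears before the first report header; if no
# header is found, the text is returned unchanged.
sub strip_leading_junk {
    my ($text) = @_;
    # /m lets ^ match at the start of every line, not just the string.
    if ($text =~ /^T?BLAST[NPX]/m) {
        # $-[0] holds the offset where the match begins.
        return substr($text, $-[0]);
    }
    return $text;
}

# Illustrative, made-up report fragment (not real BLAST output).
my $report = <<'END';
>aaa
BLASTN (header line of a plain-text report)

Query= example
END

my $clean = strip_leading_junk($report);
print $clean;    # everything from the "BLASTN" line onward

# The cleaned text can be read back through an in-memory filehandle.
open my $fh, '<', \$clean or die "cannot open in-memory handle: $!";
my $first = <$fh>;
close $fh;
print "First line: $first";
```

To hand the cleaned text to the parser, open it the same way and pass the handle with `-fh` instead of `-file`, e.g. `Bio::SearchIO->new(-format => 'blast', -fh => $fh)`. Reading the whole file into memory keeps the sketch simple; for very large multi-report files a line-by-line filter would be the better choice.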
chris
On Aug 21, 2007, at 11:32 AM, Bernd Web wrote:
> Dear all,
>
> Recently I stumbled on something while parsing BLAST reports. A
> ">aaa" line (a FASTA-like header) got prepended to a plain-text BLAST
> report from NCBI, and this changes the $result->hits array: the
> number of hits is now 2*num_hits + 1. Clearly this is due to faulty
> input, but the effect of this one line is still large. Does anyone
> see what is causing this, and should the BLAST parser perhaps be
> slightly more relaxed with respect to prepended/appended text? I
> don't yet see why this extra FASTA header line has such a large
> effect.
>
> A short example of BLASTN output is attached.
> Example code:
>
> use strict;
> use warnings;
> use Bio::SearchIO;
>
> my $in = Bio::SearchIO->new(-format => 'blast',
>                             -file   => 'apoe_plain.bls');
> while ( my $result = $in->next_result ) {
>     print "Num of hits: ", $result->num_hits, "\n";
>     # print the name of every hit in this result
>     foreach my $el ($result->hits) {
>         print $el->name, "\n";
>     }
> }
>
>
> Kind regards,
> Bernd
> <apoe_plain.bls>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign