Bioperl: Another problem, this one uglier.
Michael B. Thornton
lost@sea.incyte.com
Thu, 18 Nov 1999 15:32:21 -0800
Hello All,
We ran into this same problem here. I think it comes from using the ">"
character as a record delimiter. Reports with no hits contain no ">"
character, so they all get strung together with the next report that has hits.
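To make the failure mode concrete, here is a minimal sketch. It is not the
actual Blast.pm reading code; it only assumes that records are cut at each ">"
(done here by setting $/), and the four toy reports and their query/subject
names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical, heavily trimmed stream of four reports:
# query1 and query4 have a hit line starting with '>', query2 and query3 do not.
my $stream = join "\n",
    'Query= query1',
    'Sequences producing significant alignments:',
    '>subjectA',
    ' Score = 100',
    'Query= query2',
    ' ***** No hits found ******',
    'Query= query3',
    ' ***** No hits found ******',
    'Query= query4',
    'Sequences producing significant alignments:',
    '>subjectB',
    ' Score = 90',
    '';

my @chunks;
{
    local $/ = '>';    # assumption: records are terminated by '>'
    open my $fh, '<', \$stream or die "cannot open in-memory stream: $!";
    while (my $chunk = <$fh>) {
        chomp $chunk;  # strips the trailing '>' record separator
        push @chunks, $chunk;
    }
    close $fh;
}

# Report which query headers ended up in which chunk.
for my $i (0 .. $#chunks) {
    my @q = $chunks[$i] =~ /^Query=\s*(\S+)/mg;
    printf "chunk %d holds queries: %s\n", $i, @q ? "@q" : '(none)';
}
```

In this toy case the middle chunk carries the headers for query2, query3 and
query4 all at once, so a parser that takes the first query name it sees in the
chunk would pair query2 with the hits that belong to query4.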
We modified sub _get_parse_blast_func in Blast.pm as follows:
--------------------------------------------------------------------------------
    if($data =~ m/Database:\s+(.+?)$Newline/so ) {
      $current_db = $1;
    } else {
      # In some reports, the Database is only listed at end.
      #$Blast->warn("Can't determine database name from BLAST report.");
    }
  }
  # Incyte_Fix: Nasty Invisible Bug.
  # Records in blast report are delimited by '>', but... when
  # there are no hits for a query, there won't be a '>'. That
  # causes several blast reports to run together in the data
  # passed to this routine. Need to get rid of non-hits in data.
  if ($data =~ /.+(No hits? found.+Sequences.+)/so) {
    $data = $1;
  }
  # End Incyte_Fix
  # Determine if we need to create a new Blast object
  # or use the $self object for this method.
  if($Blast->{'_multi_stream'} or $self->name eq 'Static Blast object') {
--------------------------------------------------------------------------------
I don't know if this is the best or right way to fix this, but it seems to
make the problem go away.
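For anyone wondering what the pattern actually keeps, here is a small
standalone sketch. The $data string is a made-up stand-in for the run-together
text, not real BLAST output, and the query and subject names are hypothetical.
The regex itself is the one from the fix above.

```perl
use strict;
use warnings;

# Hypothetical chunk: two no-hit reports run together with the header of
# the report that finally has hits.
my $data = join "\n",
    'BLASTN 2.0.10',
    'Query= query2',
    ' ***** No hits found ******',
    'BLASTN 2.0.10',
    'Query= query3',
    ' ***** No hits found ******',
    'BLASTN 2.0.10',
    'Query= query4',
    'Sequences producing significant alignments:',
    'subjectA  100  1e-20',
    '';

# Same pattern as the Incyte_Fix. The leading .+ is greedy and, with /s,
# also crosses newlines, so the capture starts at the LAST "No hits found"
# that is still followed by "Sequences".
if ($data =~ /.+(No hits? found.+Sequences.+)/so) {
    $data = $1;
}

# List the query headers that survive the trim.
my @queries = $data =~ /^Query=\s*(\S+)/mg;
print "Queries left in chunk: @queries\n";
```

On this toy input only the query4 header is left, because everything before
the last no-hit marker is thrown away. Note that the trimmed text still begins
with the "No hits found" line of the preceding report; the pattern only
removes the earlier query headers.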
hope this helps
cheers,
ok
mbt
----- Original Message -----
From: carl virtanen <carl@cimmed.com>
To: <vsns-bcd-perl@lists.uni-bielefeld.de>
Sent: Wednesday, November 17, 1999 9:23 PM
Subject: Bioperl: Another problem, this one uglier.
> Hi.
> Still haven't figured out what was happening in the last problem i had,
> but i managed a sort of unelegant work around.
> But here's something i don't understand. I have a bunch of blast reports
> (2.0.10) in a single file. Some are hits, some are misses. Now, i use the
> example program thusly:
>
> cat myblast.reports | perl parse_blast.pl -table 1
>
> everything works, and it has extracted a nice looking table of all the
> hits in my blast file. Only one problem.... The query names DO NOT match
> with the correct subjects (ie-sequence identifiers-column 3). Actually,
> this problem is related to my earlier problem i think (actually, i know
> it is). Say for example you have a blast file with reports like this:
> Blast 1 report->has hits
> Blast 2 report->no hits
> Blast 3 report->no hits
> Blast 4 report-> has hits
>
> Now, the BioBlast picks up the first hits. Then it skips the next 2
> reports (no hits) and picks up the Blast 4 report. Ok. Only problem, is
> that the query name reported in the table is picked up off of Blast 2 and
> NOT Blast 4, which is the correct query name for that set of hits. This is
> a major problem. At least for me! If i'm doing something wrong here in my
> parsing, let me know. Better yet, let everybody know EXACTLY what must be
> done. The worst thing is to get results, that look ok but which are wrong.
>
> Carl
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================