[Bioperl-l] Fwd: BLAST parsing broken
Razi Khaja
razi.khaja at gmail.com
Sun May 9 21:15:38 UTC 2010
Hi Chris,
The patch is against the main trunk. I checked out revision 11326 of the
repository today.
Razi
On Sun, May 9, 2010 at 4:43 PM, Chris Fields <cjfields at illinois.edu> wrote:
> If the patch is against the main trunk, it isn't a problem; otherwise the
> diff should be against that code.
>
> chris
>
> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
>
> > Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
> > Can someone advise an appropriate way to have this patch applied, given
> > that it is an amendment to a previous patch?
> > Thanks
> > Razi
> >
> >
> > ---------- Forwarded message ----------
> > From: Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>
> > Date: Wed, May 5, 2010 at 2:11 AM
> > Subject: Re: [Bioperl-l] BLAST parsing broken
> > To: Razi Khaja <razi.khaja at gmail.com>
> >
> >
> > Hi Razi,
> >
> > Thanks for trying to fix this.
> >
> > I am attaching an example output file to this message. I just tested
> > again that master from the git repository fails to get the query ID, but
> > the previous version works.
> >
> > bala ~/src/bioperl-live> git checkout master
> > Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp output
> > Switched to branch 'master'
> >
> > When I started using the latest mpiBLAST code a few months ago, I
> > compared its format 0 output to that of standard NCBI BLAST and they were
> > identical.
> >
> >
> >
> >
> > Also, I've noticed a discrepancy within BioPerl BLAST parsing that I have
> > not had time to work on. Would you be interested in having a look?
> >
> > I am creating output from mpiBLAST in format 0 and then converting it into
> > tab-delimited format 8. I am unable to get 100% agreement in all cases
> > when I compare the conversion to the output straight from mpiBLAST in
> > format 8. Sometimes the mismatch and gap values are off by one.
> >
> > I am attaching a script that does the conversion. It is the same one I was
> > using when I noticed the problem above. I was going to put the code into
> > BioPerl, but that got delayed when I noticed the discrepancies.
> >
> >
> > Cheers,
> >
> >
> > -Heikki
> >
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +966 545 595 849 office: +966 2 808 2429
> >
> > Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> > 4700 King Abdullah University of Science and Technology (KAUST)
> > Thuwal 23955-6900, Kingdom of Saudi Arabia
> >
> >
> >
> > On 4 May 2010 20:55, Razi Khaja <razi.khaja at gmail.com> wrote:
> >
> >> That is odd. Heikki, do you have a BLAST output file that produces this
> >> error? Could you attach the file and send it either to the list or to me
> >> (if the list does not accept attachments)?
> >> Thanks,
> >> Razi
> >>
> >>
> >> On Mon, May 3, 2010 at 8:08 AM, Chris Fields <cjfields at illinois.edu> wrote:
> >>
> >>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in
> >>> svn, of course, until the migration is complete).
> >>>
> >>> chris
> >>>
> >>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote:
> >>>
> >>>> Chris,
> >>>>
> >>>> The latest additions to Bio::SearchIO::blast.pm broke the parsing of
> >>>> normal BLAST output. $result->query_name now returns undef.
> >>>>
> >>>> (Using the anonymous git now). This change still works:
> >>>>
> >>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
> >>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> >>>> Date: Sun Dec 20 04:39:58 2009 +0000
> >>>>
> >>>> Robson's patch for buggy blastpgp output
> >>>>
> >>>> But this does not:
> >>>>
> >>>> commit 9a89c3434597104dd50553e3562983d78d14a544
> >>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> >>>> Date: Thu Apr 15 04:21:17 2010 +0000
> >>>>
> >>>> [bug 3031]
> >>>>
> >>>> patches for catching algorithm ref, courtesy Razi Khaja.
> >>>>
> >>>> That makes it easy to find the diffs:
> >>>>
> >>>> $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
> >>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
> >>>> index 378023a..6f7eeeb 100644
> >>>> --- a/Bio/SearchIO/blast.pm
> >>>> +++ b/Bio/SearchIO/blast.pm
> >>>> @@ -209,6 +209,7 @@ BEGIN {
> >>>>
> >>>>   'BlastOutput_program' => 'RESULT-algorithm_name',
> >>>>   'BlastOutput_version' => 'RESULT-algorithm_version',
> >>>> + 'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
> >>>>   'BlastOutput_query-def' => 'RESULT-query_name',
> >>>>   'BlastOutput_query-len' => 'RESULT-query_length',
> >>>>   'BlastOutput_query-acc' => 'RESULT-query_accession',
> >>>> @@ -504,6 +505,26 @@ sub next_result {
> >>>>   }
> >>>>   );
> >>>>   }
> >>>> + # parse the BLAST algorithm reference
> >>>> + elsif(/^Reference:\s+(.*)$/) {
> >>>> +     # want to preserve newlines for the BLAST algorithm reference
> >>>> +     my $algorithm_reference = "$1\n";
> >>>> +     $_ = $self->_readline;
> >>>> +     # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
> >>>> +     # algorithm_reference, append it to what we parsed so far
> >>>> +     while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
> >>>> +         $algorithm_reference .= "$_";
> >>>> +         $_ = $self->_readline;
> >>>> +     }
> >>>> +     # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
> >>>> +     $self->_pushback($_);
> >>>> +     $self->element(
> >>>> +         {
> >>>> +             'Name' => 'BlastOutput_algorithm-reference',
> >>>> +             'Data' => $algorithm_reference
> >>>> +         }
> >>>> +     );
> >>>> + }
> >>>>   # added Windows workaround for bug 1985
> >>>>   elsif (/^(Searching|Results from round)/) {
> >>>>       next unless $1 =~ /Results from round/;
> >>>>
> >>>>
> >>>> I am not sure why reference parsing messes things up. Maybe it eats too
> >>>> many lines from the result file.
> >>>>
> >>>> Yours,
> >>>>
> >>>> -Heikki
> >>>>
> >>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> >>>> cell: +966 545 595 849 office: +966 2 808 2429
> >>>>
> >>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> >>>> 4700 King Abdullah University of Science and Technology (KAUST)
> >>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> > <mpiblast.out><blastparser028.pl><blast.pm.diff>
>
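Heikki's guess in the oldest message, that the reference parsing "eats too many lines", can be illustrated without BioPerl. The following is only a sketch under stated assumptions: LineReader, next_line, pushback and read_reference are hypothetical stand-ins for the parser's _readline/_pushback pair, the input lines are made up, and the thread does not confirm that this is the actual cause of the undef query name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's line reader: it hands out lines
# one at a time and lets the caller push one line back.
package LineReader;

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], pushed => [] }, $class;
}

sub next_line {
    my ($self) = @_;
    return pop @{ $self->{pushed} } if @{ $self->{pushed} };
    return shift @{ $self->{lines} };    # undef once the input is exhausted
}

sub pushback {
    my ($self, $line) = @_;
    push @{ $self->{pushed} }, $line if defined $line;
    return;
}

package main;

# Collect a multi-line "Reference:" block using the same stop conditions as
# the patch (empty line, "RID:" or "Database:"). The defined() check is an
# addition for this sketch so that the loop also stops at end of input.
sub read_reference {
    my ($reader, $first_line) = @_;
    my $reference = "$first_line\n";
    my $line      = $reader->next_line;
    while (defined $line
        && $line !~ /^$/
        && $line !~ /^RID:/
        && $line !~ /^Database:/)
    {
        $reference .= $line;
        $line = $reader->next_line;
    }
    $reader->pushback($line);    # leave the terminating line for the caller
    return $reference;
}

# Made-up input: the reference is followed by a blank line, then the query.
my $spaced = LineReader->new("reference text, line 2\n", "\n", "Query= q1\n");
# Made-up input: no blank line between the reference and the query line.
my $tight = LineReader->new("reference text, line 2\n", "Query= q1\n", "\n");

my $ref_spaced = read_reference($spaced, 'reference text, line 1');
my $ref_tight  = read_reference($tight,  'reference text, line 1');

print "blank line present, query line swallowed: ",
    ($ref_spaced =~ /Query=/ ? "yes" : "no"), "\n";
print "blank line missing, query line swallowed: ",
    ($ref_tight =~ /Query=/ ? "yes" : "no"), "\n";
```

With the blank line present, the loop stops before the query line and pushes the blank line back. Without it, the query line is absorbed into the reference text, so a later rule that looks for "Query=" would never see it. Whether the mpiBLAST report in question is actually laid out that way would have to be checked against the attached file.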
>
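On the format 0 to format 8 conversion that Heikki describes: one place where off-by-one differences in the mismatch and gap columns can creep in is the counting convention. As far as I know, the gap column of NCBI tabular output counts gap openings (runs of '-'), not individual gap characters. The sketch below is not Heikki's script; alignment_counts is a hypothetical helper and the aligned strings are made up. It only shows the two conventions side by side for one HSP.

```perl
use strict;
use warnings;

# Count identities, mismatches, gap characters and gap openings for one HSP,
# given the aligned query and subject strings ('-' marks a gap).
sub alignment_counts {
    my ($query, $subject) = @_;
    die "aligned strings differ in length\n"
        unless length($query) == length($subject);

    my %count = (identical => 0, mismatches => 0, gap_chars => 0, gap_opens => 0);
    my $prev_gap = '';    # '' = no gap, 'q' = gap in query, 's' = gap in subject

    for my $i (0 .. length($query) - 1) {
        my $qc  = substr($query,   $i, 1);
        my $sc  = substr($subject, $i, 1);
        my $gap = $qc eq '-' ? 'q' : $sc eq '-' ? 's' : '';

        if ($gap) {
            $count{gap_chars}++;
            # a new run of '-' starts whenever the gap state changes
            $count{gap_opens}++ if $gap ne $prev_gap;
        }
        elsif (uc($qc) eq uc($sc)) {
            $count{identical}++;
        }
        else {
            $count{mismatches}++;
        }
        $prev_gap = $gap;
    }
    $count{length} = length $query;
    return \%count;
}

# Made-up alignment: one mismatch, a two-column gap in the query and a
# one-column gap in the subject.
my $c = alignment_counts('ACGT--ACGTA', 'ACCTGGAC-TA');
printf "length=%d identical=%d mismatches=%d gap_opens=%d gap_chars=%d\n",
    @{$c}{qw(length identical mismatches gap_opens gap_chars)};
```

For this alignment the two conventions give 2 gap openings versus 3 gap characters, so a converter that reports one where the reference output reports the other will disagree by a small amount on some HSPs. Whether that explains the discrepancies seen with mpiBLAST is an open question.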