[Bioperl-l] Fwd: BLAST parsing broken
Razi Khaja
razi.khaja at gmail.com
Sun May 9 23:48:28 UTC 2010
I checked out bioperl-live from github:
svn checkout http://svn.github.com/bioperl/bioperl-live.git
I just checked it out again a few seconds ago, and by default I got revision
11326.
Razi
On Sun, May 9, 2010 at 5:30 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Then something is wrong, as current trunk is at r16969. Where are you
> pulling your code from? Our only working anon. server is the sync'ed github
> one.
>
> chris
>
> On May 9, 2010, at 4:15 PM, Razi Khaja wrote:
>
> > Hi Chris,
> > The patch is against the main trunk. I checked out version 11326 of the
> > repository today.
> > Razi
> >
> >
> > On Sun, May 9, 2010 at 4:43 PM, Chris Fields <cjfields at illinois.edu> wrote:
> >
> >> If the patch is against main trunk it isn't a problem, otherwise the diff
> >> should be vs. that code.
> >>
> >> chris
> >>
> >> On May 9, 2010, at 2:23 PM, Razi Khaja wrote:
> >>
> >>> Attached (blast.pm.diff) is a patch that fixes Heikki's problem.
> >>> Can someone advise an appropriate way to have this patch applied, given that
> >>> it is an amendment to a previous patch?
> >>> Thanks
> >>> Razi
> >>>
> >>>
> >>> ---------- Forwarded message ----------
> >>> From: Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>
> >>> Date: Wed, May 5, 2010 at 2:11 AM
> >>> Subject: Re: [Bioperl-l] BLAST parsing broken
> >>> To: Razi Khaja <razi.khaja at gmail.com>
> >>>
> >>>
> >>> Hi Razi,
> >>>
> >>> Thanks for trying to fix this.
> >>>
> >>> I am attaching an example output file to this message. I just tested again
> >>> that master from the git repository fails to get the query ID, but the
> >>> previous version works.
> >>>
> >>> bala ~/src/bioperl-live> git checkout master
> >>> Previous HEAD position was 5e278f5... Robson's patch for buggy blastpgp output
> >>> Switched to branch 'master'
> >>>
> >>> When I started using the latest mpiBLAST code a few months ago I did compare
> >>> the format 0 output from it to standard NCBI blast and they were identical.
> >>>
> >>>
> >>>
> >>>
> >>> Also, I've noticed a discrepancy within bioperl blast parsing that
> >>> I have not had time to work on. Would you be interested in having a look?
> >>>
> >>> I am creating output from mpiBLAST in 0 format and then converting it into
> >>> tab-delimited 8 format. I am unable to get 100% similarity for all cases
> >>> when I compare the conversion to the output straight from mpiBLAST in format
> >>> 8. Sometimes the mismatch and gap values are off by one.
> >>>
> >>> I am attaching a script that does the conversion. It is the same one I was
> >>> using when I noticed the problem above. I was going to put the code into
> >>> bioperl but that got delayed when I noticed the discrepancies.
> >>>
> >>>
> >>> Cheers,
> >>>
> >>>
> >>> -Heikki
> >>>
> >>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> >>> cell: +966 545 595 849 office: +966 2 808 2429
> >>>
> >>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> >>> 4700 King Abdullah University of Science and Technology (KAUST)
> >>> Thuwal 23955-6900, Kingdom of Saudi Arabia
> >>>
> >>>
> >>>
> >>> On 4 May 2010 20:55, Razi Khaja <razi.khaja at gmail.com> wrote:
> >>>
> >>>> That is odd. Heikki, do you have a blast output file that produces this
> >>>> error?
> >>>> Could you attach the file and send it either to the list or to me (if the
> >>>> list does not accept attachments)?
> >>>> Thanks,
> >>>> Razi
> >>>>
> >>>>
> >>>> On Mon, May 3, 2010 at 8:08 AM, Chris Fields <cjfields at illinois.edu> wrote:
> >>>>
> >>>>> Odd, I ran tests on that prior to commit. I'll work on fixing that (in svn
> >>>>> of course, until the migration is complete).
> >>>>>
> >>>>> chris
> >>>>>
> >>>>> On May 3, 2010, at 6:45 AM, Heikki Lehvaslaiho wrote:
> >>>>>
> >>>>>> Chris,
> >>>>>>
> >>>>>> latest additions to Bio::SearchIO::blast.pm broke the parsing of normal
> >>>>>> blast output. $result->query_name now returns undef.
> >>>>>>
> >>>>>> (Using the anonymous git now). This change still works:
> >>>>>>
> >>>>>> commit 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74
> >>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> >>>>>> Date: Sun Dec 20 04:39:58 2009 +0000
> >>>>>>
> >>>>>> Robson's patch for buggy blastpgp output
> >>>>>>
> >>>>>> But this does not:
> >>>>>>
> >>>>>> commit 9a89c3434597104dd50553e3562983d78d14a544
> >>>>>> Author: cjfields <cjfields at eb9725d8-4842-0410-9bbb-c0b52e2da49b>
> >>>>>> Date: Thu Apr 15 04:21:17 2010 +0000
> >>>>>>
> >>>>>> [bug 3031]
> >>>>>>
> >>>>>> patches for catching algorithm ref, courtesy Razi Khaja.
> >>>>>>
> >>>>>> That makes it easy to find the diffs:
> >>>>>>
> >>>>>> $ git diff 5e278f5dbb9afc4dc0359cd3fdc8fb0d0f4cad74 9a89c3434597104dd50553e3562983d78d14a544 Bio/SearchIO/blast.pm
> >>>>>> diff --git a/Bio/SearchIO/blast.pm b/Bio/SearchIO/blast.pm
> >>>>>> index 378023a..6f7eeeb 100644
> >>>>>> --- a/Bio/SearchIO/blast.pm
> >>>>>> +++ b/Bio/SearchIO/blast.pm
> >>>>>> @@ -209,6 +209,7 @@ BEGIN {
> >>>>>>
> >>>>>>          'BlastOutput_program'   => 'RESULT-algorithm_name',
> >>>>>>          'BlastOutput_version'   => 'RESULT-algorithm_version',
> >>>>>> +        'BlastOutput_algorithm-reference' => 'RESULT-algorithm_reference',
> >>>>>>          'BlastOutput_query-def' => 'RESULT-query_name',
> >>>>>>          'BlastOutput_query-len' => 'RESULT-query_length',
> >>>>>>          'BlastOutput_query-acc' => 'RESULT-query_accession',
> >>>>>> @@ -504,6 +505,26 @@ sub next_result {
> >>>>>>                  }
> >>>>>>              );
> >>>>>>          }
> >>>>>> +        # parse the BLAST algorithm reference
> >>>>>> +        elsif(/^Reference:\s+(.*)$/) {
> >>>>>> +            # want to preserve newlines for the BLAST algorithm reference
> >>>>>> +            my $algorithm_reference = "$1\n";
> >>>>>> +            $_ = $self->_readline;
> >>>>>> +            # while the current line, does not match an empty line, a RID:, or a Database:, we are still looking at the
> >>>>>> +            # algorithm_reference, append it to what we parsed so far
> >>>>>> +            while($_ !~ /^$/ && $_ !~ /^RID:/ && $_ !~ /^Database:/) {
> >>>>>> +                $algorithm_reference .= "$_";
> >>>>>> +                $_ = $self->_readline;
> >>>>>> +            }
> >>>>>> +            # if we exited the while loop, we saw an empty line, a RID:, or a Database:, so push it back
> >>>>>> +            $self->_pushback($_);
> >>>>>> +            $self->element(
> >>>>>> +                {
> >>>>>> +                    'Name' => 'BlastOutput_algorithm-reference',
> >>>>>> +                    'Data' => $algorithm_reference
> >>>>>> +                }
> >>>>>> +            );
> >>>>>> +        }
> >>>>>>          # added Windows workaround for bug 1985
> >>>>>>          elsif (/^(Searching|Results from round)/) {
> >>>>>>              next unless $1 =~ /Results from round/;
> >>>>>>
> >>>>>>
> >>>>>> I am not sure why reference parsing messes things up. Maybe it eats too many
> >>>>>> lines from the result file.
> >>>>>>
> >>>>>> Yours,
> >>>>>>
> >>>>>> -Heikki
> >>>>>>
> >>>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> >>>>>> cell: +966 545 595 849 office: +966 2 808 2429
> >>>>>>
> >>>>>> Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
> >>>>>> 4700 King Abdullah University of Science and Technology (KAUST)
> >>>>>> Thuwal 23955-6900, Kingdom of Saudi Arabia
> >>>>>> _______________________________________________
> >>>>>> Bioperl-l mailing list
> >>>>>> Bioperl-l at lists.open-bio.org
> >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>>
> >>> <mpiblast.out> <blastparser028.pl> <blast.pm.diff>
> >>
> >>
>
>
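
To make Heikki's hypothesis in the quoted thread ("Maybe it eats too many
lines from the result file") concrete, here is a minimal Python sketch of the
read-ahead loop from the quoted diff. It is not BioPerl code: LineReader,
read_reference, and parse are hypothetical names, and both sample inputs are
invented for illustration. It only shows that a loop which stops at a blank
line, RID:, or Database: will absorb a Query= line when no blank line
separates it from the reference text. Whether that is what happens with the
attached mpiBLAST file is not established in this thread.

```python
import re
from collections import deque


class LineReader:
    """Tiny stand-in for a reader with readline/pushback (hypothetical, not BioPerl's API)."""

    def __init__(self, text):
        self._lines = deque(text.splitlines(keepends=True))

    def readline(self):
        # Return the next line, or None at end of input.
        return self._lines.popleft() if self._lines else None

    def pushback(self, line):
        # Put a line back so the caller sees it again on the next readline().
        if line is not None:
            self._lines.appendleft(line)


# Same three stop conditions as the loop in the diff: empty line, RID:, Database:
STOP = re.compile(r"^(?:$|RID:|Database:)")


def read_reference(reader, first_line):
    """Append continuation lines until a stop line (or end of input) is seen."""
    reference = first_line
    line = reader.readline()
    while line is not None and not STOP.match(line):
        reference += line
        line = reader.readline()
    reader.pushback(line)  # leave the stop line for the main loop
    return reference


def parse(text):
    """Return (reference, query_name) from a simplified BLAST-like report."""
    reader = LineReader(text)
    reference, query = None, None
    line = reader.readline()
    while line is not None:
        match = re.match(r"^Reference:\s+(.*)$", line)
        if match:
            reference = read_reference(reader, match.group(1) + "\n")
        elif line.startswith("Query="):
            query = line[len("Query="):].strip()
        line = reader.readline()
    return reference, query


# Invented input: a blank line separates the reference from the query line.
WITH_BLANK = (
    "BLASTN 2.2.18\n\n"
    "Reference: Altschul et al. (1997),\n"
    "Nucleic Acids Res. 25:3389-3402.\n"
    "\n"
    "Query= seq1\n"
)

# Invented input: same report, but without the separating blank line.
NO_BLANK = WITH_BLANK.replace(".\n\nQuery=", ".\nQuery=")
```

With WITH_BLANK the query name is found as expected. With NO_BLANK the Query=
line ends up inside the reference string and the query name stays unset, which
would look like $result->query_name returning undef.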