[Bioperl-l] A problem parsing BLASTX 2.2.8 reports

Jason Stajich jason at cgt.duhs.duke.edu
Tue Mar 30 14:03:15 EST 2004


That particular problem has been reported to NCBI and is listed in bug
#1598 as I posted to the list last week.
http://bugzilla.open-bio.org/show_bug.cgi?id=1598

I committed a patch to Bio::SearchIO::blast in CVS to work around this
BLAST bug (but of course it cannot get the number of letters in the
database correct, since that value is mangled in the report).

-jason
On Tue, 30 Mar 2004, Gilles Parmentier wrote:

> Hi all,
>
> Using Mac OS X I noticed a lot of problems with 2.2.8 reports. In fact
> they look really buggy (strange query lengths among other things).
> Bio::SearchIO is not able to parse them. To tackle this I downgraded
> my install to 2.2.6 :(
>
> Gilles
>
> Jason Stajich wrote:
>
> >Matt - I'm having a little trouble understanding the problem - why aren't
> >the values you are reporting what you expect?
> >
> >With BLASTX the query frame is reported as 0, 1, or 2 in GFF/BioPerl,
> >not as 1, 2, or 3. So in your example (Frame = +2) the query will have
> >frame 1 and strand 1, since the frame is '+'. The hit will have no
> >strand (0) since it is protein. Isn't that what you got?
> >
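To make the frame convention above concrete, here is a minimal sketch in
plain Perl. It is not BioPerl's own code: blast_frame_to_gff is a
hypothetical helper name, and it assumes the input is exactly the value
that follows "Frame = " in a BLASTX report (an optional sign and a digit
from 1 to 3).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: convert a BLAST frame value
# such as '+2' into a (strand, zero-based frame) pair.
sub blast_frame_to_gff {
    my ($blast_frame) = @_;
    my ($sign, $n) = $blast_frame =~ /^([+-]?)([123])$/
        or die "unexpected BLAST frame '$blast_frame'";
    my $strand = $sign eq '-' ? -1 : 1;   # '+' or no sign means forward
    return ($strand, $n - 1);             # BLAST 1..3 becomes 0..2
}

my ($strand, $frame) = blast_frame_to_gff('+2');
print "strand=$strand frame=$frame\n";    # prints "strand=1 frame=1"
```

Under that assumption, "Frame = +2" gives strand 1 and frame 1, which
matches the values Matt reports further down.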
> >$hit->frame also returns something different depending on the type of
> >search you ran.  For TBLASTN or BLASTX it returns the frame of whichever
> >sequence is translated (the hit for TBLASTN, the query for BLASTX); for
> >TBLASTX it returns an array.
> >
> >Also, the Hit object will try and make a summary value for all the HSPs -
> >this will be the frame for all the HSPs (if they share the same frame
> >throughout) or just the frame of the first HSP if they differ.
> >
> >I don't really like to rely on this personally.  Rather I would call it
> >explicitly for the HSP:
> >
> >$hsp->query->frame or $hsp->hit->frame
> > or
> >$hsp->frame('query'), $hsp->frame('hit')
> > or
> >my ($qframe,$hframe) = $hsp->frame;
> >
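As a rough illustration of why the per-HSP calls are safer than the
hit-level summary, assume you have already collected one frame value per
HSP (for example from $hsp->query->frame). The helper name
distinct_frames is hypothetical, and the frame list is made-up sample
data, not taken from Matt's report.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct frame values seen across the
# HSPs of one hit, in order of first appearance.
sub distinct_frames {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample: three HSPs, two of them in frame 1 and one in frame 2.
my @hsp_frames = (1, 1, 2);
my @distinct   = distinct_frames(@hsp_frames);

# With more than one distinct frame, a single hit-level value would only
# reflect the first HSP and hide the others.
print "HSP frames differ: @distinct\n" if @distinct > 1;
```

If the list comes back with a single value, the hit-level summary and
the per-HSP values agree and either can be used.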
> >All in all it is hard to say without a copy of the report(s) and your
> >code.
> >
> >-jason
> >
> >On Tue, 30 Mar 2004, Matthew Links wrote:
> >
> >
> >
> >>I have run into a problem parsing BLASTX reports (version 2.2.8). When I
> >>ask for strand and frame on the Bio::Search::Hit::HitI object I am
> >>getting back the wrong answer.
> >>
> >>I think this has to do with a slight formatting change in the BLAST
> >>output.
> >>
> >>--- BLASTX 2.2.5 ---
> >>
> >>>gi|19879878|gb|AAM00191.1| guanine nucleotide-exchange protein GEP2
> >>            [Oryza sativa]
> >>          Length = 1789
> >>
> >> Score =  320 bits (819), Expect(2) = 7e-98
> >> Identities = 160/193 (82%), Positives = 173/193 (89%)
> >> Frame = +2
> >>
> >>--- BLASTX 2.2.8 ---
> >>
> >>>gi|38346787|emb|CAE02205.2| OSJNBa0095H06.12 [Oryza sativa (japonica
> >>            cultivar-group)]
> >>          Length = 1724
> >>
> >> Score =  102 bits (254), Expect = 5e-21
> >> Identities = 62/204 (30%), Positives = 103/204 (50%), Gaps = 30/204 (14%)
> >> Frame = +2
> >>
> >>In my debugging it looks like everything is OK except for the
> >>strand/frame data, which when parsing 2.2.8 comes back as:
> >>
> >>hit->frame = 1
> >>hit->strand('query') = 1
> >>hit->strand('hit') = 0
> >>
> >>Has anyone seen this problem before?
> >>
> >>Thanks in advance,
> >>
> >>Matt
> >>
> >>
> >>
> >>
> >
> >--
> >Jason Stajich
> >Duke University
> >jason at cgt.mc.duke.edu
> >
> >
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

