[Bioperl-l] Parsing a netblast file

Jason Stajich jason at cgt.duhs.duke.edu
Thu Jul 31 22:10:33 EDT 2003


Here is the patch; it is pretty simple. Alternatively, you can just grab the
latest version of blast.pm from CVS.

Index: Bio/SearchIO/blast.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SearchIO/blast.pm,v
retrieving revision 1.42.2.9
diff -r1.42.2.9 blast.pm
273c273
<                if( /\(([\d,]+)\s+letters.*\)/ ) {
---
>                if( /\((\-?[\d,]+)\s+letters.*\)/ ) {
325c325
< 	       if( /^\s+([\d\,]+)\s+sequences\;\s+([\d,]+)\s+total\s+letters/){
---
> 	       if( /^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/){
525c525
< 	       } elsif ( /letters in database:\s+([\d,]+)/i) {
---
> 	       } elsif ( /letters in database:\s+(\-?[\d,]+)/i) {
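
As a quick sanity check, here is a minimal standalone sketch (not part of
blast.pm) that applies the patched pattern from line 325 to the database line
from the netblast report quoted below. The helper name parse_db_stats and the
comma stripping are assumptions made for illustration only; they are not taken
from the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patched pattern from the diff above: an optional leading '-' is now
# allowed in front of each comma-grouped number.
my $stats_re =
    qr/^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/;

# Hypothetical helper (not in blast.pm): returns (sequences, letters)
# with the commas removed, or an empty list if the line does not match.
sub parse_db_stats {
    my ($line) = @_;
    return unless $line =~ $stats_re;
    my ( $seqs, $letters ) = ( $1, $2 );
    tr/,//d for $seqs, $letters;    # strip thousands separators
    return ( $seqs, $letters );
}

# The line from the report that the old pattern failed to match.
my $line = "            1,819,241 sequences; -24,217,474 total letters";
my ( $seqs, $letters ) = parse_db_stats($line);
print "sequences=$seqs letters=$letters\n";
```

Note that the patch only makes the parser tolerant of the minus sign. The
letter count it extracts is still the wrapped value printed by the client;
according to the NCBI reply quoted below, the correct database info is given
at the end of the report.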


On Fri, 1 Aug 2003, Wes Barris wrote:

> Jason Stajich wrote:
>
> >>Through trial and error I have narrowed down the problem to the negative
> >>sign in the database details.  Here is the section in question from a
> >>netblast result file:
> >>
> >>Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
> >>or phase 0, 1 or 2 HTGS sequences)
> >>            1,819,241 sequences; -24,217,474 total letters
> >
> >
> > Integer overflow.  The number of letters in nt is greater than the largest
> > value (2147483647) that a signed 32-bit integer can represent.
> >
> > Looks like nt length is 8,782,847,770 - seems like it has been larger than
> > INT_MAX for a while, surprised they haven't updated their code.  Do you
> > have the latest version of netblast on your machine?  A bug report to NCBI
> > is probably a good idea if you are running the latest version
>
> Hi Jason,
>
> Thanks for responding.  Yes, I am running the latest blastcl3 from the NCBI
> ftp site.  I had already alerted NCBI to the problem (although I didn't
> understand the source of the problem until you pointed it out).  Here is their
> response.  It doesn't look like they are interested in fixing it:
>
> --------------------------
> We have some back compatibility issue for the older client and would not be
> able to change this.
>
> The best way is to address it to bioperl and have it changed to be more
> tolerant.  As I mentioned before, the correct db info is given at the end.
>
> Regards,
>
> Tao Tao
> NCBI USER Service
> ----------------------------
>
> [...snip...]
>
> > We'd just need to tweak the regexp a little bit to handle a leading -.
> > What version of bioperl are you running, so I can provide a patch that is
> > appropriate for your version?
>
> I am running bioperl-1.2.2
>
>
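
For reference, the wrap-around discussed in the quoted exchange can be
reproduced with plain arithmetic. This is only a sketch: wrap_int32 is a
hypothetical helper, and the input value is chosen purely because it wraps to
the figure shown in the report. It is not a claim about the actual size of nt
or about how the client stores the count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: interpret a non-negative count as a signed 32-bit
# two's-complement integer, the way a 32-bit counter would report it.
sub wrap_int32 {
    my ($n) = @_;
    my $m = $n % 4294967296;                           # keep the low 32 bits
    return $m >= 2147483648 ? $m - 4294967296 : $m;    # top bit set => negative
}

# Illustrative input only: one value that wraps to the number in the report.
print wrap_int32(4270749822), "\n";    # -24217474
print wrap_int32(2147483647), "\n";    # 2147483647 (largest value that fits)
print wrap_int32(2147483648), "\n";    # -2147483648 (first value that wraps)
```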

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

