[Bioperl-l] Parsing a netblast file

Wes Barris wes.barris at csiro.au
Thu Jul 31 21:31:06 EDT 2003


Jason Stajich wrote:

>>Through trial and error I have narrowed down the problem to the negative
>>sign in the database details.  Here is the section in question from a
>>netblast result file:
>>
>>Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
>>or phase 0, 1 or 2 HTGS sequences)
>>            1,819,241 sequences; -24,217,474 total letters
> 
> 
> Integer overflow.  The number of letters in nt is greater than the
> largest value (2147483647) that a signed 32-bit integer can represent.
> 
> Looks like nt length is 8,782,847,770 - seems like it has been larger than
> INT_MAX for a while, surprised they haven't updated their code.  Do you
> have the latest version of netblast on your machine?  A bug report to NCBI
> is probably a good idea if you are running the latest version.
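
To make the overflow explanation above concrete, here is a minimal sketch
of how a count stored in a signed 32-bit field wraps around to a negative
number.  The function names are hypothetical and this is not NCBI's code;
it only reproduces the arithmetic.  Any true count equal to the reported
value plus a multiple of 2**32 would print as -24,217,474.

```python
INT32_BITS = 32
WRAP = 1 << INT32_BITS          # 4294967296
INT32_MAX = (1 << 31) - 1       # 2147483647


def wrap_to_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    low = value & (WRAP - 1)    # keep only the low 32 bits
    return low - WRAP if low > INT32_MAX else low


def candidate_true_counts(reported, max_wraps=3):
    """For a negative reported value, list counts that would wrap to it."""
    return [reported + k * WRAP for k in range(1, max_wraps + 1)]


if __name__ == "__main__":
    reported = -24217474        # "-24,217,474 total letters" from the report
    for count in candidate_true_counts(reported):
        print(f"{count:,} wraps to {wrap_to_int32(count):,}")
```

Which of these candidates was the real size of the database at the time
cannot be determined from the negative number alone.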

Hi Jason,

Thanks for responding.  Yes, I am running the latest blastcl3 from the NCBI
ftp site.  I had already alerted NCBI to the problem (although I didn't
understand the source of the problem until you pointed it out).  Here is their
response.  It doesn't look like they are interested in fixing it:

--------------------------
We have a backward-compatibility issue with the older client and would not be
able to change this.

The best way is to address it to bioperl and have it changed to be more
tolerant.  As I mentioned before, the correct db info is given at the end.

Regards,

Tao Tao
NCBI USER Service
----------------------------

[...snip...]

> We'd just need to tweak the regexp a little bit to handle a leading -.
> What version of bioperl are you running, so I can provide a patch which is
> appropriate for your version?

I am running bioperl-1.2.2
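
For illustration, the kind of regexp tweak discussed above might look like
the sketch below.  It is written in Python rather than Perl, and the
pattern and the function name parse_db_stats are assumptions made for this
example, not the actual BioPerl parser code.  The only point is that
allowing an optional leading "-" lets the database line from the report
match.

```python
import re

# Hypothetical pattern (not BioPerl's): each number may carry a leading '-'
# and thousands separators.
DB_STATS_RE = re.compile(
    r"^\s*(-?\d[\d,]*)\s+sequences;\s+(-?\d[\d,]*)\s+total\s+letters"
)


def parse_db_stats(line):
    """Return (num_sequences, num_letters), or None if the line does not match."""
    match = DB_STATS_RE.match(line)
    if match is None:
        return None
    seqs, letters = (int(group.replace(",", "")) for group in match.groups())
    return seqs, letters


if __name__ == "__main__":
    line = "            1,819,241 sequences; -24,217,474 total letters"
    print(parse_db_stats(line))
```

A tolerant parser would still hand back the negative letter count, so a
caller that needs the real database size would have to take it from the
statistics at the end of the report, as the NCBI reply suggests.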

-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au


