[Bioperl-l] Parsing a netblast file
Jason Stajich
jason at cgt.duhs.duke.edu
Thu Jul 31 22:10:33 EDT 2003
Here is the patch. It is pretty simple; alternatively, you can just grab the
latest version of blast.pm from CVS.
Index: Bio/SearchIO/blast.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SearchIO/blast.pm,v
retrieving revision 1.42.2.9
diff -r1.42.2.9 blast.pm
273c273
< if( /\(([\d,]+)\s+letters.*\)/ ) {
---
> if( /\((\-?[\d,]+)\s+letters.*\)/ ) {
325c325
< if( /^\s+([\d\,]+)\s+sequences\;\s+([\d,]+)\s+total\s+letters/){
---
> if( /^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/){
525c525
< } elsif ( /letters in database:\s+([\d,]+)/i) {
---
> } elsif ( /letters in database:\s+(\-?[\d,]+)/i) {
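
As a quick check of the 325c325 change, here is a minimal sketch that runs the
old and the patched pattern against the database line from the report quoted
below. The helper name parse_db_stats is hypothetical and is not part of
Bio::SearchIO::blast; stripping the commas is only my assumption about how a
caller would want the numbers.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Bio::SearchIO::blast): extract the sequence
# and letter counts from a database summary line, using the same pattern
# as the patched line 325.
sub parse_db_stats {
    my ($line) = @_;
    if ( $line =~ /^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/ ) {
        my ( $entries, $letters ) = ( $1, $2 );
        # Strip thousands separators; a leading '-' is left in place.
        s/,//g for $entries, $letters;
        return ( $entries, $letters );
    }
    return;    # line did not match
}

my $line = "           1,819,241 sequences; -24,217,474 total letters";

# The unpatched pattern rejects the line because of the '-'.
my $old_matches =
    $line =~ /^\s+([\d\,]+)\s+sequences\;\s+([\d,]+)\s+total\s+letters/ ? 1 : 0;
print "old pattern matches: $old_matches\n";

my ( $entries, $letters ) = parse_db_stats($line);
print "entries=$entries letters=$letters\n";
```

Note that the captured letter count is still the wrapped negative value. The
patch only makes the parser tolerant of the header line; it does not recover
the real database size.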
On Fri, 1 Aug 2003, Wes Barris wrote:
> Jason Stajich wrote:
>
> >>Through trial and error I have narrowed down the problem to the negative
> >>sign in the database details. Here is the section in question from a
> >>netblast result file:
> >>
> >>Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
> >>or phase 0, 1 or 2 HTGS sequences)
> >> 1,819,241 sequences; -24,217,474 total letters
> >
> >
> > Integer overflow. The number of letters in nt is greater than the
> > largest value (2147483647) that a signed 32-bit integer can represent.
> >
> > Looks like nt length is 8,782,847,770 - seems like it has been larger than
> > INT_MAX for a while, surprised they haven't updated their code. Do you
> > have the latest version of netblast on your machine? A bug report to NCBI
> > is probably a good idea if you are running the latest version.
>
> Hi Jason,
>
> Thanks for responding. Yes, I am running the latest blastcl3 from the NCBI
> ftp site. I had already alerted NCBI to the problem (although I didn't
> understand the source of the problem until you pointed it out). Here is their
> response. It doesn't look like they are interested in fixing it:
>
> --------------------------
> We have some back compatibility issue for the older client and would not be
> able to change this.
>
> The best way is to address it to bioperl and have it changed to be more
> tolerant. As I mentioned before, the correct db info is given at the end.
>
> Regards,
>
> Tao Tao
> NCBI USER Service
> ----------------------------
>
> [...snip...]
>
> > We'd just need to tweak the regexp a little bit to handle a leading -.
> > What version of bioperl are you running, so I can provide a patch that is
> > appropriate for your version?
>
> I am running bioperl-1.2.2
>
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu