[Bioperl-l] Parsing a netblast file
Wes Barris
wes.barris at csiro.au
Thu Jul 31 22:36:23 EDT 2003
Jason Stajich wrote:
> Here is the patch; it is pretty simple. Alternatively, you can just grab the
> latest version of blast.pm from CVS.
Thank you! It works like a charm.
>
> Index: Bio/SearchIO/blast.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/SearchIO/blast.pm,v
> retrieving revision 1.42.2.9
> diff -r1.42.2.9 blast.pm
> 273c273
> < if( /\(([\d,]+)\s+letters.*\)/ ) {
> ---
> > if( /\((\-?[\d,]+)\s+letters.*\)/ ) {
> 325c325
> < if( /^\s+([\d\,]+)\s+sequences\;\s+([\d,]+)\s+total\s+letters/){
> ---
> > if( /^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/){
> 525c525
> < } elsif ( /letters in database:\s+([\d,]+)/i) {
> ---
> > } elsif ( /letters in database:\s+(\-?[\d,]+)/i) {
>
>
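For anyone who wants to check the change without rerunning a whole report, here is a minimal sketch that applies the old and new versions of the pattern from line 325 to the database summary line quoted further down. The helper name parse_db_line is made up for this example and is not part of blast.pm; the two regular expressions are copied from the diff.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Database summary line as it appears in the netblast report quoted below.
my $line = '           1,819,241 sequences; -24,217,474 total letters';

# Pattern before the patch: no optional leading minus sign.
my $old = qr/^\s+([\d\,]+)\s+sequences\;\s+([\d,]+)\s+total\s+letters/;
# Pattern after the patch: each count may carry a leading '-'.
my $new = qr/^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/;

# Hypothetical helper (not part of blast.pm): return the two counts with
# thousands separators removed, or an empty list if the line does not match.
sub parse_db_line {
    my ($re, $text) = @_;
    my ($seqs, $letters) = $text =~ $re or return;
    tr/,//d for ($seqs, $letters);
    return ($seqs, $letters);
}

my @old = parse_db_line($old, $line);
my @new = parse_db_line($new, $line);
print "old pattern: ", (@old ? "@old" : "no match"), "\n";
print "new pattern: ", (@new ? "@new" : "no match"), "\n";
# prints:
#   old pattern: no match
#   new pattern: 1819241 -24217474
```

Note that the patched pattern only makes the parser tolerant of the minus sign; the value it captures is still the wrapped number, not the true database size.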
> On Fri, 1 Aug 2003, Wes Barris wrote:
>
>
>>Jason Stajich wrote:
>>
>>
>>>>Through trial and error I have narrowed down the problem to the negative
>>>>sign in the database details. Here is the section in question from a
>>>>netblast result file:
>>>>
>>>>Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
>>>>or phase 0, 1 or 2 HTGS sequences)
>>>> 1,819,241 sequences; -24,217,474 total letters
>>>
>>>
>>>Integer overflow. The number of letters in nt is greater than the
>>>largest value (2147483647) that a signed 32-bit integer can represent.
>>>
>>>It looks like the nt length is 8,782,847,770. It seems to have been larger
>>>than INT_MAX for a while, so I'm surprised they haven't updated their code.
>>>Do you have the latest version of netblast on your machine? A bug report to
>>>NCBI is probably a good idea if you are running the latest version.
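The wraparound described above can be reproduced with a few lines of Perl. This is only a sketch: wrap_int32 is a name invented for this example, and it assumes a perl built with 64-bit integers. It reduces a count modulo 2**32 and reinterprets the result as a signed 32-bit value, which is what a C int would hold.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reinterpret a non-negative integer as a signed
# 32-bit value, the way a C 'int' would wrap.
# Assumption: this perl was built with 64-bit integer support.
sub wrap_int32 {
    my ($n) = @_;
    my $low = $n % 2**32;                        # keep the low 32 bits
    return $low >= 2**31 ? $low - 2**32 : $low;  # top bit set => negative
}

printf "%d -> %d\n", $_, wrap_int32($_) for (8_565_717_118, 8_782_847_770);
# prints:
#   8565717118 -> -24217474
#   8782847770 -> 192913178
```

So the reported -24,217,474 is what any true total congruent to it modulo 2**32 would produce, for example 8,565,717,118. The 8,782,847,770 figure itself would wrap to a positive number, so the database was presumably a different size when the report was generated.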
>>
>>Hi Jason,
>>
>>Thanks for responding. Yes, I am running the latest blastcl3 from the NCBI
>>ftp site. I had already alerted NCBI to the problem (although I didn't
>>understand the source of the problem until you pointed it out). Here is their
>>response. It doesn't look like they are interested in fixing it:
>>
>>--------------------------
>>We have some back compatibility issue for the older client and would not be
>>able to change this.
>>
>>The best way is to address it to bioperl and have it changed to be more
>>tolerant. As I mentioned before, the correct db info is given at the end.
>>
>>Regards,
>>
>>Tao Tao
>>NCBI USER Service
>>----------------------------
>>
>>[...snip...]
>>
>>
>>>We'd just need to tweak the regexp a little bit to handle a leading -.
>>>What version of bioperl are you running, so I can provide a patch that is
>>>appropriate for your version?
>>
>>I am running bioperl-1.2.2
>>
>>
>
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
--
Wes Barris
E-Mail: Wes.Barris at csiro.au