[Bioperl-l] Bug in SeqIO/swiss.pm

Jason Stajich jason.stajich at duke.edu
Wed Jan 5 20:14:51 EST 2005


Thanks for the report. I believe I fixed this in my Nov 22 commit
(revision 1.84 of Bio/SeqIO/swiss.pm), so the fix will be in bioperl 1.5;
it is also available now from CVS.

-jason
On Jan 5, 2005, at 4:08 PM, rfsouza at cecm.usp.br wrote:

> Hi,
>
> I have found what might be a bug in the SeqIO parser for Swissprot flat
> files (swiss.pm). The error message printed is
>
> Invalid [] range "a-S" before HERE mark in regex m/^Cotton leaf curl
> Gezira virus - [Okra-S << HERE hambat]$/ at
> /home/users/rfsouza/projects/geral/lib/perl5/site_perl/5.8.1/Bio/SeqIO/swiss.pm
> line 985, <GEN0> line 10.
>
> and the Swissprot entry is pasted below. The problem is a match operator
> at line 985:
>
> 984 #if the organism belongs to taxid 32644 then no Bio::Species object.
> 985 return if grep { /^$binomial$/ } @Unknown_names;
>
> I managed to fix this, and got swiss.pm to parse the entire UniProt
> release 2.1, by adding this line
>
> $binomial =~ s/(\[|\])/\\$1/g;
>
> just before line 985. Would anybody like to add this fix to the CVS
> version of swiss.pm? Since this is the only entry, out of 1520915 entries
> in UniProt, that swiss.pm was not able to parse, I wonder whether it is
> an annotation error in UniProt that violates their own standard.
>
> Greetings and happy new year :)
> Robson
>
> #==============
>
> ID   Q8UYF6         STANDARD;      PRT;   258 AA.
> AC   Q8UYF6;
> DT   01-MAR-2002 (TrEMBLrel. 20, Created)
> DT   01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
> DT   01-MAR-2004 (TrEMBLrel. 26, Last annotation update)
> DE   Coat protein.
> OS   Cotton leaf curl Gezira virus - [Okra-Shambat].
> OC   Viruses; ssDNA viruses; Geminiviridae; Begomovirus.
> OX   NCBI_TaxID=268964;
> RN   [1]
> RP   SEQUENCE FROM N.A.
> RA   Idris A.M., Brown J.K.;
> RT   "Molecular analysis of cotton leaf curl virus-Sudan reveals an
> RT   evolutionary history of recombination.";
> RL   Virus Genes 0:0-0(2002).
> DR   EMBL; AY036008; AAK64541.1; -.
> DR   GO; GO:0019028; C:viral capsid; IEA.
> DR   GO; GO:0005198; F:structural molecule activity; IEA.
> DR   InterPro; IPR000650; Gem_coat_AR1.
> DR   InterPro; IPR000263; GV_A/BR1_coat.
> DR   Pfam; PF00844; Gemini_coat; 1.
> DR   PRINTS; PR00224; GEMCOATAR1.
> DR   PRINTS; PR00223; GEMCOATARBR1.
> DR   ProDom; PD000901; Gem_coat_AR1; 1.
> KW   Coat protein.
> SQ   SEQUENCE   258 AA;  29778 MW;  6FB1960A9D8763DD CRC64;
>      MSKRPADIII STPASKVRRR LNFDSPGLSS ARAPTVLVTN KRRSWTNRPT YRKPRMYRMY
>      RSPDVPKGCE GPCKVQSYEQ RDDIKHTGIV RCVSDVTKGV GITHRTGKRF TIKSIYILGK
>      VWMDDNIKKQ NHTNNVMFFL VRDRRPYGNS PLDFGQVFNM FDNEPSTATV KNDLRDHFQV
>      LRKFTATVIG GPSGMKEQAL VRRFYRINSQ IVYNHQEAGK FENHTENAIL LYMACTHASN
>      PVYATLKIRI YFYDSVSN
> //
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
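The failure in the report comes from line 985 interpolating the OS-line organism name directly into a regex, so the brackets in "[Okra-Shambat]" are read as a character class containing the reversed range "a-S". The sketch below, written in Perl like swiss.pm itself, reproduces the error and shows two general alternatives to escaping only `[` and `]`. The contents of `@Unknown_names` here are hypothetical stand-ins, and this is not necessarily what revision 1.84 actually does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for @Unknown_names; the real list in
# Bio::SeqIO::swiss may contain different entries.
my @Unknown_names = ('Unknown', 'unidentified', 'not specified');
my $binomial      = 'Cotton leaf curl Gezira virus - [Okra-Shambat]';

# Buggy form (line 985): $binomial is interpolated as regex syntax, so
# "[Okra-Shambat]" becomes a character class with the reversed range
# "a-S", and perl dies when it compiles the pattern at run time.
my $buggy_ok = eval { my @hits = grep { /^$binomial$/ } @Unknown_names; 1 };
my $err      = $buggy_ok ? '' : $@;
print $buggy_ok ? "buggy form: no error\n" : "buggy form died: $err";

# Fix 1: \Q...\E (quotemeta) escapes every metacharacter, not only [ and ].
my @hits_q = grep { /^\Q$binomial\E$/ } @Unknown_names;

# Fix 2: an anchored literal pattern is really an exact-string test, so use eq.
my @hits_eq = grep { $_ eq $binomial } @Unknown_names;

# Both fixes still find the name when it really is in the list
# (grep in scalar context returns the number of matches).
my @with_name = (@Unknown_names, $binomial);
my $n_q  = grep { /^\Q$binomial\E$/ } @with_name;
my $n_eq = grep { $_ eq $binomial } @with_name;

printf "no-match counts: %d %d; match counts: %d %d\n",
    scalar(@hits_q), scalar(@hits_eq), $n_q, $n_eq;
```

Escaping only square brackets, as in the patch above, would still leave other metacharacters such as `(`, `.`, `+` or `?` in an organism name to be interpreted as regex syntax. Of the two alternatives, `eq` is the simplest; note that it is slightly stricter than the anchored regex, because `$` in a pattern also matches just before a trailing newline.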


