[Bioperl-l] Bug in SeqIO/swiss.pm
Jason Stajich
jason.stajich at duke.edu
Wed Jan 5 20:14:51 EST 2005
Thanks for the report. I believe I fixed this in my Nov 22 commit (revision
1.84 of Bio/SeqIO/swiss.pm), so the fix will be in bioperl 1.5; it is also
available now from the code in CVS.
-jason
On Jan 5, 2005, at 4:08 PM, rfsouza at cecm.usp.br wrote:
> Hi,
>
> I have found what might be a bug in the SeqIO parser for Swissprot flat
> files (swiss.pm). The error message printed is
>
> Invalid [] range "a-S" before HERE mark in regex m/^Cotton leaf curl Gezira virus - [Okra-S << HERE hambat]$/ at /home/users/rfsouza/projects/geral/lib/perl5/site_perl/5.8.1/Bio/SeqIO/swiss.pm line 985, <GEN0> line 10.
>
> and the Swissprot entry is pasted below. The problem is a match operator
> at line 985:
>
> 984    #if the organism belongs to taxid 32644 then no Bio::Species object.
> 985    return if grep { /^$binomial$/ } @Unknown_names;
>
> I managed to fix this, and got swiss.pm to parse the entire UniProt
> release 2.1, by adding this line
>
> $binomial =~ s/(\[|\])/\\$1/g;
>
> just before line 985. Would anybody like to add this fix to the CVS
> version of swiss.pm? Since this is the only entry, out of 1520915 entries
> in UniProt, that swiss.pm was not able to parse, I wondered whether it is
> an annotation error in UniProt that violates their own standard...
>
> Greetings and happy new year :).
> Robson
>
> #==============
>
> ID Q8UYF6 STANDARD; PRT; 258 AA.
> AC Q8UYF6;
> DT 01-MAR-2002 (TrEMBLrel. 20, Created)
> DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
> DT 01-MAR-2004 (TrEMBLrel. 26, Last annotation update)
> DE Coat protein.
> OS Cotton leaf curl Gezira virus - [Okra-Shambat].
> OC Viruses; ssDNA viruses; Geminiviridae; Begomovirus.
> OX NCBI_TaxID=268964;
> RN [1]
> RP SEQUENCE FROM N.A.
> RA Idris A.M., Brown J.K.;
> RT "Molecular analysis of cotton leaf curl virus-Sudan reveals an
> RT evolutionary history of recombination.";
> RL Virus Genes 0:0-0(2002).
> DR EMBL; AY036008; AAK64541.1; -.
> DR GO; GO:0019028; C:viral capsid; IEA.
> DR GO; GO:0005198; F:structural molecule activity; IEA.
> DR InterPro; IPR000650; Gem_coat_AR1.
> DR InterPro; IPR000263; GV_A/BR1_coat.
> DR Pfam; PF00844; Gemini_coat; 1.
> DR PRINTS; PR00224; GEMCOATAR1.
> DR PRINTS; PR00223; GEMCOATARBR1.
> DR ProDom; PD000901; Gem_coat_AR1; 1.
> KW Coat protein.
> SQ SEQUENCE 258 AA; 29778 MW; 6FB1960A9D8763DD CRC64;
> MSKRPADIII STPASKVRRR LNFDSPGLSS ARAPTVLVTN KRRSWTNRPT YRKPRMYRMY
> RSPDVPKGCE GPCKVQSYEQ RDDIKHTGIV RCVSDVTKGV GITHRTGKRF TIKSIYILGK
> VWMDDNIKKQ NHTNNVMFFL VRDRRPYGNS PLDFGQVFNM FDNEPSTATV KNDLRDHFQV
> LRKFTATVIG GPSGMKEQAL VRRFYRINSQ IVYNHQEAGK FENHTENAIL LYMACTHASN
> PVYATLKIRI YFYDSVSN
> //
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
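A minimal Perl sketch of why line 985 dies and of two more general alternatives to escaping only the square brackets. The organism name from the OS line is interpolated into the pattern, so "[Okra-Shambat]" is parsed as a character class containing the invalid range "a-S". The contents of @Unknown_names and the helper sub names below are hypothetical, and this is not necessarily how revision 1.84 fixed the problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for @Unknown_names in Bio/SeqIO/swiss.pm;
# the real list is not shown in this thread.
my @Unknown_names = ('unknown', 'unidentified');

# Same style as line 985: $binomial is interpolated as a regex, so
# "[Okra-Shambat]" becomes a character class with the invalid range "a-S"
# and the match dies when the pattern is compiled.
sub is_unknown_regex {
    my ($binomial) = @_;
    return scalar grep { /^$binomial$/ } @Unknown_names;
}

# \Q...\E (quotemeta) escapes every regex metacharacter,
# not only [ and ], so the name is matched literally.
sub is_unknown_quoted {
    my ($binomial) = @_;
    return scalar grep { /^\Q$binomial\E$/ } @Unknown_names;
}

# An anchored, fully escaped pattern is just an exact string comparison.
sub is_unknown_eq {
    my ($binomial) = @_;
    return scalar grep { $_ eq $binomial } @Unknown_names;
}

my $os = 'Cotton leaf curl Gezira virus - [Okra-Shambat]';

# Trap the die so the script can report it and carry on.
my $r = eval { is_unknown_regex($os) };
print defined $r ? "regex: $r\n" : "regex died: $@";

printf "quoted: %d, eq: %d\n", is_unknown_quoted($os), is_unknown_eq($os);
printf "quoted('unknown'): %d\n", is_unknown_quoted('unknown');
```

Escaping only [ and ] still leaves other metacharacters that can occur in organism names, such as ( ) . and +, to be read as regex syntax; \Q...\E or a plain eq avoids that whole class of problem.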
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/