[Bioperl-l] writing genbank files

Elia Stupka elia@fugu-sg.org
Thu, 19 Sep 2002 17:29:52 +0800 (SGT)


> The problem with this genbank entry (and I have many more similar entries) is
> the validation of the species name. 'eurosids II' does not match the regex 
> /^[A-Z][\sa-z]+$/ and that's why the parser bails out. Other entries do not 
> seem to have that problem.

Exactly the same problem we have with swissprot.
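
To make the failure concrete, here is a minimal sketch of the two patterns from
the diff quoted below. It is not the actual Species.pm code: the helper
name_ok() and the sample names other than 'eurosids II' are made up for
illustration only.

```perl
use strict;
use warnings;

# The two patterns from the quoted diff (Species.pm revisions 1.16 and 1.17).
my $strict  = qr/^[A-Z][a-z]+$/;     # 1.16: one capitalised word, no spaces
my $relaxed = qr/^[A-Z][\sa-z]+$/;   # 1.17: spaces allowed after the first letter

# Hypothetical helper: returns 1 if $name matches $pattern, 0 otherwise.
sub name_ok {
    my ($name, $pattern) = @_;
    return $name =~ $pattern ? 1 : 0;
}

# Illustrative inputs; only 'eurosids II' comes from the reported exception.
for my $name ('Arabidopsis', 'Brassica napus', 'eurosids II') {
    printf "%-16s strict=%d relaxed=%d\n",
        $name, name_ok($name, $strict), name_ok($name, $relaxed);
}
```

'eurosids II' fails both patterns: the first letter is lower case, and the
upper-case 'II' falls outside [\sa-z]. So allowing spaces alone does not help
for names like this one.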

Elia

> 
> Gert
> 
> 
> Jason Stajich wrote:
> > I did this fix about 1 week ago, don't see how your parsing would have
> > worked before unless genbank parsing changed too from the version you
> > were using before..
> > 
> > All I did was:
> > 
> > RCS file: /home/repository/bioperl/bioperl-live/Bio/Species.pm,v
> > retrieving revision 1.16
> > retrieving revision 1.17
> > diff -r1.16 -r1.17
> > 282c282
> > <     return 1 if $string =~ /^[A-Z][a-z]+$/;
> > ---
> > >     return 1 if $string =~ /^[A-Z][\sa-z]+$/;
> > 
> > 
> > 
> > Someone needs to refresh the ideas behind the Species object and the
> > taxonomic fields in genbank/embl records.  Either we are parsing things
> > differently or the values that one can put in the field are changing.  We
> > have a lot more taxonomic fields that are not matching what was expected
> > when this module was built (James G did the brunt of the work back in the
> > day).
> > 
> > Anyways, I am perfectly happy to turn off the name validation
> > altogether, basically it required all fields other than the species to
> > start with a capital letter.
> > 
> > -jason
> > 
> > 
> > On Wed, 18 Sep 2002, Hilmar Lapp wrote:
> > 
> > 
> >>I believe Jason fixed something yesterday in Species.pm in order to
> >>allow spaces in certain places. Jason?
> >>
> >>	-hilmar
> >>
> >>On Wednesday, September 18, 2002, at 01:48 AM, gert thijs wrote:
> >>
> >>
> >>>Hilmar,
> >>>
> >>>I just installed the modules from the main trunk. I tried to test
> >>>it but now I was unable to parse input sequences in genbank format.
> >>>Now I have a problem uploading a genbank flat file. There seems to
> >>>be a problem while parsing the species name. I guess not having an
> >>>upper case starting letter stops the genbank parser. Attached is
> >>>a file on which the parser throws the exception.
> >>>
> >>>------------- EXCEPTION: Bio::Root::Exception -------------
> >>>MSG: Invalid name 'eurosids II' (Wrong case?)
> >>>STACK: Error::throw
> >>>STACK: Bio::Root::Root::throw
> >>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Root/Root.pm:318
> >>>STACK: Bio::Species::validate_name
> >>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:283
> >>>STACK: Bio::Species::classification
> >>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:121
> >>>STACK: Bio::SeqIO::genbank::_read_GenBank_Species
> >>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:884
> >>>STACK: Bio::SeqIO::genbank::next_seq
> >>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:229
> >>>STACK: AnnotatedSequence::new
> >>>/users/sista/thijs/perl/lib//AnnotatedSequence.pm:66
> >>>STACK: GeneIndex.pl:168
> >>>-----------------------------------------------------------
> >>>
> >>>Gert
> >>>
> >>>
> >>>Hilmar Lapp wrote:
> >>>
> >>>>It should be written as join(complement(...),complement(...),...).
> >>>>This is main trunk only though. Do you have an example where this
> >>>>is not true?
> >>>>    -hilmar
> >>>>On Tuesday, September 17, 2002, at 02:06 AM, gert thijs wrote:
> >>>>
> >>>>>Hello,
> >>>>>
> >>>>>I have a question about the current status of the genbank file
> >>>>>parser/writer.  I noticed that a CDS with a location of the type
> >>>>>complement(join()) is written as a join() without the complement.
> >>>>>I saw that this problem has been a major thread on the list a few
> >>>>>weeks ago, but I could not find if the problem has been solved by
> >>>>>now or if it was solved how it should be solved.
> >>>>>
> >>>>>Gert
> >>>>>
> >>>>>
> >>>>>
> >>>>>--
> >>>>>+ Gert Thijs
> >>>>>+  K.U.Leuven
> >>>>>+  ESAT-SCD
> >>>>>+  Kasteelpark Arenberg 10
> >>>>>+  B-3001 Leuven-Heverlee
> >>>>>+  Belgium
> >>>>>+
> >>>>>+ Tel  : +32 16 32 85 88
> >>>>>+ Fax  : +32 16 32 19 70
> >>>>>+ email: gert.thijs@esat.kuleuven.ac.be
> >>>>>+
> >>>>>+  http://www.esat.kuleuven.ac.be/~thijs
> >>>>>+  http://www.esat.kuleuven.ac.be/~dna/BioI/
> >>>>>+
> >>>>>
> >>>>>_______________________________________________
> >>>>>Bioperl-l mailing list
> >>>>>Bioperl-l@bioperl.org
> >>>>>http://bioperl.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>
> >>>>--
> >>>>-------------------------------------------------------------
> >>>>Hilmar Lapp                            email: lapp at gnf.org
> >>>>GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >>>>-------------------------------------------------------------
> >>>
> >>>
> 
> 

Elia

********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 6874 1467        *
* mobile: +65 9030 7613        *
* fax:    +65 6779 1117        *
********************************