[Bioperl-l] writing genbank files
gert thijs
gert.thijs@esat.kuleuven.ac.be
Thu, 19 Sep 2002 10:53:39 +0200
Jason,
I have 1.0.2 installed and this works fine apart from the problem when writing
splitLocations.
To solve this problem, I downloaded the latest version from the main trunk as
Hilmar suggested.
When testing this new version, I encountered the error with the species name.
The problem with this genbank entry (and I have many more similar entries) is
the validation of the species name. 'eurosids II' does not match the regex
/^[A-Z][\sa-z]+$/ because it starts with a lowercase letter and contains the
capitals 'II', and that is why the parser bails out. Other entries do not
seem to have that problem.
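To make the mismatch concrete, here is a minimal standalone sketch that applies the two patterns quoted in this thread (Species.pm revisions 1.16 and 1.17) to a few names. The helper name_matches and the sample names other than 'eurosids II' are my own, for illustration only; this is not the actual code of Bio::Species::validate_name.

```perl
use strict;
use warnings;

# The two patterns quoted in this thread.
my $rev_1_16 = qr/^[A-Z][a-z]+$/;      # Species.pm revision 1.16
my $rev_1_17 = qr/^[A-Z][\sa-z]+$/;    # Species.pm revision 1.17 (spaces allowed)

# Hypothetical helper: returns 1 if the name matches the pattern, 0 otherwise.
sub name_matches {
    my ($name, $pattern) = @_;
    return $name =~ $pattern ? 1 : 0;
}

# 'eurosids II' is the name from the failing entry; the other two are
# made-up comparison cases.
for my $name ('Viridiplantae', 'Eurosids ii', 'eurosids II') {
    printf "%-15s rev 1.16: %d  rev 1.17: %d\n", $name,
        name_matches($name, $rev_1_16),
        name_matches($name, $rev_1_17);
}
```

Allowing whitespace (revision 1.17) only helps names that still start with a capital and are otherwise lowercase, so 'eurosids II' fails under both revisions.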
Gert
Jason Stajich wrote:
> I did this fix about 1 week ago, don't see how your parsing would have
> worked before unless genbank parsing changed too from the version you
> were using before..
>
> All I did was:
>
> RCS file: /home/repository/bioperl/bioperl-live/Bio/Species.pm,v
> retrieving revision 1.16
> retrieving revision 1.17
> diff -r1.16 -r1.17
> 282c282
> <     return 1 if $string =~ /^[A-Z][a-z]+$/;
> ---
> >     return 1 if $string =~ /^[A-Z][\sa-z]+$/;
>
>
>
> Someone needs to refresh the ideas behind the Species object and the
> taxonomic fields in genbank/embl records. Either we are parsing things
> differently or the values that one can put in the field are changing. We
> have a lot more taxonomic fields that are not matching what was expected
> when this module was built (James G did the brunt of the work back in the
> day).
>
> Anyway, I am perfectly happy to turn off the name validation
> altogether; basically, it required all fields other than the species to
> start with a capital letter.
>
> -jason
>
>
> On Wed, 18 Sep 2002, Hilmar Lapp wrote:
>
>
>>I believe Jason fixed something yesterday in Species.pm in order to
>>allow spaces in certain places. Jason?
>>
>> -hilmar
>>
>>On Wednesday, September 18, 2002, at 01:48 AM, gert thijs wrote:
>>
>>
>>>Hilmar,
>>>
>>>I just installed the modules from the main trunk. I tried to test
>>>it but now I was unable to parse input sequences in genbank format.
>>>Now I have a problem uploading a genbank flat file. There seems to
>>>be a problem while parsing the species name. I guess not having an
>>>upper case starting letter stops the genbank parser. Attached is
>>>a file on which the parser throws the exception.
>>>
>>>------------- EXCEPTION: Bio::Root::Exception -------------
>>>MSG: Invalid name 'eurosids II' (Wrong case?)
>>>STACK: Error::throw
>>>STACK: Bio::Root::Root::throw
>>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Root/Root.pm:318
>>>STACK: Bio::Species::validate_name
>>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:283
>>>STACK: Bio::Species::classification
>>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:121
>>>STACK: Bio::SeqIO::genbank::_read_GenBank_Species
>>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:884
>>>STACK: Bio::SeqIO::genbank::next_seq
>>>/users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:229
>>>STACK: AnnotatedSequence::new
>>>/users/sista/thijs/perl/lib//AnnotatedSequence.pm:66
>>>STACK: GeneIndex.pl:168
>>>-----------------------------------------------------------
>>>
>>>Gert
>>>
>>>
>>>Hilmar Lapp wrote:
>>>
>>>>It should be written as join(complement(...),complement(...),...).
>>>>This is main trunk only though. Do you have an example where this
>>>>is not true?
>>>> -hilmar
>>>>On Tuesday, September 17, 2002, at 02:06 AM, gert thijs wrote:
>>>>
>>>>>Hello,
>>>>>
>>>>>I have a question about the current status of the genbank file
>>>>>parser/writer. I noticed that a CDS with a location of the type
>>>>>complement(join()) is written as a join() without the complement.
>>>>>I saw that this problem was a major thread on the list a few
>>>>>weeks ago, but I could not find out whether it has been solved by
>>>>>now and, if so, what the agreed solution is.
>>>>>
>>>>>Gert
>>>>>
>>>>>
>>>>>
>>>>>--
>>>>>+ Gert Thijs
>>>>>+ K.U.Leuven
>>>>>+ ESAT-SCD
>>>>>+ Kasteelpark Arenberg 10
>>>>>+ B-3001 Leuven-Heverlee
>>>>>+ Belgium
>>>>>+
>>>>>+ Tel : +32 16 32 85 88
>>>>>+ Fax : +32 16 32 19 70
>>>>>+ email: gert.thijs@esat.kuleuven.ac.be
>>>>>+
>>>>>+ http://www.esat.kuleuven.ac.be/~thijs
>>>>>+ http://www.esat.kuleuven.ac.be/~dna/BioI/
>>>>>+
>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l@bioperl.org
>>>>>http://bioperl.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>--
>>>>-------------------------------------------------------------
>>>>Hilmar Lapp email: lapp at gnf.org
>>>>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>>>>-------------------------------------------------------------
>>>
>>>
--
+ Gert Thijs
+ K.U.Leuven
+ ESAT-SCD
+ Kasteelpark Arenberg 10
+ B-3001 Leuven-Heverlee
+ Belgium
+
+ Tel : +32 16 32 85 88
+ Fax : +32 16 32 19 70
+ email: gert.thijs@esat.kuleuven.ac.be
+
+ http://www.esat.kuleuven.ac.be/~thijs
+ http://www.esat.kuleuven.ac.be/~dna/BioI/
+