[Bioperl-l] Question about embl format

Lincoln Stein lstein at cshl.org
Thu Apr 17 18:58:46 EDT 2003


Well, I've removed the check that limits feature keys to 15 characters and
will wait for someone to complain. The truncation code is ready to go in
quickly if anyone does.
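
For anyone who wants to see the trade-off concretely, here is a minimal sketch in Python. It is not the Bio::SeqIO::embl code; the function name check_feature_key and the truncate switch are made up for illustration. The 15-character limit and the character set are the ones Heikki quotes below.

```python
import re

# Character set quoted in Heikki's message; the limit is the one the
# feature table documentation gives for feature keys.
VALID_KEY = re.compile(r"^[A-Za-z0-9*'_-]+$")
MAX_KEY_LENGTH = 15


def check_feature_key(key, truncate=False):
    """Return the key to write; shorten it to the limit only when asked.

    Hypothetical helper for illustration, not part of Bioperl.
    """
    # A key must use only the allowed characters and contain a letter.
    if not VALID_KEY.match(key) or not re.search(r"[A-Za-z]", key):
        raise ValueError(f"invalid feature key: {key!r}")
    # Truncation keeps the output within the limit but loses information.
    if truncate and len(key) > MAX_KEY_LENGTH:
        return key[:MAX_KEY_LENGTH]
    return key
```

With truncate left off, "Transposon_insertion" passes through unchanged, which is the current behaviour. With truncate=True it would be written as "Transposon_inse".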

Lincoln


On Thursday 17 April 2003 12:12 pm, Heikki Lehvaslaiho wrote:
> Lincoln,
>
> The feature table documentation states that:
>
> "Component names may be no more than 20 characters long  (Feature keys
> 15, Feature qualifiers 20)  and must contain at least one letter."
>
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#naming_conventions
>
> It also says that only certain keys are accepted. The parser used by the
> EBI EMBL database group ignores all unknown keys. Since you are using
> your own keys, you are free to do whatever you want. Incidentally, it
> looks like no one is using an asterisk to start private key names.
>
> In my opinion, all sane parsers should read all the valid name
> characters [a-zA-Z0-9*'_-] to build the key. Bioperl seems to do the
> right thing.
>
> I do not know what is best. Try with long keys, and wait and see if
> someone complains?
>
> 	-Heikki
>
> On Thu, 2003-04-17 at 14:45, Lincoln Stein wrote:
> > Hello,
> >
> > The "sequence dumper" plugin for the Generic Genome Browser has been
> > crashing when making an EMBL dump of a particular region of the worm
> > genome.  The issue is a "Transposon_insertion" feature, which exceeds the
> > 15 character limit for EMBL feature tags.  If I remove the
> > Bio::SeqIO::embl check for this limit, I get an output that looks like
> > this:
> >
> > ...
> > FT   Transposon_insertion complement(13204595..13204596)
> > FT                   /score=""
> > FT                   /group="cxP4108"
> > FT                   /id=7726466
> > FT                   /method="Transposon_insertion"
> > FT                   /source="Allele"
> > FT                   /phase=""
> > FT   repeat          13204572..13204602
> > FT                   /score=80
> > FT                   /group=""
> > FT                   /notes="loop 283"
> > FT                   /id=7775180
> > FT                   /method="repeat"
> > FT                   /source="inverted"
> > FT                   /phase=""
> > FT                   /note="score=80"
> > ...
> >
> > My question is whether this is acceptable EMBL format. If not, I will
> > have to truncate feature type names at 15 characters, but this is going
> > to lose information.
> >
> > Lincoln
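
The effect of the long key on the column layout in the dump quoted above can be reproduced with a small sketch. It assumes, as I understand the flat-file layout, that the key occupies a 15-column field starting at column 6 and the location normally starts at column 22. The function name ft_line is hypothetical, and this is Python for illustration, not the Bioperl writer.

```python
def ft_line(key, location):
    """Build the first line of a feature table entry (illustrative only)."""
    # "FT" plus three spaces, the key left-justified in 15 columns,
    # one separating space, then the location.
    return f"FT   {key:<15} {location}"


short = ft_line("repeat", "13204572..13204602")
long_ = ft_line("Transposon_insertion", "complement(13204595..13204596)")

# 0-based offsets at which the location starts in each line.
print(short.index("13204572"))
print(long_.index("complement"))
```

A key within the limit leaves the location at its usual column, while the 20-character key pushes it five columns to the right. A parser that splits on whitespace will not care, but one that reads fixed columns might.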

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org			                  Cold Spring Harbor, NY
========================================================================



