[Bioperl-l] Question about embl format
Lincoln Stein
lstein at cshl.org
Thu Apr 17 18:58:46 EDT 2003
Well, I've removed the check that limits feature keys to 15 characters and
will wait for someone to complain. The truncation code is ready to go in
quickly should anyone complain about this.
Lincoln
On Thursday 17 April 2003 12:12 pm, Heikki Lehvaslaiho wrote:
> Lincoln,
>
> The feature table documentation states that:
>
> "Component names may be no more than 20 characters long (Feature keys
> 15, Feature qualifiers 20) and must contain at least one letter."
>
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#naming_conventions
>
> It also says that only certain keys are accepted. The parser used by the
> EBI EMBL database group ignores all unknown keys. Since you are using
> your own keys, you are free to do whatever you want. Incidentally, it
> looks like no one is using an asterisk to start private key names.
>
> In my opinion, all sane parsers should read all the valid name
> characters [a-zA-Z0-9*'_-] to build the key. Bioperl seems to do the
> right thing.
>
> I do not know what is best. Try with long keys, and wait and see if
> someone complains?
>
> -Heikki
>
> On Thu, 2003-04-17 at 14:45, Lincoln Stein wrote:
> > Hello,
> >
> > The "sequence dumper" plugin for the Generic Genome Browser has been
> > crashing when making an EMBL dump of a particular region of the worm
> > genome. The issue is a "Transposon_insertion" feature, which exceeds the
> > 15 character limit for EMBL feature tags. If I remove the
> > Bio::SeqIO::embl check for this limit, I get an output that looks like
> > this:
> >
> > ...
> > FT Transposon_insertion complement(13204595..13204596)
> > FT /score=""
> > FT /group="cxP4108"
> > FT /id=7726466
> > FT /method="Transposon_insertion"
> > FT /source="Allele"
> > FT /phase=""
> > FT repeat 13204572..13204602
> > FT /score=80
> > FT /group=""
> > FT /notes="loop 283"
> > FT /id=7775180
> > FT /method="repeat"
> > FT /source="inverted"
> > FT /phase=""
> > FT /note="score=80"
> > ...
> >
> > My question is whether this is acceptable EMBL format. If not, I will
> > have to truncate feature type names at 15 characters, but this is going
> > to lose information.
> >
> > Lincoln
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein at cshl.org Cold Spring Harbor, NY
========================================================================
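
For readers who want to experiment with the naming rule quoted in this
thread, here is a minimal sketch in Python. It is not the Bio::SeqIO::embl
code and not the truncation code mentioned above. The helper names
check_feature_key and truncate_feature_key are hypothetical; the 15
character limit, the "at least one letter" rule and the character set
[a-zA-Z0-9*'_-] are taken from the messages quoted above.

```python
import re

# Limit quoted in the thread from the EMBL feature table documentation.
MAX_KEY_LEN = 15
# Character set suggested in the thread: [a-zA-Z0-9*'_-]
VALID_KEY = re.compile(r"^[A-Za-z0-9*'_-]+$")


def check_feature_key(key):
    """Return a list of problems with a feature key (hypothetical helper)."""
    problems = []
    if not VALID_KEY.match(key):
        problems.append("invalid characters")
    if not re.search(r"[A-Za-z]", key):
        problems.append("no letter")
    if len(key) > MAX_KEY_LEN:
        problems.append("longer than %d characters" % MAX_KEY_LEN)
    return problems


def truncate_feature_key(key):
    """Cut a key to the 15 character limit; this loses information."""
    return key[:MAX_KEY_LEN]


if __name__ == "__main__":
    for key in ("Transposon_insertion", "repeat"):
        print(key, check_feature_key(key), truncate_feature_key(key))
```

With these assumptions, "Transposon_insertion" (20 characters) is flagged
as too long and would be cut to "Transposon_inse", which illustrates the
loss of information raised in the original question; "repeat" passes
unchanged.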