[Bioperl-l] Question about embl format
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Thu Apr 17 18:12:03 EDT 2003
Lincoln,
The feature table documentation states that:
"Component names may be no more than 20 characters long (Feature keys
15, Feature qualifiers 20) and must contain at least one letter."
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#naming_conventions
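As a rough illustration of the quoted rule, the following Python sketch checks a name against the two stated limits (15 characters for feature keys, 20 for qualifiers) and the "at least one letter" requirement. The function and constant names are hypothetical; this is not code from Bioperl or from the EBI parser.

```python
import re

# Limits as quoted from the feature table documentation.
MAX_KEY_LEN = 15        # feature keys
MAX_QUALIFIER_LEN = 20  # feature qualifiers

def check_name(name, max_len):
    """Return a list of rule violations (empty if the name passes).

    Hypothetical helper for illustration only.
    """
    problems = []
    if len(name) > max_len:
        problems.append(f"{len(name)} characters, limit is {max_len}")
    if not re.search(r"[A-Za-z]", name):
        problems.append("contains no letter")
    return problems

print(check_name("Transposon_insertion", MAX_KEY_LEN))
print(check_name("repeat", MAX_KEY_LEN))
```

"Transposon_insertion" is 20 characters long, so it would fit the qualifier limit but not the feature key limit.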
It also says that only certain keys are accepted. The parser used by the
EBI EMBL database group ignores all unknown keys. Since you are using
your own keys, you are free to do whatever you want. Incidentally, it
looks like no one is using an asterisk to start private key names.
In my opinion, all sane parsers should read all the valid name
characters [a-zA-Z0-9*'_-] to build the key. Bioperl seems to do the
right thing.
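A minimal sketch of what such a lenient reader might look like, in Python for illustration. It takes the key to be the whole run of valid name characters rather than a fixed-width column. The line layout assumed here ("FT", whitespace, key, whitespace, location) and the function name are assumptions, and this is not how Bio::SeqIO::embl is actually implemented.

```python
import re

# Character class taken from the message above. The layout (an "FT" line
# code, whitespace, key, whitespace, location) is an assumption.
FT_KEY_LINE = re.compile(r"^FT\s+([A-Za-z0-9*'_-]+)\s+(\S.*)$")

def read_feature_key(line):
    """Return (key, location) for a feature key line, else None.

    Hypothetical helper: the key is read up to the first character that
    is not a valid name character, however long that makes it.
    """
    m = FT_KEY_LINE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2).strip()

print(read_feature_key(
    "FT   Transposon_insertion complement(13204595..13204596)"))
print(read_feature_key('FT                   /score=""'))
```

Qualifier lines are skipped because "/" is not a valid name character. A real parser would also need to tell key lines apart from wrapped continuation lines, which this sketch does not attempt.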
I do not know what is best. Try with long keys, and wait and see if
someone complains?
-Heikki
On Thu, 2003-04-17 at 14:45, Lincoln Stein wrote:
> Hello,
>
> The "sequence dumper" plugin for the Generic Genome Browser has been crashing
> when making an EMBL dump of a particular region of the worm genome. The
> issue is a "Transposon_insertion" feature, which exceeds the 15 character
> limit for EMBL feature tags. If I remove the Bio::SeqIO::embl check for this
> limit, I get an output that looks like this:
>
> ...
> FT   Transposon_insertion complement(13204595..13204596)
> FT                   /score=""
> FT                   /group="cxP4108"
> FT                   /id=7726466
> FT                   /method="Transposon_insertion"
> FT                   /source="Allele"
> FT                   /phase=""
> FT   repeat          13204572..13204602
> FT                   /score=80
> FT                   /group=""
> FT                   /notes="loop 283"
> FT                   /id=7775180
> FT                   /method="repeat"
> FT                   /source="inverted"
> FT                   /phase=""
> FT                   /note="score=80"
> ...
>
> My question is whether this is acceptable embl format? If not, I will have to
> truncate feature type names at 15 characters, but this is going to lose
> information.
>
> Lincoln
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________