[Bioperl-l] Question about embl format

Lincoln Stein lstein at cshl.org
Thu Apr 17 18:56:13 EDT 2003


OK, so what should we do about primary_tags longer than 15 characters, given
that BioPerl doesn't enforce a size limit on primary_tags?  If I implement
truncation at the write_seq level, then we'll lose round-tripping.

Oh well.  I'll just have to do it unless anyone sees a way around it.

Lincoln
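
To make the round-tripping concern concrete, here is a minimal sketch in plain
Python. It is not BioPerl code, and the function names (truncate_key,
write_feature, read_feature) are hypothetical. It assumes a writer that simply
cuts the key at 15 characters and a reader that takes the key back from the
FT line as written.

```python
# Sketch only: plain Python, not BioPerl. All names here are hypothetical.
EMBL_FEATURE_KEY_MAX = 15  # feature key limit from the EMBL feature table definition


def truncate_key(primary_tag, limit=EMBL_FEATURE_KEY_MAX):
    """Return the tag cut down to at most `limit` characters."""
    return primary_tag[:limit]


def write_feature(primary_tag, location):
    """Format one FT line, truncating the key the way a writer might."""
    key = truncate_key(primary_tag)
    # "FT" plus three spaces, then the key padded to 16 columns,
    # so the location starts at column 22 as in the dump quoted below.
    return "FT   {:<16}{}".format(key, location)


def read_feature(line):
    """Recover (key, location) from a line produced by write_feature."""
    key, location = line[5:].split(None, 1)
    return key, location


original = "Transposon_insertion"
line = write_feature(original, "complement(13204595..13204596)")
key, location = read_feature(line)

print(line)
print(key == original)  # the full name cannot be recovered from the key alone
```

The key read back is the 15-character prefix, so two different long tags that
share a prefix would also collide. In the quoted dump the /method qualifier
happens to carry the full name, so a reader that knew to look there could
restore it; whether that holds for other data sources is not established in
this thread.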


On Thursday 17 April 2003 11:59 am, Ewan Birney wrote:
> On Thu, 17 Apr 2003, Lincoln Stein wrote:
> > Hello,
> >
> > The "sequence dumper" plugin for the Generic Genome Browser has been
> > crashing when making an EMBL dump of a particular region of the worm
> > genome.  The issue is a "Transposon_insertion" feature, which exceeds the
> > 15 character limit for EMBL feature tags.  If I remove the
> > Bio::SeqIO::embl check for this limit, I get an output that looks like
> > this:
> >
> > ...
> > FT   Transposon_insertion complement(13204595..13204596)
> > FT                   /score=""
> > FT                   /group="cxP4108"
> > FT                   /id=7726466
> > FT                   /method="Transposon_insertion"
> > FT                   /source="Allele"
> > FT                   /phase=""
> > FT   repeat          13204572..13204602
> > FT                   /score=80
> > FT                   /group=""
> > FT                   /notes="loop 283"
> > FT                   /id=7775180
> > FT                   /method="repeat"
> > FT                   /source="inverted"
> > FT                   /phase=""
> > FT                   /note="score=80"
> > ...
> >
> > My question is whether this is acceptable EMBL format.  If not, I will
> > have to truncate feature type names at 15 characters, but this is going
> > to lose information.
>
> Looks like the definition says no more than 15 characters for feature keys:
>
> Feature table components, including feature keys, qualifiers, accession
> numbers, database name abbreviations, feature labels, and location
> operators, are all named following the same conventions. Component names
> may be no more than 20 characters long  (Feature keys 15, Feature
> qualifiers 20)  and must contain at least one letter. While case should
> not be regarded as significant in comparing feature labels ('Prot1' and
> 'pROT1' are the same), the databanks will preserve the case of labels as
> originally annotated. The following characters are permitted to occur in
> feature table component names:
>
> From:
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
>
> > Lincoln
> >
> > --
> > ========================================================================
> > Lincoln D. Stein                           Cold Spring Harbor Laboratory
> > lstein at cshl.org                                Cold Spring Harbor, NY
> > ========================================================================
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney at ebi.ac.uk>.
> -----------------------------------------------------------------
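
The two naming rules stated in the passage Ewan quotes (length limits of 15
for feature keys and 20 for qualifiers, and at least one letter) can be checked
mechanically. The following is a minimal sketch in plain Python, not BioPerl;
the function name name_problems and the kind labels are hypothetical. The list
of permitted characters is cut off in the quote, so this sketch does not check
it.

```python
# Sketch only: checks the two rules stated in the quoted passage.
# The permitted-character list is elided in the quote, so it is not checked here.
MAX_LENGTH = {"feature_key": 15, "qualifier": 20}


def name_problems(name, kind):
    """Return a list of rule violations for a feature table component name."""
    problems = []
    limit = MAX_LENGTH[kind]
    if len(name) > limit:
        problems.append(
            "{} characters exceeds the limit of {}".format(len(name), limit)
        )
    if not any(ch.isalpha() for ch in name):
        problems.append("must contain at least one letter")
    return problems


for tag in ("repeat", "Transposon_insertion"):
    print(tag, name_problems(tag, "feature_key"))
```

A writer could use a check like this to warn about an over-long key instead of
dying on it, leaving the decision to truncate to the caller.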

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org                                Cold Spring Harbor, NY
========================================================================



