Sequence formats (was: Staden format (again))

Peter Rice pmr at ebi.ac.uk
Thu Sep 18 10:11:28 UTC 2003


Guy Bottu wrote:
> from : BEN

(Belgian EMBnet Node for the uninitiated :-)
  	
> Excuse me if I am beating a dead horse. I took a look at EMBOSS 2.7.1 and I saw 
> that Staden and experiment format are still not handled correctly.
> 
> Staden format : actually obsolete, the latest version of the Staden package does 
> not support it anymore. Staden format is just the sequence in simple text 
> with, optionally, comments  <xxxxx>  at any position in the sequence. When 
> EMBOSS reads in "staden" format, it recognizes only a comment at the top of the 
> sequence but considers comments inside the sequence as part of the sequence.

I'll take a look. It is a little tricky. GCG use the same comments (does 
anyone still use them? you could insert them using the seqed editor) so 
the GCG format needs the same change. Basically, in the sequence 
anything after < is a comment until a > is found, possibly on a later line.
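
As a rough illustration of that rule (a Python sketch only, not the EMBOSS 
code; the function name strip_angle_comments is made up for this example, 
and it assumes whitespace is not part of the sequence):

```python
def strip_angle_comments(lines):
    """Return the sequence with <...> comments removed.

    A comment opens at '<' and runs until the next '>', which may be
    on a later line. Whitespace outside comments is dropped.
    """
    residues = []
    in_comment = False
    for line in lines:
        for ch in line:
            if in_comment:
                # Skip everything until the closing '>'.
                if ch == ">":
                    in_comment = False
            elif ch == "<":
                in_comment = True
            elif not ch.isspace():
                residues.append(ch)
    return "".join(residues)


if __name__ == "__main__":
    print(strip_angle_comments(["ACGT<note", "still comment>TTGA", "<x>CC GG"]))
```

Note that with this rule an unclosed < swallows everything after it, so 
a reader may want to warn when the input ends while still inside a comment.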

This reminds me of a sequence I once saw that GCG failed to read - 
because it had been cut and pasted in an email, and had a '<' character 
at the start of a line so the rest was commented out :-)

> Staden experiment format : is used by Staden for assembly. Looks like EMBL 
> format, but with a lot of extra fields related to the DNA sequencing. Only 
> one sequence per file is allowed, and there can be fields below the "//". EMBOSS 
> erroneously uses "experiment" as synonym for "staden". You can find more 
> information about "experiment" format at
> http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_18.html

Thanks. I will take a look.

Does anyone have real examples of these formats?

Does anyone have real examples of any other formats EMBOSS supports?

Does anyone have real examples of any formats EMBOSS does not (yet) 
support?

regards,

Peter Rice
