[EMBOSS] FW: Forthcoming change in the EMBL flatfile format

Rodrigo Lopez rls at ebi.ac.uk
Wed Apr 26 15:46:51 UTC 2006


 

> -----Original Message-----
> From: owner-seq-dbg at ebi.ac.uk 
> [mailto:owner-seq-dbg at ebi.ac.uk] On Behalf Of Carola Kanz
> Sent: 26 April 2006 16:29
> To: seq-dbg at ebi.ac.uk
> Subject: Forthcoming change in the EMBL flatfile format
> 
> 
> Dear all,
> 
> if you are working with the EMBL flatfile format and you are 
> not yet aware of the format change we are going to introduce 
> with the next release, please have a look at the following 
> announcement.
> Carola
> 
> 
> -----------------------------------------------------------------------
> 
> Dear colleagues,
> 
> We would like to announce the following important change in 
> the EMBL database in June this year.
> 
> At the time of release 87 (available from JUN-2006) the 
> format of the EMBL flat file will undergo a change: the ID 
> line will have a different structure (see below) and the SV 
> line will be removed.
> 
> The changes affecting the ID line structure are:
> 
>      * All tokens will be separated by a semicolon.
>      * The entry name will not be displayed; in its place there will
>        be the primary accession number.
>      * The sequence version will be indicated.
>      * The topology will be a separate token and will be indicated
>        for both circular and linear molecules.
>      * Both the data class and the taxonomic division will be
>        displayed.
> 
> This is an example of the new ID line:
> 
> ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
>         (1)     (2)     (3)      (4)       (5)  (6)   (7)
> 
> 
> The tokens represent:
> 
>     1. Primary accession number.
>     2. 'SV' + sequence version number.
>     3. Topology: 'circular' or 'linear'.
>     4. Molecule type.
>     5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA,
>        STS, STD; "normal" entries will have STD for standard).
>     6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN,
>        ENV, INV, SYN, UNC, VRL, PHG).
>     7. Sequence length + 'BP.'.
> 
> The entry name will not be displayed any more in the ID line. 
> Since EMBL release 3 (Dec 1983) the stable identifier of an 
> entry has been the primary accession number.
> 
> A mapping file (entry name to accession number) will be 
> provided with the next release for those entries where the 
> entry name doesn't coincide with the accession number.
> 
> To give users a test dataset, one file with new-style ID 
> lines called new_id_line.test.gz was provided together with 
> the March release of the EMBL database: 
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz 
> 
> Feedback from users is sought; please use the "Contact us" 
> link at the bottom of the EBI home page and specify "EMBL" in 
> the feedback form.
> 
> Note: this information was first made available on our 
> "Forthcoming changes" page
> ( http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606 )
> and in the EMBL database release notes.
> 
> 
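
For anyone adapting a parser to the new layout, here is a minimal Python sketch of splitting a new-style ID line into the seven tokens listed in the announcement. It is only an illustration: the names EmblId and parse_new_id_line are made up here and are not part of EMBOSS or any EBI tool, and the sketch assumes the line looks exactly like the example above (seven semicolon-separated tokens). Data class and division codes are passed through unchecked, since real releases may contain values beyond those listed.

```python
import re
from typing import NamedTuple


class EmblId(NamedTuple):
    """Hypothetical container for the seven tokens of a new-style ID line."""
    accession: str       # (1) primary accession number
    version: int         # (2) sequence version, from 'SV n'
    topology: str        # (3) 'circular' or 'linear'
    molecule_type: str   # (4) e.g. 'genomic DNA'
    data_class: str      # (5) e.g. 'HTG', 'STD'
    division: str        # (6) e.g. 'MAM', 'HUM'
    length: int          # (7) sequence length in base pairs


def parse_new_id_line(line: str) -> EmblId:
    """Parse an ID line in the format announced for release 87.

    Raises ValueError if the line does not have the expected shape,
    for example if it is still an old-style ID line.
    """
    if line[:2] != "ID":
        raise ValueError("not an ID line")

    # All tokens are separated by semicolons in the new format.
    tokens = [t.strip() for t in line[2:].split(";")]
    if len(tokens) != 7:
        raise ValueError("expected 7 tokens, found %d" % len(tokens))
    accession, sv, topology, molecule_type, data_class, division, length = tokens

    sv_match = re.fullmatch(r"SV\s+(\d+)", sv)
    if sv_match is None:
        raise ValueError("bad sequence version token: %r" % sv)

    if topology not in ("circular", "linear"):
        raise ValueError("bad topology token: %r" % topology)

    length_match = re.fullmatch(r"(\d+)\s+BP\.", length)
    if length_match is None:
        raise ValueError("bad length token: %r" % length)

    return EmblId(
        accession=accession,
        version=int(sv_match.group(1)),
        topology=topology,
        molecule_type=molecule_type,
        data_class=data_class,
        division=division,
        length=int(length_match.group(1)),
    )


if __name__ == "__main__":
    example = "ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP."
    print(parse_new_id_line(example))
```

Because the function raises ValueError on anything that does not split into seven tokens, a caller that must handle both old and new releases could try this parser first and fall back to its existing ID line code when the exception is raised.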
