[EMBOSS] seqret/entret problems using acc from ensembl-embl

Peter Rice pmr at ebi.ac.uk
Mon Nov 20 15:52:31 UTC 2006


Hi David,

> I think that the format used by Ensembl for assigning IDs and ACCs is
> causing the problems. For example the first entry from the flat file:
> 
> claudia at pc-31-18-86-200:~> head /local/bioinfo/db/ensembl/embl/Homo_sapiens.0.dat
> ID   1    standard; DNA; HTG; 970768 BP.
> XX
> AC   chromosome:NCBI36:1:1000001:1970768:1
> XX
> SV   chromosome:NCBI36:1:1000001:1970768:1
> XX
> DT   5-OCT-2006
> XX
> DE   Homo sapiens chromosome 1 NCBI36 partial sequence 1000001..1970768
> DE   annotated by Ensembl
> 
> I tried replacing the ":" characters in the AC line with "_" using sed,
> but after indexing I still get the same error message with seqret or
> entret. Is there any length limit for IDs or ACCs in EMBOSS? Is there
> any workaround for this problem?

Those IDs are horrible and not really EMBL format... certainly not valid 
accession numbers.
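
For illustration only, the kind of rewrite described in the quoted message
(replacing ":" with "_" in the accession) could be sketched as below. This is
a minimal sketch under assumptions, not a confirmed fix: the quoted message
reports that the same error persisted after rewriting the AC line. The
function names and the choice to rewrite the SV line as well as the AC line
are hypothetical and not taken from EMBOSS or Ensembl.

```python
# Assumption: only AC and SV lines carry the colon-separated Ensembl
# identifier that needs rewriting; all other line types are left untouched.
CODES = ("AC", "SV")


def sanitize_line(line: str) -> str:
    """Replace ':' with '_' in the value part of AC and SV lines."""
    # EMBL-style flat-file lines start with a two-letter code followed by
    # three spaces, so the value begins at column 5.
    if line[:2] in CODES and line[2:5] == "   ":
        return line[:5] + line[5:].replace(":", "_")
    return line


def sanitize_entry(text: str) -> str:
    """Apply sanitize_line to every line, preserving line endings."""
    return "".join(sanitize_line(l) for l in text.splitlines(keepends=True))
```

Note that this leaves the ID line ("1") as it is, so the rewritten entry still
does not carry a conventional EMBL identifier.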

We will add an ENSEMBL format for the next release, both as a sequence format
and as a format for dbiflat and the (preferred) dbxflat.

Hope that helps,

Peter


