[EMBOSS] seqret/entret problems using acc from ensembl-embl
pmr at ebi.ac.uk
Mon Nov 20 15:52:31 UTC 2006
> I think that the format used by Ensembl for assigning IDs and ACCs is
> causing the problems. For example the first entry from the flat file:
> claudia at pc-31-18-86-200:~> head
> ID 1 standard; DNA; HTG; 970768 BP.
> AC chromosome:NCBI36:1:1000001:1970768:1
> SV chromosome:NCBI36:1:1000001:1970768:1
> DT 5-OCT-2006
> DE Homo sapiens chromosome 1 NCBI36 partial sequence 1000001..1970768
> DE annotated by Ensembl
> I tried replacing the ":" character of the AC line with a "_" using sed
> but after indexing I still get the same error message with seqret or
> entret. Is there any length limit for IDs or ACCs in EMBOSS? Is there
> any workaround for this problem?
Those IDs are horrible and not really EMBL format... certainly not valid
We will add an ENSEMBL format for the next release... as a sequence format and
as a format for dbiflat and the (preferred) dbxflat.
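To illustrate why these values do not behave like ordinary accessions, here is a minimal Python sketch that splits the AC value quoted above into its colon-separated parts. It assumes the string follows the usual Ensembl location layout (coordinate system, assembly, region name, start, end, strand). The field names and the parse_location helper are hypothetical labels chosen for this example, not part of EMBOSS or Ensembl.

```python
from typing import NamedTuple


class EnsemblLocation(NamedTuple):
    """Fields of an Ensembl-style location string (names are illustrative)."""
    coord_system: str
    assembly: str
    region: str
    start: int
    end: int
    strand: int

    @property
    def length(self) -> int:
        # Coordinates are treated as 1-based and inclusive (an assumption).
        return self.end - self.start + 1


def parse_location(ac: str) -> EnsemblLocation:
    """Split a value such as 'chromosome:NCBI36:1:1000001:1970768:1'."""
    parts = ac.strip().rstrip(";").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}")
    coord_system, assembly, region, start, end, strand = parts
    return EnsemblLocation(coord_system, assembly, region,
                           int(start), int(end), int(strand))


loc = parse_location("chromosome:NCBI36:1:1000001:1970768:1")
print(loc.region, loc.start, loc.end, loc.length)
```

Under that assumption the computed length agrees with the 970768 BP given on the ID line, which suggests the AC and SV values are location descriptions of a chromosome slice rather than accession numbers.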
Hope that helps,