[EMBOSS] seqret/entret problems using acc from ensembl-embl
Peter Rice
pmr at ebi.ac.uk
Mon Nov 20 15:52:31 UTC 2006
Hi David,
> I think that the format used by Ensembl for assigning IDs and ACCs is
> causing the problems. For example the first entry from the flat file:
>
> claudia at pc-31-18-86-200:~> head /local/bioinfo/db/ensembl/embl/Homo_sapiens.0.dat
> ID 1 standard; DNA; HTG; 970768 BP.
> XX
> AC chromosome:NCBI36:1:1000001:1970768:1
> XX
> SV chromosome:NCBI36:1:1000001:1970768:1
> XX
> DT 5-OCT-2006
> XX
> DE Homo sapiens chromosome 1 NCBI36 partial sequence 1000001..1970768
> DE annotated by Ensembl
>
> I tried replacing the ":" characters in the AC line with "_" using sed,
> but after indexing I get the same error message with seqret and
> entret. Is there a length limit for IDs or ACCs in EMBOSS? Is there
> any workaround for this problem?
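The AC value in the entry above looks like an Ensembl region name, which we assume has the layout coord_system:version:seq_region:start:end:strand. The sketch below splits it into those six fields. The field names and the helpers `parse_region_name` and `ac_value` are hypothetical, chosen for illustration; they are not part of EMBOSS or the Ensembl API.

```python
from typing import NamedTuple


class EnsemblRegion(NamedTuple):
    """Fields of an Ensembl-style region name (assumed layout)."""
    coord_system: str
    version: str
    seq_region: str
    start: int
    end: int
    strand: int


def ac_value(line: str) -> str:
    """Return the value of an EMBL-style 'AC' line, minus any trailing ';'."""
    if not line.startswith("AC"):
        raise ValueError("not an AC line")
    return line[2:].strip().rstrip(";")


def parse_region_name(name: str) -> EnsemblRegion:
    """Split e.g. 'chromosome:NCBI36:1:1000001:1970768:1' into six fields."""
    parts = name.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}: {name!r}")
    coord_system, version, seq_region, start, end, strand = parts
    return EnsemblRegion(coord_system, version, seq_region,
                         int(start), int(end), int(strand))


region = parse_region_name(ac_value("AC chromosome:NCBI36:1:1000001:1970768:1"))
# The region length agrees with the "970768 BP" on the ID line.
print(region.end - region.start + 1)
```

The computed length matches the ID line, which supports reading the AC value as a genomic region label rather than an accession number.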
Those IDs are horrible and not really EMBL format, and they are certainly not
valid accession numbers.
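To illustrate why these values do not qualify as accession numbers, here is a minimal check against the classic INSDC nucleotide accession forms: one letter followed by five digits, or two letters followed by six digits. This is a simplifying assumption. Newer forms such as WGS and protein accessions are not covered, and the sketch does not claim to reproduce whatever check EMBOSS itself performs.

```python
import re

# Classic INSDC nucleotide accession forms only (assumption): 1 letter + 5 digits,
# or 2 letters + 6 digits. WGS, protein and other newer formats are ignored.
ACCESSION_RE = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")


def looks_like_accession(value: str) -> bool:
    """True if the whole value matches one of the classic accession forms."""
    return ACCESSION_RE.fullmatch(value) is not None


for candidate in ("AB123456", "X12345",
                  "chromosome:NCBI36:1:1000001:1970768:1",
                  "chromosome_NCBI36_1_1000001_1970768_1"):
    print(f"{candidate}: {looks_like_accession(candidate)}")
```

Replacing ":" with "_", as in the sed attempt, still leaves a value that matches neither form.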
We will add an ENSEMBL format in the next release, both as a sequence format
and as an input format for the dbiflat and (preferred) dbxflat indexing programs.
Hope that helps,
Peter