[EMBOSS] Indexing the ID field of EMBL-formatted databases.
Peter Rice
pmr at ebi.ac.uk
Wed May 30 13:33:11 UTC 2007
Charles Plessy wrote:
> I have tried to index mirbase, the miRNA database, with dbiflat, but in that
> case I can only retrieve the sequences by their accession numbers, and not by
> their IDs:
>
> gslc12.mirbase3.$ seqret mirbase:mmu-mir-690 stdout
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'mirbase:mmu-mir-690'
> Died: seqret terminated: Bad value for '-sequence' and no prompt
Aha ... mirbase is in EMBL format, except that the IDs are in lower case. All other
EMBL/UniProt databases use upper-case IDs.
In emboss/dbiflat.c, in the function dbiflat_ParseEmbl, add a conversion to upper case:
    if(lineType == FLATTYPE_ID)
    {
        ajRegExec(regEmblId, rline);
        ajRegSubI(regEmblId, 1, myid);
        ajStrFmtUpper(myid);    /* new: convert the ID to upper case */
        ajDebug("++id '%S'\n", *myid);
        ajRegSubI(regEmblId, 3, &tmpfd);
(The ajStrFmtUpper line is the only addition.)
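
To show the idea outside EMBOSS, here is a minimal sketch in plain C. It is not EMBOSS code: the helper names id_to_upper and id_matches are hypothetical, and it assumes that the lookup side upper-cases the query in the same way as the indexer, so that a lower-case ID in the flat file and a lower-case query still meet on the same key.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: upper-case an ID in place (ASCII only). */
void id_to_upper(char *id)
{
    assert(id != NULL);
    for (; *id != '\0'; ++id)
        *id = (char) toupper((unsigned char) *id);
}

/* Hypothetical helper: compare an indexed ID with a query after
 * normalising both to upper case. Returns 1 on a match, 0 otherwise.
 * IDs of 64 characters or more are rejected to keep the sketch simple. */
int id_matches(const char *indexed, const char *query)
{
    char a[64];
    char b[64];

    if (strlen(indexed) >= sizeof a || strlen(query) >= sizeof b)
        return 0;

    strcpy(a, indexed);
    strcpy(b, query);
    id_to_upper(a);
    id_to_upper(b);
    return strcmp(a, b) == 0;
}
```

With both sides normalised, id_matches("mmu-mir-690", "MMU-MIR-690") reports a match, which is the effect the one-line change in dbiflat_ParseEmbl is meant to have on the index.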
This fix will be included in the July release.
regards,
Peter