[EMBOSS] Indexing the ID field of EMBL-formatted databases.

Peter Rice pmr at ebi.ac.uk
Wed May 30 13:33:11 UTC 2007


Charles Plessy wrote:
> I have tried to index mirbase, the miRNA database, with dbiflat, but in that
> case I can only retrieve the sequences by their accession numbers, and not by
> their IDs:
> 

> gslc12.mirbase3.$ seqret mirbase:mmu-mir-690 stdout
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'mirbase:mmu-mir-690'
> Died: seqret terminated: Bad value for '-sequence' and no prompt

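Aha... mirbase is in EMBL format, except that its IDs are in lower case. All other
EMBL/UniProt databases use upper-case IDs. dbiflat stores each ID exactly as it
reads it, so the lower-case miRBase IDs are not matched when you look them up.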

In emboss/dbiflat.c, in the function dbiflat_ParseEmbl, add a conversion to upper case:

	if(lineType == FLATTYPE_ID)
	{
	    ajRegExec(regEmblId, rline);
	    ajRegSubI(regEmblId, 1, myid);
	    /* new: myid is already an AjPStr*, so pass it directly */
	    ajStrFmtUpper(myid);
	    ajDebug("++id '%S'\n", *myid);
	    ajRegSubI(regEmblId, 3, &tmpfd);


(The new line is the ajStrFmtUpper call.)

This fix will be included in the July release.

regards,

Peter


