[EMBOSS] EMBOSS database queries

Peter Rice pmr at ebi.ac.uk
Wed Dec 12 11:21:51 UTC 2007


Michael Thon wrote:
> I am setting up a database from Genbank formatted files.  I understand 
> how to index the db and configure the emboss.default file but I don't 
> know how to construct the queries.  queries for sequence IDs are pretty 
> simple, i.e. with a USA of the format "dbname:id".  But, how do I create 
> a query for the other fields, such as org and key?  Also, do these 
> fields support wildcards or substring matches or other fancy stuff?

Assuming you indexed all the fields (by default ID and ACC are indexed),
you use the same syntax as in SRS. We saw no need to invent a new
syntax, so we kept the same field name abbreviations, but we did drop the
'[]' around the query :-)

dbname-acc:x13776
dbname-org:pseudomonas*
dbname-des:amidase
dbname-key:
dbname-sv:
dbname-gi:

and, to complete the set, dbname-id:x13776

As you can see, a '*' wildcard is allowed at the end of the search term.
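
As a rough illustration of the pattern above, here is a small Python sketch
that assembles query USAs of the form dbname-field:value. The helper name
build_usa and the FIELDS set are hypothetical, made up for this example and
not part of EMBOSS; the check that '*' appears only at the end simply mirrors
what is described in this message.

```python
# Field abbreviations taken from the examples in this message.
FIELDS = {"id", "acc", "sv", "gi", "des", "org", "key"}


def build_usa(dbname, field, value):
    """Return a query USA of the form dbname-field:value.

    Hypothetical helper for illustration only; not an EMBOSS API.
    """
    field = field.lower()
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    # Only a trailing '*' wildcard is described in the message,
    # so reject '*' anywhere before the last character.
    if "*" in value[:-1]:
        raise ValueError("'*' is only allowed at the end of the value")
    return f"{dbname}-{field}:{value}"


if __name__ == "__main__":
    print(build_usa("dbname", "acc", "x13776"))
    print(build_usa("dbname", "org", "pseudomonas*"))
```

The resulting string can be given to an EMBOSS program wherever a USA is
accepted. When typing it on a shell command line, quote it so that the shell
does not expand the '*' itself.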

We could make this much more sophisticated, with more wildcard options
and combined queries. So far, EMBOSS users have been content to use SRS
or alternatives (MRS, for example) for that.

If there is interest, we can extend the USA to include more general
wildcards, AND/OR/NOT, searches across multiple fields, and combined
databases. If we get really ambitious, we could include links between
databases.

We will have to be careful to restrict some of these extensions to
database access methods that support them.

Hope this helps,

Peter


