[EMBOSS] EMBOSS database queries
Peter Rice
pmr at ebi.ac.uk
Wed Dec 12 11:21:51 UTC 2007
Michael Thon wrote:
> I am setting up a database from Genbank formatted files. I understand
> how to index the db and configure the emboss.default file but I don't
> know how to construct the queries. Queries for sequence IDs are pretty
> simple, i.e. with a USA of the format "dbname:id". But how do I create
> a query for the other fields, such as org and key? Also, do these
> fields support wildcards or substring matches or other fancy stuff?
Assuming you indexed all the fields (by default ID and ACC are indexed),
you use the same syntax as in SRS. We saw no need to invent a new
syntax, so we used the same field name abbreviations, but we did drop the
'[]' around the query :-)
dbname-acc:x13776
dbname-org:pseudomonas*
dbname-des:amidase
dbname-key:
dbname-sv:
dbname-gi:
and, to complete the set, dbname-id:x13776
As you can see, a '*' wildcard is allowed at the end of the query term.
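
For anyone scripting these queries, the pattern above can be sketched as a
small Python helper. This is only an illustration: the function name
field_query and the FIELDS set are hypothetical, not part of EMBOSS. The
field abbreviations are copied from the examples in this message, and the
check for a trailing '*' reflects only what is described here.

```python
# Hypothetical helper, not part of EMBOSS: builds "dbname-field:term" USAs.
# Field abbreviations are the ones listed in this message.
FIELDS = {"id", "acc", "org", "des", "key", "sv", "gi"}


def field_query(dbname: str, field: str, term: str) -> str:
    """Return a USA of the form 'dbname-field:term'."""
    field = field.lower()
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    if not term:
        raise ValueError("empty query term")
    # Only a '*' at the end of the term is described as supported,
    # so reject a '*' anywhere else in the term.
    if "*" in term[:-1]:
        raise ValueError("'*' is only supported at the end of the term")
    return f"{dbname}-{field}:{term}"


print(field_query("dbname", "org", "pseudomonas*"))  # dbname-org:pseudomonas*
print(field_query("dbname", "acc", "x13776"))        # dbname-acc:x13776
```

The resulting string is what you would pass as the sequence argument to an
EMBOSS program such as seqret. Remember to quote it in the shell so that
the '*' is not expanded as a filename pattern.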
We could make this much more sophisticated, allowing more wildcard options
and combining queries, but so far EMBOSS users have been content to use SRS
or alternatives (MRS, for example).
If there is interest, we can extend the USA to include wildcards,
AND/OR/NOT, search multiple fields, combine databases, and if we get
really ambitious we could include links between databases.
We will have to be careful to restrict some of these extensions to
database access methods that support them.
Hope this helps,
Peter