[EMBOSS] EMBOSS database queries
Michael Thon
mike.thon at gmail.com
Wed Dec 12 16:12:05 UTC 2007
Thanks Peter, I got it working.
While I'm at it, a couple more questions popped up:
1) Do you know if these indexes are compatible with the Bio::DB::Registry
type databases?
2) Is there any way to index and search sequence features?
Best
Mike
On Dec 12, 2007, at 12:21 PM, Peter Rice wrote:
> Michael Thon wrote:
>> I am setting up a database from Genbank formatted files. I
>> understand how to index the db and configure the emboss.default
> >> file but I don't know how to construct the queries. Queries for
> >> sequence IDs are pretty simple, i.e. with a USA of the format
> >> "dbname:id". But how do I create a query for the other fields,
>> such as org and key? Also, do these fields support wildcards or
>> substring matches or other fancy stuff?
>
> Assuming you indexed all the fields (by default ID and ACC are indexed),
> you use the same syntax as in SRS (we saw no need to invent a new
> syntax, so we used the same field name abbreviations, but we did drop
> the '[]' around the query :-)
>
> dbname-acc:x13776
> dbname-org:pseudomonas*
> dbname-des:amidase
> dbname-key:
> dbname-sv:
> dbname-gi:
>
> and, to complete the set, dbname-id:x13776
>
> As you see, wildcards are allowed with '*' at the end.
>
> We can make this much more sophisticated, allowing more wildcard options
> and combining queries. So far EMBOSS users have been content to use SRS
> or alternatives (MRS for example).
>
> If there is interest, we can extend the USA to include wildcards,
> AND/OR/NOT, search multiple fields, combine databases, and if we get
> really ambitious we could include links between databases.
>
> We will have to be careful to restrict some of these extensions to
> database access methods that support them.
>
> Hope this helps,
>
> Peter
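
For anyone scripting these lookups, the query forms Peter lists above can be assembled with a small helper. This is only a sketch: build_usa and FIELDS are hypothetical names, not part of EMBOSS, and the field list is taken from the examples in the quoted reply. It assumes the fields were indexed and that only a trailing '*' wildcard is supported, as described.

```python
# Hypothetical helper, not part of EMBOSS: builds query strings of the
# form "dbname-field:value" as described in the quoted reply.

# Field abbreviations mentioned in the reply (assumed to be indexed).
FIELDS = {"id", "acc", "org", "des", "key", "sv", "gi"}


def build_usa(dbname, field, value):
    """Return a field query such as 'dbname-org:pseudomonas*'."""
    field = field.lower()
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    # The reply only mentions '*' at the end of the value, so reject
    # a '*' anywhere else rather than guess how it would be handled.
    if "*" in value[:-1]:
        raise ValueError("only a trailing '*' wildcard is described")
    return f"{dbname}-{field}:{value}"


queries = [
    build_usa("dbname", "acc", "x13776"),
    build_usa("dbname", "org", "pseudomonas*"),
    build_usa("dbname", "des", "amidase"),
]
for query in queries:
    print(query)
```

The resulting string would then be passed as the sequence argument to an EMBOSS program such as seqret. On a Unix shell, quote it so that the '*' is not expanded by the shell.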