[EMBOSS] EMBOSS database queries

Michael Thon mike.thon at gmail.com
Wed Dec 12 16:12:05 UTC 2007


Thanks Peter, I got it working.
While I'm at it, a couple more questions popped up:
1) Do you know if these indexes are compatible with the
Bio::DB::Registry type databases?
2) Is there any way to index and search sequence features?
Best
Mike


On Dec 12, 2007, at 12:21 PM, Peter Rice wrote:

> Michael Thon wrote:
>> I am setting up a database from Genbank formatted files.  I
>> understand how to index the db and configure the emboss.default
>> file but I don't know how to construct the queries.  Queries for
>> sequence IDs are pretty simple, i.e. with a USA of the format
>> "dbname:id".  But how do I create a query for the other fields,
>> such as org and key?  Also, do these fields support wildcards or
>> substring matches or other fancy stuff?
>
> Assuming you indexed all the fields (by default only ID and ACC are
> indexed), you use the same syntax as in SRS. We saw no need to invent
> a new syntax, so we used the same field name abbreviations, but we did
> drop the '[]' around the query :-)
>
> dbname-acc:x13776
> dbname-org:pseudomonas*
> dbname-des:amidase
> dbname-key:
> dbname-sv:
> dbname-gi:
>
> and, to complete the set, dbname-id:x13776
>
> As you see, wildcards are allowed with '*' at the end.
>
> We can make this much more sophisticated, allowing more wildcard
> options and combining queries. So far EMBOSS users have been content
> to use SRS or alternatives (MRS for example).
>
> If there is interest, we can extend the USA to include wildcards,
> AND/OR/NOT, search multiple fields, combine databases, and if we get
> really ambitious we could include links between databases.
>
> We will have to be careful to restrict some of these extensions to
> database access methods that support them.
>
> Hope this helps,
>
> Peter
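
The field-query form Peter describes (dbname-field:term, with an optional
trailing '*') can be sketched in Python as below. This is only an
illustration: build_usa and seqret_command are hypothetical helper names,
"dbname" and "hits.fasta" are placeholders, and rejecting a '*' anywhere
but the end is a conservative assumption based on the message, which only
confirms the trailing wildcard. The fields must have been indexed for the
queries to return anything.

```python
# Field abbreviations listed in the message above (the SRS-style names).
KNOWN_FIELDS = {"id", "acc", "sv", "gi", "des", "key", "org"}


def build_usa(dbname, field, term):
    """Return a USA string of the form 'dbname-field:term'."""
    field = field.lower()
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    if not term:
        raise ValueError("empty query term")
    # Assumption: only a trailing '*' is accepted, as stated in the message.
    if "*" in term[:-1]:
        raise ValueError("'*' is only allowed at the end of the term")
    return f"{dbname}-{field}:{term}"


def seqret_command(usa, outfile):
    """Build (but do not run) an argument list for EMBOSS seqret."""
    return ["seqret", usa, "-outseq", outfile, "-auto"]


if __name__ == "__main__":
    print(build_usa("dbname", "acc", "x13776"))
    print(build_usa("dbname", "org", "pseudomonas*"))
    print(seqret_command(build_usa("dbname", "des", "amidase"), "hits.fasta"))
```

Passing the argument list to subprocess.run avoids shell quoting, so the
'*' reaches EMBOSS unchanged; on a shell command line the USA should be
quoted for the same reason.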



