USAs2

Peter Rice pmr at ebi.ac.uk
Fri Jul 2 18:38:02 UTC 2004


Hi Tamas,

Thanks for the suggestion!

It is late on Friday, so I will give it some thought over the weekend.

> I would like to know if it is possible to hack ajax to handle similar USAs 
> listed below and !!!HOW!!!:
> - USA:kw=something, ft=sthelse.
> - USA:SELECT * FROM mytable WHERE..

Yes, it is possible. But still a hack ... which means we have not yet 
implemented it.

This is really an extended query language. I tried to define such 
extensions last year when I moved back to academia, but have not yet had 
time to implement anything.

This is an excellent time to start defining extended USAs.

My plan was:

Start by thinking about the "SRS query language". You can search for 
various "fields":

id (entry ID)
acc (accession number)
sv (sequence version ... and maybe GI number)
des (description)
key (keyword phrase)
org (taxonomy)
... and a few more ...

In SRS, you can use & (and), | (or), and ! (but not) to combine search terms.
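
As a rough illustration of combining field terms with these operators, here is a minimal Python sketch. The `field:value` term syntax, the helper names and the sample entry are assumptions made up for this example; they are not SRS or EMBOSS syntax. It evaluates strictly left to right, with no precedence and no parentheses.

```python
import re

# Field names taken from the list above; everything else is hypothetical.
FIELDS = {"id", "acc", "sv", "des", "key", "org"}

def tokenize(query):
    """Split 'key:kinase & org:human' into terms and operators.

    Text that is neither an operator nor a field:value term is ignored.
    """
    return re.findall(r"[&|!]|[a-z]+:[^&|!\s]+", query)

def matches(entry, term):
    """Case-insensitive substring match of one field:value term."""
    field, _, value = term.partition(":")
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    return value.lower() in entry.get(field, "").lower()

def evaluate(entry, query):
    """Evaluate a query against one entry, left to right."""
    tokens = tokenize(query)
    if len(tokens) % 2 == 0:
        raise ValueError("expected: term (operator term)*")
    result = matches(entry, tokens[0])
    for op, term in zip(tokens[1::2], tokens[2::2]):
        hit = matches(entry, term)
        if op == "&":
            result = result and hit
        elif op == "|":
            result = result or hit
        elif op == "!":          # "but not"
            result = result and not hit
        else:
            raise ValueError(f"expected an operator, got {op!r}")
    return result

# Made-up entry, for illustration only.
entry = {
    "id": "EXAMPLE_1",
    "key": "kinase; transferase",
    "org": "Homo sapiens",
    "des": "Protein kinase fragment",
}
print(evaluate(entry, "key:kinase & org:homo"))
print(evaluate(entry, "key:kinase ! des:fragment"))
```

A real implementation would need precedence rules and would run against an index rather than scanning entries one at a time.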

In SRS you can also use > and < to follow links to and from other 
databases. SRS has only one link between any pair of databases - I would 
rather like to use named links so we can choose which links to use.

I would like to allow multiple databases in the USA. There are some 
problems choosing a good syntax.

I would also like to allow multiple fields - obviously id and acc, or 
combining text fields.

Then, as you suggest, some SQL-like syntax would be nice.

It looks complicated, but we can work in small steps.

In all cases, we need to make this work with "EMBLCD" indexing, with 
reading flatfile data, and with any other indexing system. We can also 
try to make it work with SRS and SRSWWW (easy in some cases, hard in others).

> I see you are working on pattern searches.
> It would be great to have the possibility to define patterns in the 
> fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
> I think the implementation of this would be useful.
> Return 'value' could be a 'fasta' pattern file:

If I understand correctly, you want to define a file of named patterns, 
and select one using a "USA" syntax.

This is not so simple ... because programs usually want only one type of 
pattern.

However, in ACD we can give the pattern a "knowntype" attribute so 
EMBOSS (and any wrapper) knows what type of pattern is allowed.

We can then use Henrikki Almusa's pattern list to define a file of 
patterns, and some pattern syntax to say which pattern(s) to use.
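
To make the idea concrete, here is a minimal Python sketch of reading a file of named patterns and picking some of them by name. The FASTA-like layout (`>name` header followed by pattern lines), the comma-separated selection syntax and the function names are all assumptions for illustration; this is not the format of the existing pattern list code.

```python
def parse_patterns(text):
    """Parse FASTA-like text into a dict of name -> pattern string."""
    patterns = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line without a pattern name")
            name = parts[0]          # rest of the header is ignored here
            patterns[name] = ""
        elif name is None:
            raise ValueError("pattern text before any '>name' header")
        else:
            patterns[name] += line   # join continuation lines
    return patterns

def select(patterns, wanted):
    """Return (name, pattern) pairs for a comma-separated list of names."""
    chosen = []
    for name in wanted.split(","):
        name = name.strip()
        if name not in patterns:
            raise KeyError(f"no pattern named {name!r}")
        chosen.append((name, patterns[name]))
    return chosen

# Made-up pattern file contents, for illustration only.
example = """>demo1 first example
C-x(2,4)-C-x(3)-
[LIVMFYWC]
>demo2
A-T-G
"""
for name, pattern in select(parse_patterns(example), "demo2,demo1"):
    print(name, pattern)
```

This sketch does not check the pattern type at all; in practice the ACD "knowntype" would have to decide whether a selected pattern is acceptable to the program.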

We do have a problem - we need to make these pattern "USAs" different 
from simple patterns. We also need a name for pattern definitions. I am 
sure we can think of one.

regards,

Peter Rice




More information about the emboss-dev mailing list