[EMBOSS] case sensitive identifiers

Fri Sep 29 08:15:08 UTC 2006

On Thu, Sep 28, 2006 at 03:32:36PM +0100, Peter Rice wrote:
> For EMBOSS .... well, we could play with the way databases work. Not all 
> access methods allow case sensitive searching, but we could fetch all 
> entries and try to reject those that do not match. This would need 
> something in the EMBOSS id. We already allow modifiers after the id to 
> set sequence ranges pdbprot:1fbt_a[1:20] or we could add a qualifier 
> -scasesensitive for all sequence inputs.

For the moment our emboss.default contains:

DB pdbprot [ type: P format: fasta comment: 'protein sequences from PDB'
     methodquery: app app: "/nfsben/srs/bin/linux73/getz -e '[pdbprot-id:%s]'"
     methodall: direct dir: /nfsben/srs/data/blast/dbfb/pdb file: pdb
]

and seqret pdbprot:1ml5_s yields two entries whose ids differ only in case:

>1ml5_S 30S RIBOSOMAL PROTEIN S16
MVKIRLARFGSKHNPHYPHYRIVVTDARRKRDGKYIEKIGYYDPRKTTPDWLKVDVERAR
YWLSVGAQPTDTARRLLRQAGVFRQEAREGA
>1ml5_s 50S RIBOSOMAL PROTEIN L22
MEAKAIARYVRISPRKVRLVVDLIRGKSLEEARNILRYTNKRGAYFVAKVLESAAANAVN
NHDMLEDRLYVKAAYVDEGPALKRVLPRARGRADIIKKRTSHITVILGEKHGK

So your idea of fetching all entries and then rejecting those that do 
not match would work for SRS. However, instead of a per-command 
qualifier -scasesensitive, I think it would be better to give DB entries 
in the emboss.default syntax an optional parameter case:. EMBOSS should 
be able to handle both situations: the one where it is appropriate to 
pass an id to a case-sensitive search method, and the one where it is 
appropriate to parse the output of a case-insensitive search method. 
Which one applies can best be decided for each databank at EMBOSS site 
configuration time, rather than at sequence retrieval time. What do you 
think?
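
To make the "fetch everything, then reject what does not match" step
concrete, here is a minimal sketch in Python. It is only an illustration,
not EMBOSS code: the function names parse_fasta and filter_by_id are made
up for this example, and the case_sensitive argument merely stands in for
the proposed case: setting of a DB entry.

```python
# Sample output of the case-insensitive SRS query shown above.
SRS_OUTPUT = """\
>1ml5_S 30S RIBOSOMAL PROTEIN S16
MVKIRLARFGSKHNPHYPHYRIVVTDARRKRDGKYIEKIGYYDPRKTTPDWLKVDVERAR
YWLSVGAQPTDTARRLLRQAGVFRQEAREGA
>1ml5_s 50S RIBOSOMAL PROTEIN L22
MEAKAIARYVRISPRKVRLVVDLIRGKSLEEARNILRYTNKRGAYFVAKVLESAAANAVN
NHDMLEDRLYVKAAYVDEGPALKRVLPRARGRADIIKKRTSHITVILGEKHGK
"""


def parse_fasta(text):
    """Yield (identifier, record_text) for each FASTA record in text."""
    ident, lines = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if ident is not None:
                yield ident, "\n".join(lines)
            fields = line[1:].split()
            # The identifier is the first word after '>'.
            ident = fields[0] if fields else ""
            lines = [line]
        elif ident is not None and line.strip():
            lines.append(line)
    if ident is not None:
        yield ident, "\n".join(lines)


def filter_by_id(fasta_text, wanted_id, case_sensitive=True):
    """Return the records whose identifier matches wanted_id.

    case_sensitive is a hypothetical stand-in for a per-database
    'case:' setting; it is not an existing EMBOSS option.
    """
    hits = []
    for ident, record in parse_fasta(fasta_text):
        if case_sensitive:
            keep = ident == wanted_id
        else:
            keep = ident.lower() == wanted_id.lower()
        if keep:
            hits.append(record)
    return hits


if __name__ == "__main__":
    for record in filter_by_id(SRS_OUTPUT, "1ml5_s"):
        print(record)
```

With case_sensitive=True only the 1ml5_s (L22) entry survives. With
case_sensitive=False both entries are kept, which is the behaviour we
see today.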

	Regards,
	Guy Bottu