[EMBOSS] case sensitive identifiers
Guy Bottu
gbottu at ben.vub.ac.be
Fri Sep 29 08:15:08 UTC 2006
On Thu, Sep 28, 2006 at 03:32:36PM +0100, Peter Rice wrote:
> For EMBOSS .... well, we could play with the way databases work. Not all
> access methods allow case sensitive searching, but we could fetch all
> entries and try to reject those that do not match. This would need
> something in the EMBOSS id. We already allow modifiers after the id to
> set sequence ranges pdbprot:1fbt_a[1:20] or we could add a qualifier
> -scasesensitive for all sequence inputs.
For the moment our emboss.default contains:
DB pdbprot [ type: P format: fasta comment: 'protein sequences from PDB'
methodquery: app app: "/nfsben/srs/bin/linux73/getz -e '[pdbprot-id:%s]'"
methodall: direct dir: /nfsben/srs/data/blast/dbfb/pdb file: pdb
]
and seqret pdbprot:1ml5_s yields:
>1ml5_S 30S RIBOSOMAL PROTEIN S16
MVKIRLARFGSKHNPHYPHYRIVVTDARRKRDGKYIEKIGYYDPRKTTPDWLKVDVERAR
YWLSVGAQPTDTARRLLRQAGVFRQEAREGA
>1ml5_s 50S RIBOSOMAL PROTEIN L22
MEAKAIARYVRISPRKVRLVVDLIRGKSLEEARNILRYTNKRGAYFVAKVLESAAANAVN
NHDMLEDRLYVKAAYVDEGPALKRVLPRARGRADIIKKRTSHITVILGEKHGK
So, your idea of fetching all entries and then parsing them would work
for SRS. However, instead of an associated qualifier -scasesensitive, I
think it would be better to have an optional parameter case: in the
emboss.default syntax for DB entries. You should be able to handle both
the situation where it is appropriate to pass an id to a case-sensitive
search method and the situation where it is appropriate to parse the
output of a case-insensitive search method. This can best be decided for
each databank at EMBOSS site configuration time, rather than at sequence
retrieval time. What do you think?
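To illustrate the "fetch everything, then reject what does not match" step, here is a minimal sketch in Python. It is not EMBOSS code: the function names, the run_query callable and the backend_is_case_sensitive flag (which stands in for whatever the proposed case: parameter would express) are all hypothetical, and the sample data is simply the seqret output shown above.

```python
def parse_fasta(text):
    """Yield (identifier, header, sequence) for each FASTA record in text."""
    def make_record(header, chunks):
        words = header.split()
        identifier = words[0] if words else ""
        return identifier, header, "".join(chunks)

    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield make_record(header, chunks)


def fetch(wanted_id, run_query, backend_is_case_sensitive):
    """Return the records for wanted_id.

    run_query is a hypothetical callable that returns FASTA text for an id
    (for example a wrapper around a getz call). If the backend already
    matches case exactly, its output is trusted; otherwise every returned
    record is parsed and only exact-case id matches are kept.
    """
    records = list(parse_fasta(run_query(wanted_id)))
    if backend_is_case_sensitive:
        return records
    return [rec for rec in records if rec[0] == wanted_id]


# The two entries returned above for pdbprot:1ml5_s.
SAMPLE = """\
>1ml5_S 30S RIBOSOMAL PROTEIN S16
MVKIRLARFGSKHNPHYPHYRIVVTDARRKRDGKYIEKIGYYDPRKTTPDWLKVDVERAR
YWLSVGAQPTDTARRLLRQAGVFRQEAREGA
>1ml5_s 50S RIBOSOMAL PROTEIN L22
MEAKAIARYVRISPRKVRLVVDLIRGKSLEEARNILRYTNKRGAYFVAKVLESAAANAVN
NHDMLEDRLYVKAAYVDEGPALKRVLPRARGRADIIKKRTSHITVILGEKHGK
"""

if __name__ == "__main__":
    # A fake case-insensitive backend that always returns both entries.
    for identifier, header, sequence in fetch("1ml5_s", lambda _id: SAMPLE, False):
        print(">" + header)
        print(sequence)
```

With the flag set per databank, the decision is made once at configuration time and the id itself needs no extra modifier.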
Regards,
Guy Bottu