[EMBOSS] case sensitive identifiers
Guy Bottu
gbottu at ben.vub.ac.be
Thu Sep 28 13:57:40 UTC 2006
Dear colleagues,
Thure Etzold, the developer of SRS, once said "You cannot imagine anything
that crazy or there is at least one database manager who really does it".
While trying to load into our MRS server databanks of the protein and
nucleic acid sequences extracted from the PDB, I ran into the following
problem: some entries have identifiers that differ only by case. E.g. there
is a 1fnt_A and a 1fnt_a. Now, most bioinformatics software is not case
sensitive. As I understand it, MRS stores indices so that they can be
displayed in their original case, but they can only be searched
case-insensitively; it automatically modifies indices that would otherwise
be duplicates, e.g. 1fnt_a is stored as 1fnt_a_12835. This is, however, not
ideal. Should MRS be adapted so that it can handle case-sensitive indices?
That would not solve everything, though, since other software such as
EMBOSS or GCG is also case-insensitive. My idea is to let the MRS parser store 1fnt_aLC
(LC means lowercase) as the identifier. A user can then search for the
sequence they need in MRS and, in EMBOSS (if the EMBOSS installation uses
MRS as its databank access mechanism), ask for the sequence pdbprot:1fnt_alc.
This would of course also work with 1fnt_a_12835, but it avoids the use of
a meaningless and irreproducible number. Any comments?
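
To make the renaming idea concrete, here is a minimal Python sketch. It is
only an illustration: the function name case_safe_id is hypothetical, and the
rule of appending the suffix to every chain ID that contains a lowercase
letter (rather than only to identifiers that actually collide) is an
assumption, not a description of how the MRS parser works.

```python
def case_safe_id(identifier: str, suffix: str = "LC") -> str:
    """Return an identifier that stays distinct under case-insensitive lookup.

    Assumption: identifiers look like '<entry>_<chain>', e.g. '1fnt_a'.
    If the chain part contains a lowercase letter, the suffix is appended.
    """
    entry, sep, chain = identifier.rpartition("_")
    if sep and any(c.islower() for c in chain):
        return f"{entry}{sep}{chain}{suffix}"
    # No underscore, or an upper-case chain ID: leave the identifier alone.
    return identifier


ids = ["1fnt_A", "1fnt_a", "1abc_B"]
mapped = {i: case_safe_id(i) for i in ids}

# After renaming, the case-folded identifiers no longer collide.
folded = {m.lower() for m in mapped.values()}
```

Because the suffix depends only on the identifier itself, the result is the
same on every rebuild of the databank, unlike a running number.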
Regards,
Guy Bottu,
Belgian EMBnet Node