[EMBOSS] Escaping query terms in a USA

Hamish McWilliam hpm at ebi.ac.uk
Fri Aug 23 12:42:24 UTC 2013


Hi David,

> It seems the index is OK; the database query code just cannot handle
> the ":", which has a special meaning in USAs. So as a workaround you can
> replace the ":" with a "*".
>
> entret -stdout -auto 'imgthla-key:A*02*364'
>
> will return the entry HLA08011.
>
> But be aware that by this you actually generate a wildcard query, so
> the * matches any single character at that position.

Unfortunately that is not going to work in this case, since the HLA 
alleles use a nested nomenclature in which a '*' standing in for ':' can 
match several different allele names, for example:

     a*01:01:02
     a*01:02
     a*02:01:02
     a*02:101:02

However, a little experimentation indicates that EMBOSS supports the 
single-character wildcard '?', so something like:

$ entret -stdout -auto 'imgthla-key:A?01?02'

appears to do what I want in most cases.
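The difference between the two workarounds can be sketched with bash's own pattern matching standing in for the EMBOSS matcher. This is an assumption: it presumes EMBOSS treats '*' as any run of characters and '?' as exactly one character, as shell globs do, and it does not query an EMBOSS index at all. The allele names are the ones listed above; the `matches` function is hypothetical.

```shell
#!/usr/bin/env bash
# Compare the '*' and '?' workarounds against the allele names listed above.
# Bash glob matching stands in for the EMBOSS matcher (assumption:
# '*' = any run of characters, '?' = exactly one character).

# Quoted so bash does not try to expand the names as filename globs.
alleles=('a*01:01:02' 'a*01:02' 'a*02:01:02' 'a*02:101:02')

matches() {
  # Print every allele name that the glob pattern in $1 matches.
  local pattern=$1 name
  for name in "${alleles[@]}"; do
    # $pattern is deliberately unquoted so [[ ]] treats it as a pattern.
    if [[ $name == $pattern ]]; then
      printf '%s\n' "$name"
    fi
  done
}

echo "'*' workaround (a*01*02):"
matches 'a*01*02'
echo "'?' workaround (a?01?02):"
matches 'a?01?02'
```

Under these assumptions the '*' pattern `a*01*02` matches all four names (in `a*02:101:02`, for example, the first '*' absorbs `*02:1`), while `a?01?02` matches only `a*01:02`, because each '?' must consume exactly one character.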

That said, it would be better to have a way to escape the special 
characters (i.e. '*', ':' and '?') in the search term when an exact 
match is required (as in this case).
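Until such an escape mechanism exists, the '?' substitution can at least be automated. A minimal sketch, assuming '*', ':' and '?' are the only characters that need replacing and that the index is named `imgthla-key` as above; `allele_query` is a hypothetical helper, not part of EMBOSS. The result is still a wildcard query, so it narrows the match rather than guaranteeing an exact one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EMBOSS): build the '?' workaround query
# for an exact allele name by replacing each special character with the
# single-character wildcard '?'.
# Assumption: '*', ':' and '?' are the only special characters that matter.
allele_query() {
  local term=$1
  # [*:?] is a bracket expression, so '*', ':' and '?' are matched literally.
  term=${term//[*:?]/?}
  printf 'imgthla-key:%s\n' "$term"
}

allele_query 'A*02:364'
allele_query 'A*01:02'
```

With EMBOSS installed and the index configured, the result could be passed straight to entret, e.g. `entret -stdout -auto "$(allele_query 'A*02:364')"`.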

Thanks,

Hamish

>
> Kind regards, David.
>
> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org
> [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Hamish McWilliam
> Sent: 23 August 2013 11:25
> To: emboss at lists.open-bio.org
> Subject: [EMBOSS] Escaping query terms in a USA
>
> Hi folks,
>
> In the IMGT/HLA database (http://www.ebi.ac.uk/ipd/imgt/hla/) the
> keywords field in the EMBL-Bank format flat-file contains allele
> names like:
>
> A*02:364
>
> While I can build an index containing the keywords, it does not
> appear to be possible to search the index with the allele names. For
> example:
>
> $ entret -stdout -auto 'imgthla-key:Allele'
>
> works as expected, but:
>
> $ entret -stdout -auto 'imgthla-key:A*02:364'
>
> just gives errors:
>
> Error: Failed to open filename 'imgthla-key'
> Error: Unable to read sequence 'imgthla-key:A*02:364'
> Died: entret terminated: Bad value for '-sequence' with -auto defined
>
> I am guessing that the problem is the '*' and ':' characters in the
> term... so is there some way to escape these, or are the terms in the
> index mangled in some way?
>
> All the best,
>
> Hamish
>


-- 
============================================================
Mr Hamish McWilliam,
Web Production,
European Bioinformatics Institute (EMBL-EBI),
European Molecular Biology Laboratory,
Wellcome Trust Genome Campus,
Hinxton, Cambridge, CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk/
============================================================