[Biojava-l] Behavior of the createRegex() method (MotifTool class)

Keith James kdj@sanger.ac.uk
02 Dec 2002 16:04:40 +0000


>>>>> "Thomas" == Thomas Down <td2@sanger.ac.uk> writes:

[...]

    Thomas> It's fixed for now (see AlphabetManagerTest).  If we ever
    Thomas> re-write this code we *must* use some kind of globally
    Thomas> unique names (URLs?)  for Symbols and Alphabets, otherwise
    Thomas> we'll end up in this mess again.

Thanks.

One final thing... the docs say "AtomicSymbol instances guarantee that
getMatches returns an Alphabet containing just that Symbol". But the
Alphabet returned by this method also contains the gap Symbol in every
case. Or is the gap Symbol specially "non-existent"? Elsewhere the docs
state "every alphabet contains gap, as there is no symbol that matches
gap, so there is no case where an alphabet doesn't contain a symbol
that matches gap". I'm just not sure what this means!

In short, what is the correct behaviour? I ask because createRegex()
currently turns the SymbolList "acgt" into the regex
"[-a][-c][-g][-t]".
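
To make the question concrete, here is a minimal sketch in plain Java.
It is not BioJava code: the class GapRegexSketch and its methods are
made up for illustration, and I am only assuming that the regex is
built by emitting one character class per position from that
position's match set. Under that assumption, a gap in every match set
gives classes like "[-a]", while match sets holding just the symbol
give a plain literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for illustration only; not part of BioJava.
class GapRegexSketch {
    static final char GAP = '-';

    // Models the match set of one atomic symbol. With includeGap set,
    // the gap character is added, mimicking the behaviour in question.
    static Set<Character> matches(char symbol, boolean includeGap) {
        Set<Character> set = new TreeSet<>();
        set.add(symbol);
        if (includeGap) {
            set.add(GAP);
        }
        return set;
    }

    // Emits a bare character for a one-member set, otherwise a
    // character class listing every member in sorted order.
    static String toRegex(List<Set<Character>> matchSets) {
        StringBuilder sb = new StringBuilder();
        for (Set<Character> set : matchSets) {
            if (set.size() == 1) {
                sb.append(set.iterator().next());
            } else {
                sb.append('[');
                for (char c : set) {
                    sb.append(c);
                }
                sb.append(']');
            }
        }
        return sb.toString();
    }

    // Convenience wrapper: one match set per character of the sequence.
    static String regexFor(String sequence, boolean includeGap) {
        List<Set<Character>> sets = new ArrayList<>();
        for (char c : sequence.toCharArray()) {
            sets.add(matches(c, includeGap));
        }
        return toRegex(sets);
    }

    public static void main(String[] args) {
        System.out.println(regexFor("acgt", true));  // [-a][-c][-g][-t]
        System.out.println(regexFor("acgt", false)); // acgt
    }
}
```

The practical difference is that the first pattern also matches
strings such as "a-g-", which is presumably not what a caller searching
for "acgt" wants.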

cheers,

Keith

-- 

- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -