[Biojava-l] Behavior of the createRegex() method (MotifTool class)

Matthew Pocock matthew_pocock@yahoo.co.uk
Tue, 3 Dec 2002 11:15:03 +0000 (GMT)


Hi Keith,

You can simply test sym.getMatches().size() == 0; if
that is true, the symbol is a gap. For every case the
regex code is likely to be used for any time soon,
this is good enough.
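
To make the difference between the two tests concrete, here is a
minimal sketch in plain Java. It is a toy model, not BioJava code:
ToySymbol, its getMatches() and isGap() are hypothetical stand-ins
that only mimic the idea of a symbol whose set of matches is empty.
It assumes, as Keith's report suggests, that the gap you are holding
need not be the same object as the one you compare it with.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model only: ToySymbol is a hypothetical stand-in, not a BioJava class.
final class ToySymbol {
    private final String name;
    private final Set<String> matches; // atomic symbols this symbol can stand for

    ToySymbol(String name, String... matches) {
        this.name = name;
        this.matches = Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList(matches)));
    }

    Set<String> getMatches() {
        return matches;
    }

    // A symbol that matches nothing at all is treated as a gap.
    boolean isGap() {
        return matches.isEmpty();
    }

    @Override
    public String toString() {
        return name;
    }
}

final class GapTestSketch {
    public static void main(String[] args) {
        ToySymbol oneGap = new ToySymbol("[]");
        ToySymbol otherGap = new ToySymbol("[]"); // a separate instance
        ToySymbol n = new ToySymbol("n", "a", "c", "g", "t");

        // Identity comparison fails because these are different objects.
        System.out.println(oneGap == otherGap);
        // The empty-match-set test works for either instance.
        System.out.println(oneGap.isGap() && otherGap.isGap());
        // An ambiguity symbol has matches, so it is not a gap.
        System.out.println(n.isGap());
    }
}
```

The point of the sketch is only that a test on what the symbol
matches does not care which gap instance you were handed.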

Basically, two gaps are the same if they have the same
arrangement of nested [] in their toString(). This is
equivalent to them returning the same list of gap
symbols from their getSymbols() methods. It's a bit
complicated, but it's the only way to do it without
the gap rules being totally different from those for
all other symbols. I will add docs to package.html in
Symbol explaining gaps and why they are as they are.
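
The "same arrangement of nested []" rule can be sketched as
structural equality over a small tree. This is a toy model under
stated assumptions, not the BioJava implementation: ToyGap, dash(),
of() and getParts() are hypothetical names, and treating "-" as a
leaf inside the brackets is just one reading of the notation used in
this mail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: ToyGap is a hypothetical stand-in, not a BioJava class.
final class ToyGap {
    private final boolean leaf;        // true for a single "-"
    private final List<ToyGap> parts;  // components of a bracketed gap

    private ToyGap(boolean leaf, List<ToyGap> parts) {
        this.leaf = leaf;
        this.parts = parts;
    }

    // A single "-" component.
    static ToyGap dash() {
        return new ToyGap(true, Collections.<ToyGap>emptyList());
    }

    // A bracketed gap; of() with no arguments is the empty gap [].
    static ToyGap of(ToyGap... parts) {
        List<ToyGap> copy = new ArrayList<ToyGap>(Arrays.asList(parts));
        return new ToyGap(false, Collections.unmodifiableList(copy));
    }

    List<ToyGap> getParts() {
        return parts;
    }

    // Two gaps are equal when their nesting has the same shape.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ToyGap)) return false;
        ToyGap other = (ToyGap) o;
        return leaf == other.leaf && parts.equals(other.parts);
    }

    @Override
    public int hashCode() {
        return 31 * Boolean.hashCode(leaf) + parts.hashCode();
    }

    @Override
    public String toString() {
        if (leaf) return "-";
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(parts.get(i));
        }
        return sb.append(']').toString();
    }
}

final class GapShapeSketch {
    public static void main(String[] args) {
        ToyGap a = ToyGap.of(ToyGap.dash(), ToyGap.dash());
        ToyGap b = ToyGap.of(ToyGap.dash(), ToyGap.dash());
        System.out.println(a + " equals " + b + ": " + a.equals(b));
        System.out.println(ToyGap.of() + " equals " + a + ": " + ToyGap.of().equals(a));
    }
}
```

In this model, comparing the printed form and comparing the list of
parts give the same answer, which is the equivalence described above.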

Very quickly, the gap [] is the empty set of symbols.
It means 'there is nothing to see here, move on'. [-]
means 'there is nothing here and it takes up one unit
of your sequence'. [-,-] would be a gap in an
alignment of two sequences where both individual
sequences are gapped at that point. You could also
have symbols like [-,a] and [g,-] to represent a gap in
one sequence or the other. For aligning things like
DNA-Protein, you could end up with [-,-,-][-] (a gap
in the codon and in the protein sequence), [a,g,-][-]
(a gap in the protein sequence and a codon with an
insert at pos 3) and so on.
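
The column notation above can be reproduced with a few lines of
plain Java. Again this is only a sketch for illustration: the class
AlignmentColumnSketch and its methods render(), allGap() and
renderCodonProtein() are hypothetical, cells are plain strings, and
nothing here uses or describes the BioJava alignment classes.

```java
// Toy model only: columns are arrays of strings, "-" marks a gapped cell.
final class AlignmentColumnSketch {
    static final String GAP = "-";

    // Render one alignment column, e.g. render("-", "a") gives "[-,a]".
    static String render(String... cells) {
        return "[" + String.join(",", cells) + "]";
    }

    // True when every cell in the column is a gap.
    static boolean allGap(String... cells) {
        for (String cell : cells) {
            if (!GAP.equals(cell)) {
                return false;
            }
        }
        return true;
    }

    // A DNA-protein column: three codon positions followed by one residue.
    static String renderCodonProtein(String[] codon, String residue) {
        return render(codon) + render(residue);
    }

    public static void main(String[] args) {
        System.out.println(render(GAP, GAP));
        System.out.println(render(GAP, "a"));
        System.out.println(render("g", GAP));
        System.out.println(renderCodonProtein(new String[] {GAP, GAP, GAP}, GAP));
        System.out.println(renderCodonProtein(new String[] {"a", "g", GAP}, GAP));
    }
}
```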

Damn those null sub-spaces and all their spawn.

Matthew

--- Keith James <kdj@sanger.ac.uk> wrote:
> Following up my own post, but anyway... I've made the MotifTools test
> pass again. However in trying to eliminate the gap character (which
> I'm still not sure I should be doing) I tried to test for the gap
> symbol in DNA using mySymbol == AlphabetManager.getGapSymbol() but
> this didn't work (didn't return true for the DNA gap).
> 
> Stringifying the DNA gap object I get
> 
> org.biojava.bio.symbol.SimpleBasisSymbol: []
> 
> Can someone clarify the circumstances under which the various gap
> Symbol(s) are equal (or ==) to each other?
> 
> ta,
> Keith
> 
> -- 
> 
> - Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
> - Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -