[Biojava-l] sequence masking?

mark.schreiber at novartis.com mark.schreiber at novartis.com
Sun Aug 28 21:56:08 EDT 2005


Hello -

There aren't any specific utilities for this, but SoftMaskedAlphabet is
just a standard BioJava alphabet (with some reworking of the internals)
and can be used as normal. Hence, this should work (I've not tested it,
so let me know if it doesn't):

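import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.*;

//get a softmasked version of 'DNA'
FiniteAlphabet alpha = SoftMaskedAlphabet.getInstance(DNATools.getDNA());

//Make a symbol list over that alphabet; getTokenization() takes the
//name of a tokenization, and "token" is the character-based one
SimpleSymbolList syms = new SimpleSymbolList(alpha.getTokenization("token"),
    "ACCTCGCccccggggccccggggccccggggTTCGA");

//do stuff
...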
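If the regions to mask are known as coordinates (for example from a repeat finder), the lowercase string for the tokenization can be built first. This is a minimal sketch in plain Java, assuming 1-based inclusive coordinates like BioJava locations use; `softMask` is a hypothetical helper, not part of BioJava, and it works on a plain String rather than a SymbolList.

```java
// Hypothetical helper (not part of BioJava): lowercases the bases in a
// 1-based, inclusive region so that a SoftMaskedAlphabet tokenization
// would read them as masked symbols.
public class SoftMask {

    static String softMask(String seq, int start, int end) {
        if (start < 1 || end > seq.length() || start > end) {
            throw new IllegalArgumentException("region out of range: " + start + ".." + end);
        }
        // Convert 1-based inclusive coordinates to Java's 0-based,
        // end-exclusive substring indices
        return seq.substring(0, start - 1)
                + seq.substring(start - 1, end).toLowerCase()
                + seq.substring(end);
    }

    public static void main(String[] args) {
        String seq = "ACCTCGCCCCCGGGGTTCGA";
        // Masks positions 8-15, printing ACCTCGCccccggggTTCGA
        System.out.println(softMask(seq, 8, 15));
    }
}
```

The resulting string could then be passed to `new SimpleSymbolList(alpha.getTokenization("token"), ...)` as above.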

- Mark





Douglas Hoen <douglas.hoen at mail.mcgill.ca>
Sent by: biojava-l-bounces at portal.open-bio.org
08/29/2005 12:46 AM

 
        To:     biojava-l at biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] sequence masking?


Hi,

I want to mask out DNA subsequences, such as repetitive DNA. I have
been unable to find any APIs for this. I did find
SequenceTools.maskSequence(), but this method masks the region
outside an indicated location rather than inside it, and it also uses
gaps as the mask symbol, whereas I would like to use N or lowercase.
Another related API is the SoftMaskedAlphabet class, which seems useful,
but I can't find any utilities that take advantage of it.
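The hard masking described here (replacing the bases inside a region, rather than outside it, with N) can be sketched on a plain string. This is an illustrative assumption, not a BioJava API: `mask` is a hypothetical helper that takes 1-based inclusive regions and a mask character such as 'N'.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a BioJava API): replaces every base inside each
// 1-based, inclusive region with a mask character, leaving everything
// outside the regions untouched.
public class HardMask {

    static String mask(String seq, List<int[]> regions, char maskChar) {
        StringBuilder sb = new StringBuilder(seq);
        for (int[] r : regions) {
            int start = r[0], end = r[1];
            if (start < 1 || end > seq.length() || start > end) {
                throw new IllegalArgumentException("region out of range: " + start + ".." + end);
            }
            // 1-based inclusive [start, end] maps to 0-based [start - 1, end - 1]
            for (int i = start - 1; i < end; i++) {
                sb.setCharAt(i, maskChar);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = "ACGTACGTACGTACGT";
        // Masks positions 3-6 and 11-12, printing ACNNNNGTACNNACGT
        System.out.println(mask(seq, Arrays.asList(new int[]{3, 6}, new int[]{11, 12}), 'N'));
    }
}
```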

Any help would be appreciated. Thanks,
Doug

_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l