[Biojava-l] case-sensitive sequences

Richard Holland holland at ebi.ac.uk
Wed Oct 10 08:06:16 UTC 2007


You can use SoftMaskedAlphabet with the BioJavaX parsers to get the
desired effect.

By default, a soft-masked character is one in lower case, and the code
below will detect these. If you need different criteria, you can change
how soft masking is detected: pass a second parameter to
SoftMaskedAlphabet.getInstance(), an instance of
SoftMaskedAlphabet.MaskingDetector that implements your own rule (the
JavaDocs explain how this should work).
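To illustrate the idea of a pluggable masking rule (lower case by default, something else if you supply your own), here is a plain-Java sketch with no BioJava dependency. The interface MaskRule, the class MaskRuleDemo, its method maskOf and the constants LOWER_CASE and N_IS_MASKED are all hypothetical names made up for this example; they are NOT the BioJava SoftMaskedAlphabet.MaskingDetector API, whose actual methods are documented in the JavaDocs.

```java
import java.util.Arrays;

// Hypothetical rule interface standing in for the idea of a masking detector.
// It is NOT BioJava's SoftMaskedAlphabet.MaskingDetector.
interface MaskRule {
    boolean isMasked(char c);
}

class MaskRuleDemo {

    // Default rule described above: a lower-case character is soft-masked.
    static final MaskRule LOWER_CASE = c -> Character.isLowerCase(c);

    // An alternative rule (illustrative only): treat 'N'/'n' as masked instead.
    static final MaskRule N_IS_MASKED = c -> c == 'N' || c == 'n';

    // Returns one flag per character: true where the rule reports a mask.
    static boolean[] maskOf(String seq, MaskRule rule) {
        boolean[] mask = new boolean[seq.length()];
        for (int i = 0; i < seq.length(); i++) {
            mask[i] = rule.isMasked(seq.charAt(i));
        }
        return mask;
    }

    public static void main(String[] args) {
        String seq = "ACgtN";
        // Prints [false, false, true, true, false]
        System.out.println(Arrays.toString(maskOf(seq, LOWER_CASE)));
        // Prints [false, false, false, false, true]
        System.out.println(Arrays.toString(maskOf(seq, N_IS_MASKED)));
    }
}
```

The point of passing the rule in as an object is that the scanning code never changes; only the rule does, which is the same separation the second getInstance() parameter gives you in BioJava.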

Hope this helps! Here is the code:



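import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Iterator;

import org.biojava.bio.seq.DNATools;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.BasisSymbol;
import org.biojava.bio.symbol.SoftMaskedAlphabet;
import org.biojava.bio.symbol.Symbol;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.io.FastaFormat;
import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.biojavax.bio.seq.io.RichStreamReader;

public class SoftMaskExample {
  public static void main(String[] args) throws Exception {
    // Set up a soft-masked alphabet.
    SoftMaskedAlphabet sma =
        SoftMaskedAlphabet.getInstance(DNATools.getDNA());
    SymbolTokenization stok = sma.getTokenization("token");

    // Set up sequence parsing.
    // Get your sequences from somewhere (here: a file named on the command line).
    BufferedReader input = new BufferedReader(new FileReader(args[0]));
    // Or Genbank etc.
    RichSequenceFormat format = new FastaFormat();
    // See Javadocs for alternative factories.
    RichSequenceBuilderFactory factory = RichSequenceBuilderFactory.FACTORY;
    // See Javadocs for alternative namespaces.
    Namespace ns = RichObjectFactory.getDefaultNamespace();

    // Parse the sequences.
    RichStreamReader seqsIn =
        new RichStreamReader(input, format, stok, factory, ns);

    // Find the soft-masked symbols in the sequences.
    while (seqsIn.hasNext()) {
      RichSequence seq = seqsIn.nextRichSequence();
      StringBuffer out = new StringBuffer();

      // Iterate over symbols in sequence.
      for (Iterator i = seq.iterator(); i.hasNext(); ) {
        Symbol sym = (Symbol) i.next();

        // Is this symbol masked? (isMasked() takes a BasisSymbol.)
        if (sma.isMasked((BasisSymbol) sym)) {
          // Yes it is, so deal with it; e.g. replace it with a blank.
          out.append(' ');
        } else {
          // No it isn't, so deal with that instead; e.g. keep its token.
          out.append(stok.tokenizeSymbol(sym));
        }
      }
      System.out.println(out);
    }
    input.close();
  }
}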
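If you only need exactly what the original question asks (blank out the lower-case a, g, t and c and keep everything else), the test reduces to a per-character check. The following is a plain-Java sketch with no BioJava involved; it assumes the residues of one record are already in a String (FASTA header removed), and the class BlankSoftMasked and method blankLowerCaseBases are hypothetical names for this example.

```java
// Hypothetical helper: blanks out soft-masked (lower-case) a, c, g, t only.
class BlankSoftMasked {

    // Returns a copy of seq in which each lower-case a/c/g/t becomes ' '.
    static String blankLowerCaseBases(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            if (c == 'a' || c == 'c' || c == 'g' || c == 't') {
                // Masked base: replace with a blank.
                out.append(' ');
            } else {
                // Unmasked (or non-ACGT) character: copy unchanged.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints [AC  N  GT]
        System.out.println("[" + blankLowerCaseBases("ACgtNacGT") + "]");
    }
}
```

Note that a lower-case 'n' is left alone here, because the question only mentions a, g, t and c; the SoftMaskedAlphabet approach above would instead treat any lower-case symbol as masked.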


cheers,
Richard

vineith kaul wrote:
> Hi,
> 
> I want to read in a sequence which has case-sensitive letters
> (nucleotides). Basically, I want to replace only the lower-case
> 'a, g, t, c' with blanks. I saw a similar post earlier but
> couldn't understand much. Can someone help me with this?
> 