[Biojava-l] case-sensitive sequences

Mark Schreiber markjschreiber at gmail.com
Wed Feb 28 02:54:57 UTC 2007


Hi -

There are also the classes SoftMaskedAlphabet,
SoftMaskedAlphabet.CaseSensitiveTokenization and
SoftMaskedAlphabet.MaskingDetector. Together these classes let you
read a sequence that contains case-sensitive information and (if you
wish) make use of that information. You can also write out the
sequence in the original case-sensitive format.
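
To illustrate the general idea only, here is a minimal plain-Java sketch
that does not use the BioJava classes named above. The class CaseMask and
all of its methods are hypothetical names invented for this example. It
keeps the residues in one case and records the original case in a
separate mask, so the sequence can be handled case-insensitively and
still be written out exactly as it was read. It assumes plain ASCII
sequence characters.

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical helper, NOT part of BioJava: stores the residues in upper
// case plus one bit per position recording whether the input was lower case.
public final class CaseMask {
    private final String residues; // upper-cased sequence
    private final BitSet lower;    // bit i set => position i was lower case

    private CaseMask(String residues, BitSet lower) {
        this.residues = residues;
        this.lower = lower;
    }

    // Split a raw sequence string into upper-cased residues and a case mask.
    public static CaseMask parse(String raw) {
        BitSet lower = new BitSet(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            if (Character.isLowerCase(raw.charAt(i))) {
                lower.set(i);
            }
        }
        return new CaseMask(raw.toUpperCase(Locale.ROOT), lower);
    }

    // Case-insensitive view, suitable for passing to ordinary DNA routines.
    public String residues() {
        return residues;
    }

    // True if the character at this position was lower case in the input.
    public boolean isMasked(int index) {
        return lower.get(index);
    }

    // Rebuild the sequence with its original upper/lower case pattern.
    public String original() {
        StringBuilder sb = new StringBuilder(residues.length());
        for (int i = 0; i < residues.length(); i++) {
            char c = residues.charAt(i);
            sb.append(lower.get(i) ? Character.toLowerCase(c) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CaseMask m = CaseMask.parse("TAATAACgagagg");
        System.out.println(m.residues()); // TAATAACGAGAGG
        System.out.println(m.original()); // TAATAACgagagg
    }
}
```

Keeping the mask separate from the residues means the upper-cased string
can be handed to case-insensitive code without losing the information
needed to reproduce the input.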

It was originally designed for reading data that had been 'softmasked'
for low-complexity regions (e.g. lower-case regions are low complexity
and would be ignored in subsequent analysis), but it could also be used
for quality or any other distinction.
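
As a rough illustration of ignoring softmasked regions in later analysis,
the following plain-Java sketch finds the upper-case (unmasked) stretches
of a raw sequence string. It does not use the BioJava API; the class
UnmaskedRegions and its method find are hypothetical names for this
example, and the input string is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of BioJava: locates the runs of a sequence
// that are not lower case, i.e. the regions a softmask-aware analysis keeps.
public final class UnmaskedRegions {

    // Returns {start, end} pairs (end exclusive) for each run of characters
    // that are not lower case. Non-letters such as '-' count as unmasked.
    public static List<int[]> find(String raw) {
        List<int[]> regions = new ArrayList<>();
        int start = -1; // start of the current unmasked run, or -1 if none
        for (int i = 0; i < raw.length(); i++) {
            boolean masked = Character.isLowerCase(raw.charAt(i));
            if (!masked && start < 0) {
                start = i;                          // a run begins here
            } else if (masked && start >= 0) {
                regions.add(new int[] {start, i});  // the run ended before i
                start = -1;
            }
        }
        if (start >= 0) {
            regions.add(new int[] {start, raw.length()}); // run reaches the end
        }
        return regions;
    }

    public static void main(String[] args) {
        String raw = "acgtTAATAACgagaggCCGG";
        for (int[] r : find(raw)) {
            // prints "4-11 TAATAAC" then "17-21 CCGG"
            System.out.println(r[0] + "-" + r[1] + " " + raw.substring(r[0], r[1]));
        }
    }
}
```

The same scan would work for a quality distinction: only the meaning
attached to lower case changes, not the bookkeeping.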

- Mark

On 2/28/07, ilhami visne <ilhami.visne at gmail.com> wrote:
> Thank you for the quick answer. Here is the relevant part of my code:
>
> BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
> RichSequenceIterator iter = RichSequence.IOTools.readFastaDNA(br,null);
> RichSequence rs = iter.nextRichSequence();
>
> Richard Holland wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > DNA is not case-sensitive. What I suspect you are parsing is the output
> > of some sequencing software which is using case as a rough indicator of
> > base calling quality?
> >
> > The case will have been lost when the file was parsed, not at the moment
> > you iterate over the resulting sequences. This means that you have to
> > modify your file parsing method to become case-sensitive.
> >
> > The default DNA alphabet is not case-sensitive. It makes no distinction
> > between upper and lower case, and will convert everything to one case.
> >
> > If you need to preserve case, you will need to use a custom alphabet
> > which treats the cases differently, and also specify a tokenizer which
> > is case-sensitive. See the help pages at http://biojava.org/ for help on
> > creating new alphabets. Or, have a look at the ABITools.QUALITY alphabet
> > in BioJava, which interprets the case and stores the quality scores
> > separately.
> >
> > Note however that your custom alphabet is NOT the same as the original
> > DNA alphabet, and so you may not be able to use it in all the standard
> > transforms (RNA etc.). If you do want to use these then you will have to
> > make a second copy of each sequence using the normal DNA alphabet and
> > pass that copy to the routines.
> >
> > If you post to this list the code you are using to read the file, then I
> > can show you where to insert the reference to this new alphabet.
> >
> > cheers,
> > Richard
> >
> > Ilhami Visne wrote:
> >
> >> My sequence files contain case-sensitive symbols (TAATAACgagagg), and I
> >> am currently using RichSequenceIterator to iterate over the sequences.
> >>
> >> How can I tell BioJava to parse them case-sensitively? If I call the
> >> seq.seqString() method, it should return the sequence exactly as it was
> >> in the file, with upper and lower case preserved.
> >>
> >> Thanks.
> >> _______________________________________________
> >> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >>
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.2.2 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >
> > iD8DBQFF5Etv4C5LeMEKA/QRAnGBAJ45eeQhmb4AT0CLTQCVyn5HxFS/cQCfXXgv
> > uZKlrdE8y6vMfKcOlm9yBZA=
> > =2VZC
> > -----END PGP SIGNATURE-----
> >
> >
>
>


