[Biojava-l] case-sensitive sequences

Wed Feb 28 11:03:24 UTC 2007

Hi -

Is there any reason why you need to be running the restriction finder
over the soft masked sequence?
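
If the case information is not needed for the restriction search itself, one
possible workaround is what Richard describes further down: keep the soft
masked sequence for output and pass an unmasked copy to the finder. The sketch
below only illustrates that idea in plain Java, on strings rather than BioJava
SymbolLists. The class and method names (SoftMaskSketch, unmask, maskOf,
remask) are made up for illustration and are not part of BioJava.

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical helper, NOT part of BioJava: separates the soft mask (case)
// from the residues so a case-insensitive tool can work on a plain copy.
class SoftMaskSketch {

    // Returns an upper-case copy suitable for case-insensitive analysis.
    static String unmask(String softMasked) {
        return softMasked.toUpperCase(Locale.ROOT);
    }

    // Records which positions were lower case (soft masked) in the input.
    static BitSet maskOf(String softMasked) {
        BitSet mask = new BitSet(softMasked.length());
        for (int i = 0; i < softMasked.length(); i++) {
            if (Character.isLowerCase(softMasked.charAt(i))) {
                mask.set(i);
            }
        }
        return mask;
    }

    // Re-applies a recorded mask so the original case can be written out again.
    static String remask(String plain, BitSet mask) {
        StringBuilder sb = new StringBuilder(plain.length());
        for (int i = 0; i < plain.length(); i++) {
            char c = plain.charAt(i);
            sb.append(mask.get(i) ? Character.toLowerCase(c)
                                  : Character.toUpperCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "TAATAACgagagg";        // example from this thread
        BitSet mask = maskOf(original);
        String plain = unmask(original);          // copy for the analysis step
        System.out.println(plain);                // TAATAACGAGAGG
        System.out.println(remask(plain, mask));  // TAATAACgagagg
    }
}
```

Positions are unchanged by the copy, so any site coordinates found on the
unmasked copy apply directly to the soft masked original.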

Can you post some example code to replicate the bug/annoyance?

If you think this is a genuine bug then please submit a biojava bug
report to http://bugzilla.open-bio.org/
Please also include the example code that demonstrates the bug.

Thanks.

- Mark

On 2/28/07, Ilhami Visne <ilhami.visne at gmail.com> wrote:
> I've changed my code and called the RestrictionSiteFinder with the new
> sequence. It threw this exception:
>
> Exception in thread "Thread-25"
> java.lang.UnsupportedOperationException: Ambiguity should be handled
> at the level of the wrapped Alphabet
>         at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
>         at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
>         at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
>         at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
>         at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)
>
> I understand why it didn't work (lower-case symbol 'a' versus upper-case
> symbol 'A'), but I can't find a solution. Any ideas?
>
> On 2/28/07, ilhami visne <ilhami.visne at gmail.com> wrote:
> > Thank you. It does now. I should have been able to find it myself, but I
> > am not really a bioinformatician yet.
> >
> > My code (in case someone else has the same problem):
> >
> > BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
> >
> > Alphabet dna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
> > SymbolTokenization dnaParser = dna.getTokenization("token");
> >
> > RichSequenceIterator iter =
> > RichSequence.IOTools.readFasta(br,dnaParser,null);
> > RichSequence rs = iter.nextRichSequence();
> >
> > Mark Schreiber wrote:
> > > Hi -
> > >
> > > There are also the classes: SoftMaskedAlphabet and
> > > SoftMaskedAlphabet.CaseSensitiveTokenization and
> > > SoftMaskedAlphabet.MaskingDetector. Together these classes let you
> > > read a sequence that contains case sensitive information and (if you
> > > wish) make use of that information. You can also write out the
> > > sequence in the original case sensitive format.
> > >
> > > It was originally designed for reading data that had been 'softmasked'
> > > for low complexity regions (e.g. lower case regions are low complexity
> > > and would be ignored in subsequent analysis) but it could be used for
> > > quality or any other distinction.
> > >
> > > - Mark
> > >
> > > On 2/28/07, ilhami visne <ilhami.visne at gmail.com> wrote:
> > >> Thank you for the quick answer. Here is the relevant part of my code:
> > >>
> > >> BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
> > >> RichSequenceIterator iter = RichSequence.IOTools.readFastaDNA(br,null);
> > >> RichSequence rs = iter.nextRichSequence();
> > >>
> > >> Richard Holland wrote:
> > > >> > DNA is not case-sensitive. What I suspect you are parsing is the output
> > > >> > of some sequencing software which is using case as a rough indicator of
> > > >> > base calling quality?
> > > >> >
> > > >> > The case will have been lost when the file was parsed, not at the moment
> > > >> > you iterate over the resulting sequences. This means that you have to
> > > >> > modify your file parsing method to become case-sensitive.
> > > >> >
> > > >> > The default DNA alphabet is not case-sensitive. It makes no distinction
> > > >> > between the two, and will convert everything to one case.
> > > >> >
> > > >> > If you need to preserve case, you will need to use a custom alphabet
> > > >> > which treats the cases differently, and also specify a tokenizer which
> > > >> > is case-sensitive. See the help pages at http://biojava.org/ for help on
> > > >> > creating new alphabets. Or, have a look at the ABITools.QUALITY alphabet
> > > >> > in BioJava, which interprets the case and stores the quality scores
> > > >> > separately.
> > > >> >
> > > >> > Note however that your custom alphabet is NOT the same as the original
> > > >> > DNA alphabet, and so you may not be able to use it in all the standard
> > > >> > transforms (RNA etc.). If you do want to use these then you will have to
> > > >> > make a second copy of each sequence using the normal DNA alphabet and
> > > >> > pass that copy to the routines.
> > > >> >
> > > >> > If you post to this list the code you are using to read the file, then I
> > > >> > can show you where to insert the reference to this new alphabet.
> > > >> >
> > > >> > cheers,
> > > >> > Richard
> > > >> >
> > > >> > Ilhami Visne wrote:
> > > >> >
> > > >> >> my sequence files contain case-sensitive symbols (TAATAACgagagg) and i am
> > > >> >> using now RichSequenceIterator to iterate over the sequences.
> > > >> >>
> > > >> >> How can i tell biojava that it should parse it case-sensitive? if i call
> > > >> >> seq.seqString() method, it should return exactly like it was in the file
> > > >> >> with upper- and lower-case.
> > > >> >>
> > > >> >> thanx.
> > > >> >> _______________________________________________
> > > >> >> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > > >> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> > > >> >>
> > > >> >
> > >>
> > >>
> > >
> >
> >
>