[Biojava-l] case-sensitive sequences

ilhami visne ilhami.visne at gmail.com
Wed Feb 28 05:15:53 UTC 2007


Thank you, it works now. I should have been able to find it myself, but 
I am not really a bioinformatician yet.

Here is my code, in case someone else has the same problem:

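// Open the FASTA file
BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));

// Wrap the standard DNA alphabet so that upper/lower case is retained
Alphabet dna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
// Ask the wrapped alphabet for its "token" tokenization
SymbolTokenization dnaParser = dna.getTokenization("token");

// Use the generic readFasta with the case-sensitive parser instead of readFastaDNA
RichSequenceIterator iter = RichSequence.IOTools.readFasta(br, dnaParser, null);
RichSequence rs = iter.nextRichSequence();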
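
For anyone wondering what the soft-masked alphabet does here: as I understand
Mark's description below, the base and its case are kept as two separate
pieces of information, so the original text can be written out again. A
minimal sketch of that idea in plain Java follows. It is not BioJava code, and
the class SoftMaskSketch and its members are made-up names for illustration:

```java
import java.util.BitSet;

/** Plain-Java illustration only; all names here are hypothetical, not BioJava API. */
class SoftMaskSketch {
    final String bases;   // upper-case bases, as a case-insensitive alphabet would keep them
    final BitSet masked;  // bit i is set when base i was lower case in the input

    SoftMaskSketch(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        masked = new BitSet(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isLowerCase(c)) {
                masked.set(i);
            }
            sb.append(Character.toUpperCase(c));
        }
        bases = sb.toString();
    }

    /** Rebuilds the sequence exactly as it appeared in the input. */
    String original() {
        StringBuilder sb = new StringBuilder(bases.length());
        for (int i = 0; i < bases.length(); i++) {
            char c = bases.charAt(i);
            sb.append(masked.get(i) ? Character.toLowerCase(c) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SoftMaskSketch s = new SoftMaskSketch("TAATAACgagagg");
        System.out.println(s.bases);      // prints TAATAACGAGAGG
        System.out.println(s.original()); // prints TAATAACgagagg
    }
}
```

The mask can be ignored when only the bases matter, or consulted when the
lower-case regions should be skipped.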

Mark Schreiber wrote:
> Hi -
>
> There are also the classes: SoftMaskedAlphabet and
> SoftMaskedAlphabet.CaseSensitiveTokenization and
> SoftMaskedAlphabet.MaskingDetector. Together these classes let you
> read a sequence that contains case sensitive information and (if you
> wish) make use of that information. You can also write out the
> sequence in the original case sensitive format.
>
> It was originally designed for reading data that had been 'softmasked'
> for low complexity regions (e.g. lower-case regions are low complexity
> and would be ignored in subsequent analysis), but it could be used for
> quality or any other distinction.
>
> - Mark
>
> On 2/28/07, ilhami visne <ilhami.visne at gmail.com> wrote:
>> Thank you for the quick answer. Here is the relevant part of my code:
>>
>> BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
>> RichSequenceIterator iter = RichSequence.IOTools.readFastaDNA(br,null);
>> RichSequence rs = iter.nextRichSequence();
>>
>> Richard Holland wrote:
>> >
>> > DNA is not case-sensitive. What I suspect you are parsing is the output
>> > of some sequencing software which is using case as a rough indicator of
>> > base calling quality?
>> >
>> > The case will have been lost when the file was parsed, not at the moment
>> > you iterate over the resulting sequences. This means that you have to
>> > modify your file parsing method to become case-sensitive.
>> >
>> > The default DNA alphabet is not case-sensitive. It makes no distinction
>> > between the two, and will convert everything to one case.
>> >
>> > If you need to preserve case, you will need to use a custom alphabet
>> > which treats the cases differently, and also specify a tokenizer which
>> > is case-sensitive. See the help pages at http://biojava.org/ for help on
>> > creating new alphabets. Or, have a look at the ABITools.QUALITY alphabet
>> > in BioJava, which interprets the case and stores the quality scores
>> > separately.
>> >
>> > Note however that your custom alphabet is NOT the same as the original
>> > DNA alphabet, and so you may not be able to use it in all the standard
>> > transforms (RNA etc.). If you do want to use these then you will have to
>> > make a second copy of each sequence using the normal DNA alphabet and
>> > pass that copy to the routines.
>> >
>> > If you post to this list the code you are using to read the file, then I
>> > can show you where to insert the reference to this new alphabet.
>> >
>> > cheers,
>> > Richard
>> >
>> > Ilhami Visne wrote:
>> >
>> >> My sequence files contain case-sensitive symbols (TAATAACgagagg), and I
>> >> am now using RichSequenceIterator to iterate over the sequences.
>> >>
>> >> How can I tell BioJava to parse them case-sensitively? If I call the
>> >> seq.seqString() method, it should return the sequence exactly as it was
>> >> in the file, with upper and lower case preserved.
>> >>
>> >> Thanks.
>> >> _______________________________________________
>> >> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> >>
>> >>
>> >
>>
>
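
On Richard's point above about making a second copy in the normal DNA
alphabet before calling the standard routines: the sketch below shows the
same pattern in plain Java. It is only an illustration under my own
assumptions. PlainCopySketch, plainCopy and reverseComplement are made-up
names, and reverseComplement merely stands in for any routine that accepts
plain DNA only; none of this is BioJava API.

```java
import java.util.Locale;

/** Plain-Java illustration only; all names here are hypothetical, not BioJava API. */
class PlainCopySketch {

    /** Case-insensitive copy, i.e. what a standard DNA alphabet would hold. */
    static String plainCopy(String caseSensitive) {
        return caseSensitive.toUpperCase(Locale.ROOT);
    }

    /** Stand-in for a standard routine that only understands plain (upper-case) DNA. */
    static String reverseComplement(String plainDna) {
        StringBuilder sb = new StringBuilder(plainDna.length());
        for (int i = plainDna.length() - 1; i >= 0; i--) {
            char c = plainDna.charAt(i);
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:
                    throw new IllegalArgumentException("not a plain DNA base: " + c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fromFile = "TAATAACgagagg";    // keep this one for writing back out
        String plain = plainCopy(fromFile);   // pass this one to the standard routine
        System.out.println(reverseComplement(plain)); // prints CCTCTCGTTATTA
    }
}
```

The case-sensitive original stays untouched for output, and only the plain
copy goes through the transform.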