[Biojava-l] DNA letters are lowercase???
hz5@njit.edu
Mon, 16 Sep 2002 14:46:43 -0400 (EDT)
I replied before, but my message did not show up on the list. Sorry if this
is a duplicate.
Generally, a mix of upper-case and lower-case letters causes no problems.
The only tool I know of that mixes upper and lower case within one sequence
is a program called RepeatMasker (A.F.A. Smit & P. Green,
http://ftp.genome.washington.edu/RM/webrepeatmaskerhelp.html).
It is a server that screens DNA sequences for interspersed repeats and
low-complexity DNA sequences. After screening the sequence, it converts the
repeats to lower case, so the user gets back a DNA sequence with mixed-case
letters.
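To work with such mixed-case output, the lower-case stretches can be located
with a simple scan. Below is a minimal sketch in plain Java, assuming that
lower case marks a masked repeat as described above. The class and method
names are made up for illustration and are not part of BioJava or
RepeatMasker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a BioJava or RepeatMasker API.
class SoftMaskedRegions {

    /**
     * Returns the lower-case runs of a sequence as half-open,
     * zero-based intervals {start, end}.
     */
    static List<int[]> lowerCaseRuns(String seq) {
        List<int[]> runs = new ArrayList<>();
        int start = -1; // -1 means "not inside a lower-case run"
        for (int i = 0; i < seq.length(); i++) {
            boolean lower = Character.isLowerCase(seq.charAt(i));
            if (lower && start < 0) {
                start = i;                       // a run begins here
            } else if (!lower && start >= 0) {
                runs.add(new int[] {start, i});  // the run ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            runs.add(new int[] {start, seq.length()}); // run reaches the end
        }
        return runs;
    }
}
```

For the input "ACGTacgtACgt" this reports two runs, covering positions 4 to 7
and 10 to 11 (zero-based).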
This program is used during the genome annotation process at NCBI.
I suggest using upper case for DNA sequences and leaving lower case for
user-specific features; the convention used can be noted in the FASTA header
of the sequence.
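One way to follow this suggestion without losing the case information is to
split a mixed-case sequence into an all-upper-case sequence plus a parallel
track of booleans, along the lines of the DNA-against-booleans idea in
Matthew's message quoted below. This is only a sketch under that assumption:
the class CaseSplit is hypothetical, uses no BioJava classes, and expects
plain ASCII sequence letters.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of BioJava.
// Assumes plain ASCII sequence letters, so upper-casing keeps the length.
class CaseSplit {
    final String bases;        // the sequence, all upper case
    final boolean[] supported; // true where the input letter was upper case

    CaseSplit(String mixedCase) {
        bases = mixedCase.toUpperCase(Locale.ROOT);
        supported = new boolean[mixedCase.length()];
        for (int i = 0; i < supported.length; i++) {
            supported[i] = Character.isUpperCase(mixedCase.charAt(i));
        }
    }

    /** Rebuilds the original mixed-case string from the two tracks. */
    String toMixedCase() {
        StringBuilder sb = new StringBuilder(bases.length());
        for (int i = 0; i < bases.length(); i++) {
            char c = bases.charAt(i);
            sb.append(supported[i] ? c : Character.toLowerCase(c));
        }
        return sb.toString();
    }
}
```

With new CaseSplit("ACgTn"), bases holds "ACGTN" and the boolean track is
false at positions 2 and 4, so the upper-case sequence can be processed as
usual and the original case restored afterwards if needed.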
Quoting Matthew Pocock <matthew_pocock@yahoo.co.uk>:
> Ryan Golhar wrote:
> > Can anyone tell me why the letters for DNA (a,c,t,g) are lowercase in
> > DNATools?
>
> Hi Ryan,
>
> The static methods used to retrieve the bases are in lower case. The
> AtomicSymbol instances returned can be spat out as lower or upper case
> depending on the SymbolTokenization you use.
>
> (Someone who knows): does the default tokenization for DNA use upper or
> lower case? I don't care either way.
>
> Ryan: To maintain the upper/lower case info in chromatogram files we
> would need to do a little trickery. If you send a file (mixed case) and
> a couple of use-cases, we can probably sort this out quickly enough. If
> the case is important to you (e.g. you need to know where the uncertain
> calls are), we can do this, and if you want to discard this information
> then we can also do that trivially. I'm thinking thoughts like alignment
> of DNA against booleans (or 0/1) where A,1 would be A and supported
> (upper case), and T,0 would be T and not well supported (lower case).
>
> Has this already been done?
>
> Matthew
>
> >
> > Some chromatogram files contain a mix of A,C,T,G and some lowercase
> > letters for peaks that could not be determined absolutely.
> >
> > Regardless, DNA is always represented with uppercase letters...
> >
> > If there is no argument against it, can this be changed to upper case
> > letters instead?
> >
> > Ryan
> >
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l
> >
=========================================================
Haibo Zhang, PhD student
Computational Biology, NJIT & Rutgers University
Center for Applied Genomics, PHRI
http://afs13.njit.edu/~hz5