[Biojava-l] DNA letters are lowercase???

hz5@njit.edu hz5@njit.edu
Mon, 16 Sep 2002 14:46:43 -0400 (EDT)


I replied before, but I didn't see it here. Sorry if duplicated.

Generally there will be no problem with a mix of upper case and lowercase 
letters. The only program I know of that mixes upper and lower case in one 
sequence is RepeatMasker (A.F.A. Smit & P. Green, 
http://ftp.genome.washington.edu/RM/webrepeatmaskerhelp.html).
It is a server that screens DNA sequences for interspersed repeats and low 
complexity regions. After screening the sequence, it can convert the 
repeats to lowercase, so the user gets back a DNA sequence with mixed-case 
letters.

RepeatMasker is also used in the genome annotation process at NCBI.
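
As a rough illustration of how a program might read such mixed-case output, here is a minimal Java sketch that reports the lowercase runs in a sequence. The class and method names (SoftMaskScan, lowercaseRuns) are made up for this example and are not part of RepeatMasker or BioJava. Intervals are 0-based with an exclusive end.

```java
import java.util.ArrayList;
import java.util.List;

public class SoftMaskScan {
    // Hypothetical helper: returns {start, end} pairs (0-based, end exclusive)
    // for every run of lowercase letters in the sequence.
    public static List<int[]> lowercaseRuns(String seq) {
        List<int[]> runs = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a lowercase run"
        for (int i = 0; i < seq.length(); i++) {
            boolean lower = Character.isLowerCase(seq.charAt(i));
            if (lower && start < 0) {
                start = i;                      // a lowercase run begins here
            } else if (!lower && start >= 0) {
                runs.add(new int[] {start, i}); // the run ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            runs.add(new int[] {start, seq.length()}); // run reaches the end
        }
        return runs;
    }

    public static void main(String[] args) {
        // Made-up sequence with two lowercase stretches.
        for (int[] r : lowercaseRuns("ACGTacgtacGGCCttaa")) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```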

I suggest using upper case for the DNA sequence itself and reserving lowercase 
for user-specific features; the meaning of the lowercase letters can be noted 
in the FASTA header of the sequence.

Quoting Matthew Pocock <matthew_pocock@yahoo.co.uk>:

> Ryan Golhar wrote:
> > Can anyone tell me why the letters for DNA (a,c,t,g) are lowercase in
> > DNATools?
> 
> Hi Ryan,
> 
> The static methods used to retrieve the bases are in lower case. The 
> AtomicSymbol instances returned can be spat out as lower or upper case
> depending on the SymbolTokenization you use.
> 
> (Someone who knows): does the default tokenization for DNA use upper or
> lower case? I don't care either way.
> 
> Ryan: To maintain the upper/lower case info in chromatogram files we 
> would need to do a little trickery. If you send a file (mixed case) and
> a couple of use-cases, we can probably sort this out quickly enough. If
> the case is important to you (e.g. you need to know where the uncertain
> calls are), we can do this, and if you want to discard this information
> then we can also do that trivially. I'm thinking thoughts like alignment
> of DNA against booleans (or 0/1) where A,1 would be A and supported 
> (upper case), and T,0 would be T and not well supported (lower case).
> 
> Has this already been done?
> 
> Matthew
> 
> > 
> > Some chromatogram files contain a mix of A,C,T,G and some lowercase
> > letters for peaks that could not be absolutely determined.
> > 
> > Regardless, DNA is always represented with uppercase letters...
> > 
> > If there is no argument against it, can this be changed to upper case
> > letters instead?
> > 
> > Ryan
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l
> > 



=========================================================
Haibo Zhang, PhD student
Computational Biology, NJIT & Rutgers University
Center for Applied Genomics, PHRI
http://afs13.njit.edu/~hz5