[Biojava-dev] Case-sensitive ProteinSequences

Spencer Bliven sbliven at ucsd.edu
Thu Dec 1 22:28:39 UTC 2011


Thanks Scooter. It looks like my best bet might be with my own
SequenceCreator which stores the case info as a series of Features.
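
A minimal sketch of that idea, in plain Java with no BioJava dependency:
upper-case the residues before they reach the compound set, and keep the
lower-case stretches as ranges that could later be attached as features.
CaseMask and Range are hypothetical names used only for illustration, not
BioJava classes; wiring this into a custom SequenceCreator is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of BioJava: separates residue identity
// from the case information carried by a mixed-case FASTA line.
final class CaseMask {

    /** Half-open range [start, end) of 0-based residue positions. */
    static final class Range {
        public final int start;
        public final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Sequence text that is safe to hand to an upper-case-only compound set. */
    static String toUpper(String raw) {
        return raw.toUpperCase(Locale.ROOT);
    }

    /** Maximal runs of lower-case letters, i.e. the unaligned stretches. */
    static List<Range> lowerCaseRanges(String raw) {
        List<Range> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a lower-case run"
        for (int i = 0; i < raw.length(); i++) {
            boolean lower = Character.isLowerCase(raw.charAt(i));
            if (lower && start < 0) {
                start = i;                        // a run begins
            } else if (!lower && start >= 0) {
                ranges.add(new Range(start, i));  // a run ends before i
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new Range(start, raw.length())); // run reaches the end
        }
        return ranges;
    }
}
```

Each Range would then become one feature on the upper-cased sequence, so
the scoring matrices only ever see the standard amino acids.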

On Tue, Nov 29, 2011 at 18:08, Scooter Willis <HWillis at scripps.edu> wrote:

> Once we load the amino acid sequence we would not maintain the upper case
> or lower case as each amino acid is a static reference to the
> corresponding amino acid compound to save on memory. FastaReader is fairly
> flexible in that you can create your own SequenceCreator that does upper
> case conversion and then you can parse upper lower case and add as a
> feature to the Protein Sequence. Not sure if this solves your problem in
> using the sequence alignment code as I think this returns a new sequence
> that is aligned. If you look in the Biojava3-genome module, GeneFeatureHelper
> has a method loadFastaAddGeneFeaturesFromUpperCaseExonFastaFile that uses
> upper and lower case in the FASTA file to designate exons, as an example.
>
> Thanks
>
> Scooter
>
>
>
> On 11/29/11 8:29 PM, "Spencer Bliven" <sbliven at ucsd.edu> wrote:
>
> >I'm currently trying to read a FASTA file which encodes some information in
> >the case of each amino acid. Specifically, the FASTA contains an alignment
> >where upper case letters are aligned and lower case are unaligned.
> >
> >The first problem I ran into was that lower-case letters are not valid as
> >input to AminoAcidCompoundSet.getCompoundForString(String), which gets
> >called indirectly from the FastaReader. This could be fixed by subclassing
> >AminoAcidCompoundSet and calling toUpperCase() on the input. However, the
> >second problem is that I need to extract that case information later on. My
> >current solution is a subclass of AminoAcidCompoundSet which contains two
> >copies of each amino acid, one upper and one lower. This seems like a very
> >ugly solution and it breaks all the Alignment algorithms (due to missing
> >amino acids in the scoring matrices). Does anyone have a better
> >suggestion?
> >
> >Thanks,
> >Spencer
> >
> >_______________________________________________
> >biojava-dev mailing list
> >biojava-dev at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>



