[Biojava-dev] AbstractSequence

Scooter Willis HWillis at scripps.edu
Tue May 4 01:08:34 UTC 2010


Andy

Trying to finish up the code for the gff parser where we start with a scaffold/dna sequence and by mapping all the various CDS regions we can extract the encoded protein sequence handling negative strand and phase shift.

Each DNASequence can have a collection of genes.
Each gene will have a collection of TranscriptionSequences
Each TranscriptionSequence will have a collection of CDSSequences regions with strand and phase attributes.
>From the CDSSequences owned by the parent GeneSequence we can pull out Intron/Exon sequences by superimposing all CDS regions which will then form an exon region. If not an exon region then the remainder is intron regions.

As it currently stands DNASequence would actually contain the sequence data where you can't create a GeneSequence without passing in a parent DNA sequence. GeneSequence,TranscriptSequence and CDSSequences all extend DNASequence but do not have a reference to backend store but for all modeling purposes they are DNASequences. When I call getSubSequence(begin,end) for the CDS sequence we don't handle the case where we will walk up the parents to find a valid backend store. I should be able to fix it with some minor changes in AbstractSequence and giving AbstractSequence a reference to a possible ParentSequence. 

Before making any changes I wanted to make sure you are all checked in so I don't run into major architectural changes on your end. I will be working on this genome related code for the next 30 days which will help me allocate to getting the core architecture full functional.

Thanks

Scooter



More information about the biojava-dev mailing list