[Biojava-dev] biojava 3 progress
Andy Yates
ayates at ebi.ac.uk
Wed Mar 17 19:24:13 UTC 2010
biojava-genomes sounds good.
I've done nothing since my last check-in of code, which was all to do with locations, so there should be no problem there :)
On 17 Mar 2010, at 18:17, Scooter Willis wrote:
> Andreas
>
> The problem with putting feature classes in a separate module is that biojava-core sequences would then have a dependency on biojava-feature. A sequence needs to hold a collection of features, so the feature classes need to go in core. Whether the features are created from GFF or anything else, the core module doesn't care where they come from.
>
> We could go with biojava-genomes and code related to dealing with genomes goes in that module. If you like biojava-genome or biojava-genomes go ahead and create it and email me so I can check it out.
>
> Thanks
>
> Scooter
>
>
>
> On Mar 17, 2010, at 1:46 PM, Andreas Prlic wrote:
>
>> I like biojava-feature as a module name for the GFF and feature-related code. (Should we try to keep the module names singular?) Let me know if you want me to create the module for this...
>> A
>>
>> On Wed, Mar 17, 2010 at 9:09 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>> Andy
>>
>> Let me know if you have any major code changes for the core sequence handling that have been or could be checked in. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made any significant changes.
>>
>> I should have code to check in today. If we can't come up with a better name, I will ask Andreas to create a biojava3-genes module so I can check that code in for your review. The current problem is that we have ExonSequence extending DNASequence when it could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and only when you want a standalone class with internal links back to the parent sequence do you return a TranscriptSequence. The TranscriptFeature would have ExonFeature and IntronFeature as children. You can ask for an ExonSequence based on the ExonFeature. Once you get a ProteinSequence, you should be able to reverse the process and get back the TranscriptSequence and the corresponding ExonFeatures, along with some sort of mapping from a protein sequence position back to the three DNA sequence positions that coded for it. This would need to handle the case where the end of one exon and the start of the next exon together code for a single amino acid position.
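
The protein-to-DNA mapping described above can be illustrated with a minimal sketch. The names CodonMapper, Exon, codonPositions and toParentPosition are hypothetical and are not part of the BioJava 3 code under discussion. The sketch assumes forward-strand exons listed in transcription order, 1-based inclusive coordinates on the parent sequence, and a coding region that starts at the first base of the first exon.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: names are illustrative, not the BioJava 3 API. */
final class CodonMapper {

    /** An exon on the forward strand, 1-based inclusive parent coordinates. */
    static final class Exon {
        final int start;
        final int end;

        Exon(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start + 1;
        }
    }

    /**
     * Returns the three parent-sequence positions that code for the given
     * 1-based amino acid position. Assumes the exons are in transcription
     * order and the coding region starts at the first base of the first exon.
     */
    static int[] codonPositions(List<Exon> exons, int aminoAcidPosition) {
        int[] result = new int[3];
        for (int i = 0; i < 3; i++) {
            // 0-based offset of this codon base within the spliced transcript
            int offset = (aminoAcidPosition - 1) * 3 + i;
            result[i] = toParentPosition(exons, offset);
        }
        return result;
    }

    /** Walks the exons to convert a 0-based spliced offset to a parent position. */
    static int toParentPosition(List<Exon> exons, int offset) {
        int remaining = offset;
        for (Exon exon : exons) {
            if (remaining < exon.length()) {
                return exon.start + remaining;
            }
            remaining -= exon.length();
        }
        throw new IllegalArgumentException("offset " + offset + " is beyond the last exon");
    }

    public static void main(String[] args) {
        // Exons 11..14 (4 bases) and 21..25 (5 bases): the second codon spans the intron.
        List<Exon> exons = Arrays.asList(new Exon(11, 14), new Exon(21, 25));
        System.out.println(Arrays.toString(codonPositions(exons, 1)));
        System.out.println(Arrays.toString(codonPositions(exons, 2)));
    }
}
```

Because each codon base is resolved independently, a codon split across an exon boundary needs no special case. Reverse-strand transcripts and untranslated regions are left out of this sketch.
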
>>
>> We also need to add the ability to have tracks as a way to group features. This way you can export the features on a particular track as a GFF/GFF3 file for importing into various genome browsers. For example, you have one genome you are working on, with genes added from three different gene prediction algorithms, each organized as a track. You should then be able to determine overlaps of genes that were predicted and validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together can make this easy, then I think we will have a solid design.
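
The track idea above could look roughly like the following sketch. TrackSketch, Feature and their methods are hypothetical names chosen for illustration, not the BioJava 3 API. The sketch assumes 1-based inclusive coordinates, ignores strand when testing overlap, and writes the track name into the GFF3 source column, which is only one possible convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: names are illustrative, not the BioJava 3 API. */
final class TrackSketch {

    /** A feature with 1-based inclusive coordinates on a named sequence. */
    static final class Feature {
        final String id;
        final String seqId;
        final String type;
        final int start;
        final int end;
        final char strand;

        Feature(String id, String seqId, String type, int start, int end, char strand) {
            this.id = id;
            this.seqId = seqId;
            this.type = type;
            this.start = start;
            this.end = end;
            this.strand = strand;
        }

        /** True when both features share at least one base on the same sequence (strand ignored). */
        boolean overlaps(Feature other) {
            return seqId.equals(other.seqId) && start <= other.end && other.start <= end;
        }
    }

    // Track name -> features, kept in insertion order.
    private final Map<String, List<Feature>> tracks = new LinkedHashMap<>();

    void add(String track, Feature feature) {
        tracks.computeIfAbsent(track, k -> new ArrayList<>()).add(feature);
    }

    List<Feature> features(String track) {
        return tracks.getOrDefault(track, new ArrayList<>());
    }

    /** Features of trackA that overlap at least one feature of trackB. */
    List<Feature> overlapping(String trackA, String trackB) {
        List<Feature> hits = new ArrayList<>();
        for (Feature a : features(trackA)) {
            for (Feature b : features(trackB)) {
                if (a.overlaps(b)) {
                    hits.add(a);
                    break;
                }
            }
        }
        return hits;
    }

    /** Writes one track as GFF3 text; the track name fills the source column (an assumption). */
    String exportGff3(String track) {
        StringBuilder out = new StringBuilder("##gff-version 3\n");
        for (Feature f : features(track)) {
            out.append(String.join("\t", f.seqId, track, f.type,
                    String.valueOf(f.start), String.valueOf(f.end),
                    ".", String.valueOf(f.strand), ".", "ID=" + f.id)).append('\n');
        }
        return out.toString();
    }
}
```

A summary track of validated genes could then be built by adding the result of overlapping("predictorA", "predictorB") to a new track name. Escaping of reserved characters in GFF3 columns is not handled here.
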
>>
>>
>> Scooter
>>
>>
>
--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/