[Biojava-l] Stranded non-contiguous feature loses its strand in a SubSequence

Matthew Pocock matthew_pocock@yahoo.co.uk
Wed, 22 May 2002 14:10:43 +0100


Hi Stein Aerts,

When you make a sub sequence, the code takes that as a request to tell 
you only about the information in that region. Features that are wholly 
contained within it will be projected in with all their properties 
intact. Features that overlap the region but are not contained in it 
will be replaced by place-holder features of the type RemoteFeature. A 
RemoteFeature only publishes very basic information (those fields in the 
top-level Feature interface) and then provides a method 
getRemoteFeature() that will return the feature it is a slice of (e.g. 
the original CDS, SOURCE or whatever). Strand is not one of those 
top-level fields, which is why your joined CDS, which extends beyond the 
sub sequence, shows up without one while the fully contained exon keeps 
its strand.
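
As a workaround, you can look through the place-holder to the feature it 
stands for and read the strand from there. The sketch below is 
self-contained and does not use the BioJava classes: Feature, 
StrandedFeature, RemoteFeature and Strand are simplified, hypothetical 
stand-ins that mirror only the names used in this thread, and the helpers 
strandOf(), stranded() and placeholderFor() are made up for illustration. 
The real getRemoteFeature() may declare checked exceptions that you would 
need to handle.

```java
public class StrandRecovery {
    // Hypothetical stand-ins for the types named in this thread; these are
    // NOT the BioJava interfaces, only minimal models of them.
    enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

    interface Feature { String getType(); }

    interface StrandedFeature extends Feature { Strand getStrand(); }

    interface RemoteFeature extends Feature { Feature getRemoteFeature(); }

    /** Strand of f, looking through a place-holder to the feature it is a slice of. */
    static Strand strandOf(Feature f) {
        if (f instanceof StrandedFeature) {
            return ((StrandedFeature) f).getStrand();
        }
        if (f instanceof RemoteFeature) {
            // The place-holder itself carries no strand; ask the original.
            return strandOf(((RemoteFeature) f).getRemoteFeature());
        }
        return Strand.UNKNOWN;
    }

    /** Illustrative factory for a stranded feature. */
    static StrandedFeature stranded(final String type, final Strand strand) {
        return new StrandedFeature() {
            public String getType() { return type; }
            public Strand getStrand() { return strand; }
        };
    }

    /** Illustrative factory for a place-holder that exposes only the basic fields. */
    static RemoteFeature placeholderFor(final Feature original) {
        return new RemoteFeature() {
            public String getType() { return original.getType(); }
            public Feature getRemoteFeature() { return original; }
        };
    }

    public static void main(String[] args) {
        Feature cds = stranded("CDS", Strand.NEGATIVE);
        Feature slice = placeholderFor(cds);   // what the sub sequence hands back
        System.out.println(slice.getType() + " " + strandOf(slice)); // prints "CDS NEGATIVE"
    }
}
```

Against the real library the same instanceof checks should apply to the 
features you get back from the sub sequence, but treat that as an 
assumption and check the javadoc for your BioJava version.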

Because of the static typing of Java, it is not possible to make a 
remote feature implement the exact interface of the feature it stands in 
for. It would be possible via code-generation, and it is on the list of 
things to think about for a future release of BioJava.
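
To illustrate what generating the class at run time could look like, here 
is a small sketch using the JDK's java.lang.reflect.Proxy. It is not what 
BioJava does, and it only forwards calls to the original rather than 
projecting locations into the sub sequence. Feature and StrandedFeature 
are again simplified, hypothetical stand-ins, and mirror() and stranded() 
are names made up for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class MirroringPlaceholder {
    // Hypothetical stand-ins, not the BioJava interfaces.
    interface Feature { String getType(); }

    interface StrandedFeature extends Feature { char getStrand(); }

    /**
     * Build a stand-in whose class is generated at run time, implements
     * every interface the original's class declares, and forwards each
     * call to the original.
     */
    static Feature mirror(final Feature original) {
        InvocationHandler forward =
            (proxy, method, args) -> method.invoke(original, args);
        return (Feature) Proxy.newProxyInstance(
            original.getClass().getClassLoader(),
            original.getClass().getInterfaces(),
            forward);
    }

    /** Illustrative factory for a stranded feature. */
    static StrandedFeature stranded(final String type, final char strand) {
        return new StrandedFeature() {
            public String getType() { return type; }
            public char getStrand() { return strand; }
        };
    }

    public static void main(String[] args) {
        Feature cds = stranded("CDS", '-');
        Feature standIn = mirror(cds);
        // The stand-in's interfaces were chosen at run time, so this holds.
        System.out.println(standIn instanceof StrandedFeature); // prints "true"
    }
}
```

Note that getInterfaces() only returns the interfaces a class declares 
directly, so a fuller version would also have to walk the superclasses.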

Matthew

Stein Aerts wrote:
> Does anyone know why, if you make a subsequence, some of the features that
> are stranded in the original sequence are not stranded anymore in the
> subsequence?
> Would there be a workaround?
> 
> Example: (embl file and the code I used is in attachment)
> 
> features on the original sequence:
> 
> ENSG00000114251: prediction|-|31979, 33725 {([31979,32271]),
> ([33699,33725])}|
> ENSG00000114251: exon|-|[36988,37230]|
> ENSG00000114251: CDS|-|37181, 38568 {([37181,37238]), ([38459,38568])}|
> ENSG00000114251: exon|-|[38459,39310]|
> ENSG00000114251: exon|-|[11752,11941]|
> ENSG00000114251: prediction|+|[68957,69037]|
> ENSG00000114251: prediction|-|[7506,7964]|
> ENSG00000114251: prediction|-|[11751,12043]|
> ENSG00000114251: exon|-|[77834,77918]|
> ENSG00000114251: exon|-|[7510,7965]|
> ENSG00000114251: source|+|[1,82918]|
> ENSG00000114251: exon|-|[32080,32272]|
> ENSG00000114251: prediction|-|[13468,13565]|
> ENSG00000114251: prediction|-|36987, 59254 {([36987,37237]),
> ([38458,38591]), ([42674,42788]), ([43206,43321]), ([45592,45884]),
> ([46589,46685]), ([59243,59254])}|
> ENSG00000114251: prediction|+|73976, 74614 {([73976,74134]),
> ([74468,74614])}|
> ENSG00000114251: exon|-|[36988,37238]|
> ENSG00000114251: exon|-|[5001,7965]|
> ENSG00000114251: CDS|-|7510, 77918 {([7510,7965]), ([11752,11941]),
> ([32080,32272]), ([36988,37230]), ([77834,77918])}|
> ENSG00000114251: exon|-|[11752,12044]|
> ENSG00000114251: prediction|+|16957, 20027 {([16957,17042]),
> ([19913,20027])}|
> 
> 
> features after making a SubSequence of (77718,79918)
> 
> ENSG00000114251: exon|-|[117,201]|
> ENSG00000114251: source|[1,2201]|
> ENSG00000114251: CDS|[117,201]|
> 
> Now the last CDS is not stranded anymore. Could the reason be that this CDS
> has a joined location in the original sequence? Because the exon still has
> its strand.
> 
> Thanks & bye,
> Stein Aerts.
> 
> 
> 
> Ir Stein Aerts
> Bioinformatics Research
> KULeuven, ESAT-SCD
> Kasteelpark Arenberg 10
> 3001 Heverlee, Belgium
> Tel +32-16-32.17.91
> Fax +32-16-32.19.70
> http://www.esat.kuleuven.ac.be/~dna/BioI/