[Biojava-l] Extract non-gene regions
Michael Heuer
heuermh at acm.org
Thu Apr 24 04:09:46 UTC 2008
On Thu, 24 Apr 2008, Mark Schreiber wrote:
> Hi Florian -
>
> There are at least two approaches. You are on the right track with
> making a union of all gene locations. The compound location that
> results from the Union will contain all the nucleotides that are
> coding. You can then iterate through each nucleotide in the genome and
> find out if the union contains the nucleotide. If it doesn't then it
> is non coding. This is surprisingly rapid as the comparisons are
> simple. The pseudo code would be something like...
>
> RichLocation coding; // initialize this by making a union of all
> // locations of CDS or Gene Features.
>
> RichSequence genome; // read from file or database
>
> for(int i = 1; i <= genome.length(); i++){ // you might need to be a
> // bit more sophisticated for a circular genome
> if( ! genome.contains(i){
> //you have a non-coding nucleotide.
> }
> }
Typo? I think the test should be against the coding union, not the genome:
if (!coding.contains(i)) {
// you have a non-coding nucleotide.
}
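For reference, here is a minimal plain-Java sketch of the per-nucleotide scan. It assumes the coding union is available as a list of inclusive, 1-based `[start, end]` blocks instead of a BioJava `RichLocation`, and it treats the genome as linear. `NonCodingScan`, `contains` and `nonCodingPositions` are hypothetical names, not BioJava API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for coding.contains(i): coding regions are modelled
// as inclusive, 1-based [start, end] pairs instead of a BioJava Location.
public class NonCodingScan {

    // True if position pos falls inside any coding block.
    static boolean contains(int[][] codingBlocks, int pos) {
        for (int[] block : codingBlocks) {
            if (pos >= block[0] && pos <= block[1]) {
                return true;
            }
        }
        return false;
    }

    // Walks every position of a linear genome and keeps the non-coding ones.
    static List<Integer> nonCodingPositions(int[][] codingBlocks, int genomeLength) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i <= genomeLength; i++) {
            if (!contains(codingBlocks, i)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] coding = {{10, 50}, {90, 100}, {350, 380}};
        List<Integer> nonCoding = nonCodingPositions(coding, 400);
        System.out.println(nonCoding.size() + " non-coding positions");
    }
}
```

With the example blocks from Mark's second approach and a 400 nt genome this prints `317 non-coding positions`. Each check scans all blocks, so the cost grows with genome length times block count; that is fine for a sketch.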
> The other approach is to use the blockIterator() method of the
> compound location that results from the union of coding sequences.
> This will output each contiguous chunk of coding sequence. If you know
> the length of the sequence then you can rapidly figure out the
> intervening pieces.
>
> For example, if the block iterator tells you that [10..50], [90..100],
> [350..380] are coding and you know the genome is of length 400 then
> you can quickly derive [1..9], [51..89], [101..349] and [381..400] are
> non-coding. Again it is more complicated for circular sequences and
> more complex if you consider the opposite strand of a gene (the gene
> shadow) to be non-coding. Unfortunately, there is no convenience method
> to do this, but if you code something up it would be great to put it in
> the cookbook so others can reuse it.
>
> - Mark
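The gap derivation Mark describes can also be sketched in plain Java. This assumes the blocks are inclusive, 1-based `[start, end]` pairs (e.g. copied from `getMin()`/`getMax()` of each block returned by `blockIterator()`) and that the sequence is linear. `CodingGaps` and `gaps` are hypothetical names, not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not a BioJava API: derives the non-coding gaps of a
// linear sequence from inclusive, 1-based coding blocks [start, end].
public class CodingGaps {

    static List<int[]> gaps(int[][] codingBlocks, int sequenceLength) {
        // Sort a copy by start so unsorted or overlapping input is tolerated.
        int[][] blocks = codingBlocks.clone();
        Arrays.sort(blocks, Comparator.comparingInt(b -> b[0]));

        List<int[]> result = new ArrayList<>();
        int next = 1; // first position not yet covered by a coding block
        for (int[] block : blocks) {
            if (block[0] > next) {
                result.add(new int[] {next, block[0] - 1}); // gap before this block
            }
            next = Math.max(next, block[1] + 1);
        }
        if (next <= sequenceLength) {
            result.add(new int[] {next, sequenceLength}); // trailing gap
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] coding = {{10, 50}, {90, 100}, {350, 380}};
        for (int[] gap : gaps(coding, 400)) {
            System.out.println("[" + gap[0] + ".." + gap[1] + "]");
        }
    }
}
```

Run on Mark's example, it prints `[1..9]`, `[51..89]`, `[101..349]` and `[381..400]`. For a circular genome, the first and last gaps may need to be joined across the origin, and the opposite-strand "gene shadow" case is not handled here.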
>
> You could actually make point locations of all the non-coding
> nucleotides and then merge the whole lot at the end into a compound
> location of non-coding sequence.
>
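Here is a sketch of the merge step for the point-location variant, again in plain Java. It assumes 1-based integer positions rather than BioJava point locations. `MergePoints` and `mergePoints` are hypothetical names. Consecutive positions are collapsed into inclusive `[start, end]` runs, which is roughly what merging the points into one compound location would give.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not a BioJava API: collapses individual 1-based
// non-coding positions (the "point locations") into contiguous inclusive
// [start, end] runs.
public class MergePoints {

    static List<int[]> mergePoints(Iterable<Integer> positions) {
        // A TreeSet sorts the positions and drops any duplicates.
        TreeSet<Integer> sorted = new TreeSet<>();
        for (Integer p : positions) {
            sorted.add(p);
        }

        List<int[]> runs = new ArrayList<>();
        boolean inRun = false;
        int start = 0;
        int end = 0;
        for (int p : sorted) {
            if (inRun && p == end + 1) {
                end = p; // p extends the current run
            } else {
                if (inRun) {
                    runs.add(new int[] {start, end}); // close the previous run
                }
                start = p;
                end = p;
                inRun = true;
            }
        }
        if (inRun) {
            runs.add(new int[] {start, end});
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Integer> points = Arrays.asList(1, 2, 3, 7, 8, 10);
        for (int[] run : mergePoints(points)) {
            System.out.println("[" + run[0] + ".." + run[1] + "]");
        }
    }
}
```

For the sample points this prints `[1..3]`, `[7..8]` and `[10..10]`.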
> On Wed, Apr 23, 2008 at 9:49 PM, Florian Schatz <mail at florianschatz.de> wrote:
> > Hello,
> >
> > I am new to BioJava and have worked a lot with it in the last few weeks. I hope
> > this is the right place for questions; if not, please tell me.
> >
> > I want to get the nucleotide sequence outside the genes of a GenBank file,
> > i.e. everything that is not marked by a 'gene' feature. Unfortunately, there
> > is no subtract or exclude function for the Location class. Any hints?
> >
> > Btw: union() of Location worked fine for extracting the nucleotides of the
> > genes only.
> >
> > Best,
> > Florian
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >