[Biojava-l] Sequence Iteration in BioJava(x)

Richard HOLLAND hollandr at gis.a-star.edu.sg
Thu Dec 15 21:57:15 EST 2005


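Mark's comments earlier make my sample code redundant. I had the two
windowing views in SymbolListViews confused: orderNSymbolList gives
overlapping windows, and windowedSymbolList gives non-overlapping ones.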

See his post for more details!

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org 
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of 
> Mark Fortner
> Sent: Friday, December 16, 2005 10:36 AM
> To: biojava-list
> Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x)
> 
> 
> Richard,
> Thanks for the example.  Your approach is very similar to a non-BioJava
> approach that I had worked out earlier.  I was wondering if the
> BioJava(x) API provides any performance benefit over simply running a
> window along a character stream?
> 
> The work that we're doing involves iterating through the human genome
> (and in a number of cases, metagenomic sequences), and we're trying to
> squeeze as much performance out of it as possible while minimizing the
> memory footprint.
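
For comparison, "running a window along a character stream" might look roughly like the sketch below. It is plain Java with no BioJava calls, and the class and method names (StreamingKmers, forEachKmer, collect) are hypothetical. It assumes the input is bare sequence text with optional line breaks, not a full FASTA parser. Only a k-character buffer is held in memory at any time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper class, not part of BioJava.
class StreamingKmers {

    // Reads characters one at a time and hands every overlapping
    // k-mer to the sink. Whitespace (e.g. line breaks) is skipped.
    static void forEachKmer(Reader in, int k, Consumer<String> sink) throws IOException {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        char[] window = new char[k];
        int filled = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (filled < k) {
                // Still filling the first window.
                window[filled++] = (char) c;
            } else {
                // Slide the window one base to the right.
                System.arraycopy(window, 1, window, 0, k - 1);
                window[k - 1] = (char) c;
            }
            if (filled == k) {
                sink.accept(new String(window));
            }
        }
    }

    // Convenience wrapper for small in-memory inputs.
    static List<String> collect(String text, int k) {
        List<String> out = new ArrayList<>();
        try {
            forEachKmer(new StringReader(text), k, out::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String kmer : collect("ACTG\nACTG\nACTG", 3)) {
            System.out.println(kmer);
        }
    }
}
```

For genome-sized input the Reader would normally be wrapped in a BufferedReader, and the sink would count or hash each window rather than collect it into a list.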
> 
> Thanks,
> 
> Mark
> 
> Richard HOLLAND wrote:
> 
> >orderNSymbolList splits the sequence into non-overlapping chunks. What
> >is required here is chunks that each start one base further along
> >than the previous chunk.
> >
> >The simplest way would be this:
> >
> >	SymbolList mySeq; // this is your sequence from somewhere else
> >	for (int i = 1 ; i <= mySeq.length()-2; i++) {
> >		// coords are inclusive so i to i+2 = 3 bases
> >		SymbolList trimer = mySeq.subSeq(i,i+2);
> >		// do something with your trimer here
> >	}
> >
> >Note that the index starts at 1 because symbols in a SymbolList are
> >1-indexed, not 0-indexed. Valid positions run from 1 up to and
> >including length(), so the last trimer starts at length()-2.
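
The coordinate arithmetic can be checked without BioJava. The sketch below is plain Java on a String. The subSeq helper here is a hypothetical stand-in that only mimics 1-based, inclusive coordinates; it is not the BioJava method itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in class, not part of BioJava.
class SlidingKmers {

    // Mimics 1-based, inclusive coordinates on a plain String:
    // subSeq("ACTG", 1, 3) covers the first three characters.
    static String subSeq(String seq, int start, int end) {
        return seq.substring(start - 1, end);
    }

    // Collects every overlapping k-mer. The last window starts at
    // length - k + 1, which is length() - 2 for trimers.
    static List<String> kmers(String seq, int k) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= seq.length() - k + 1; i++) {
            out.add(subSeq(seq, i, i + k - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String kmer : kmers("ACTGACTGACTG", 3)) {
            System.out.println(kmer);
        }
    }
}
```

Run on ACTGACTGACTG with k = 3, this yields the ten trimers listed in the original question.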
> >	
> >cheers,
> >Richard
> >
> >Richard Holland
> >Bioinformatics Specialist
> >GIS extension 8199
> >
> >
> >  
> >
> >>-----Original Message-----
> >>From: biojava-l-bounces at portal.open-bio.org 
> >>[mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of David Huen
> >>Sent: Friday, December 16, 2005 7:34 AM
> >>To: m.fortner at sbcglobal.net
> >>Cc: biojava-list
> >>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x)
> >>
> >>
> >>On Dec 15 2005, Mark Fortner wrote:
> >>I think what you want is the SymbolListViews.orderNSymbolList method.
> >>
> >>It takes a SymbolList and returns a view of it over a compound
> >>alphabet of the required order.
> >>
> >>
> >>    
> >>
> >>>I'm looking for the best way to iterate through all
> >>>nmers within a given sequence.  For example, given a
> >>>sequence that looks like this:
> >>>
> >>>ACTGACTGACTG
> >>>
> >>>If I extract all trimers from this I should get:
> >>>
> >>>ACT
> >>>CTG
> >>>TGA
> >>>GAC
> >>>ACT
> >>>CTG
> >>>TGA
> >>>GAC
> >>>ACT
> >>>CTG
> >>>
> >>>Is there an existing class that will allow me to
> >>>iterate through a sequence this way, or am I on my
> >>>own?
> >>>
> >>>      
> >>>
> >>_______________________________________________
> >>Biojava-l mailing list  -  Biojava-l at biojava.org
> >>http://biojava.org/mailman/listinfo/biojava-l
> >>
> >>    
> >>
> >
> >  
> >
> 
> 


