[Biojava-l] Sequence Iteration in BioJava(x)

Mark Fortner m.fortner at sbcglobal.net
Thu Dec 15 21:36:11 EST 2005


Richard,
Thanks for the example.  Your approach is very similar to a non-BioJava 
approach that I had worked out earlier.  I was wondering if the 
BioJava(x) API provides any performance benefit over simply running a 
window along a character stream? 

The work that we're doing involves iterating through the human genome 
(and, in a number of cases, metagenomic sequences), and we're trying to 
squeeze as much performance out of it as possible while minimizing the 
memory footprint.
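
For reference, here is a minimal sketch of what "running a window along a character stream" could look like in plain Java. It is not BioJava code: the class KmerWindow and its methods are hypothetical names made up for this illustration, and it assumes the input is bare sequence text (no FASTA header lines) with whitespace to be skipped. Only k characters are held in memory at a time, though each n-mer is still handed out as a new String.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of BioJava): slides a k-wide window
// along a character stream and reports every overlapping n-mer.
final class KmerWindow {

    // Reads the stream one character at a time, keeping only the last k
    // characters. Assumes plain sequence text; whitespace is skipped.
    static void forEachKmer(Reader in, int k, Consumer<String> sink) throws IOException {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        char[] window = new char[k];
        int filled = 0; // number of valid characters currently in the window
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                continue; // ignore line breaks inside the sequence
            }
            if (filled < k) {
                window[filled++] = (char) c;
            } else {
                // Shift left by one and append the new character.
                System.arraycopy(window, 1, window, 0, k - 1);
                window[k - 1] = (char) c;
            }
            if (filled == k) {
                sink.accept(new String(window));
            }
        }
    }

    // Convenience wrapper for in-memory strings, mainly for testing.
    static List<String> kmersOf(String seq, int k) {
        List<String> out = new ArrayList<>();
        try {
            forEachKmer(new StringReader(seq), k, out::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints each overlapping trimer of the example sequence, one per line.
        for (String trimer : kmersOf("ACTGACTGACTG", 3)) {
            System.out.println(trimer);
        }
    }
}
```

For a genome-sized file the Reader would presumably be wrapped in a BufferedReader, and the shift-by-one copy could be swapped for a ring buffer or a packed integer encoding if the per-window String allocation turns out to matter.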

Thanks,

Mark

Richard HOLLAND wrote:

>orderNSymbolList splits the sequence into non-overlapping chunks. What
>is required here is chunks that are only one base different (further on)
>than the previous chunk.
>
>The simplest way would be this:
>
>	SymbolList mySeq; // this is your sequence from somewhere else
>	for (int i = 1 ; i <= mySeq.length()-2; i++) {
>		// coords are inclusive, so i to i+2 = 3 bases
>		SymbolList trimer = mySeq.subList(i, i+2);
>		// do something with your trimer here
>	}
>
>Note that the index starts at 1, because symbols in a SymbolList are
>1-indexed, not 0-indexed. Valid positions run right up to and including
>length(), so the loop stops at length()-2 to keep i+2 in range.
>	
>cheers,
>Richard
>
>Richard Holland
>Bioinformatics Specialist
>GIS extension 8199
>
>
>  
>
>>-----Original Message-----
>>From: biojava-l-bounces at portal.open-bio.org 
>>[mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of David Huen
>>Sent: Friday, December 16, 2005 7:34 AM
>>To: m.fortner at sbcglobal.net
>>Cc: biojava-list
>>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x)
>>
>>
>>On Dec 15 2005, Mark Fortner wrote:
>>I think what you want is the SymbolListViews.orderNSymbolList method.
>>
>>It will take a SymbolList and turn it into another where it is viewed
>>in another compound alphabet of the required order.
>>
>>
>>    
>>
>>>I'm looking for the best way to iterate through all
>>>nmers within a given sequence.  For example, given a
>>>sequence that looks like this:
>>>
>>>ACTGACTGACTG
>>>
>>>If I extract all trimers from this I should get:
>>>
>>>ACT
>>>CTG
>>>TGA
>>>GAC
>>>ACT
>>>CTG
>>>TGA
>>>GAC
>>>ACT
>>>CTG
>>>
>>>Is there an existing class that will allow me to
>>>iterate through a sequence this way, or am I on my
>>>own?
>>>
>>>      
>>>
>>_______________________________________________
>>Biojava-l mailing list  -  Biojava-l at biojava.org
>>http://biojava.org/mailman/listinfo/biojava-l
>>
>>    
>>
>
>  
>


