[Biojava-l] Sequence Iteration in BioJava(x)

mark.schreiber at novartis.com mark.schreiber at novartis.com
Thu Dec 15 21:33:48 EST 2005


Actually, orderNSymbolList gives overlapping n-mers, while
windowedSymbolList gives non-overlapping n-mers.

Given the sequence

actcgcatgcgatcgcag


orderNSymbolList (with order of 4) would give

actc, ctcg, tcgc etc

windowedSymbolList with an order of 4 would give

actc, gcat, gcga, etc

Eventually the windowedSymbolList would actually throw an exception, because
the length of the sequence above is not evenly divisible by 4
(seq.length() % 4 != 0).
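
To make the difference concrete, here is a minimal sketch that uses plain Java Strings rather than BioJava SymbolLists. The class and method names (NmerSketch, overlapping, windowed) are made up for illustration and are not part of the BioJava API. Throwing IllegalArgumentException up front is a choice made in this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of BioJava: it only mimics the two
// kinds of view on a plain String.
class NmerSketch {

    // Overlapping n-mers: the window advances by one base each step.
    static List<String> overlapping(String seq, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= seq.length(); i++) {
            out.add(seq.substring(i, i + n));
        }
        return out;
    }

    // Non-overlapping n-mers: the window advances by n bases each step.
    // Assumption of this sketch: reject lengths that are not a multiple of n.
    static List<String> windowed(String seq, int n) {
        if (seq.length() % n != 0) {
            throw new IllegalArgumentException(
                "length " + seq.length() + " is not divisible by " + n);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < seq.length(); i += n) {
            out.add(seq.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        String seq = "actcgcatgcgatcgcag";
        // First three overlapping 4-mers.
        System.out.println(overlapping(seq, 4).subList(0, 3));
        // Windowed 4-mers of the first 16 bases (16 is divisible by 4).
        System.out.println(windowed(seq.substring(0, 16), 4));
    }
}
```

Calling windowed on the full 18-base sequence with a window of 4 throws in this sketch, since 18 % 4 == 2.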

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910





"Richard HOLLAND" <hollandr at gis.a-star.edu.sg>
Sent by: biojava-l-bounces at portal.open-bio.org
12/16/2005 09:43 AM

 
        To:     "David Huen" <smh1008 at cam.ac.uk>, <m.fortner at sbcglobal.net>
        cc:     biojava-list <biojava-l at biojava.org>, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        RE: [Biojava-l] Sequence Iteration in BioJava(x)


orderNSymbolList splits the sequence into non-overlapping chunks. What
is required here is chunks that are only one base different (further on)
than the previous chunk.

The simplest way would be this:

                 SymbolList mySeq; // this is your sequence from somewhere else
                 for (int i = 1; i <= mySeq.length() - 2; i++) {
                     // coords are inclusive, so i to i+2 = 3 bases
                     SymbolList trimer = mySeq.subSeq(i, i + 2);
                     // do something with your trimer here
                 }

Note that the index starts at 1 and the last trimer ends at length()
inclusive, as symbols in a SymbolList are 1-indexed, not 0-indexed.
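
For anyone who wants to check the loop bounds without BioJava on the classpath, here is a rough sketch that emulates 1-indexed, inclusive coordinates on a plain String. The class TrimerSketch and its methods subSeq and nmers are invented for this illustration. Only the coordinate convention is taken from the note above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a SymbolList, using a plain String.
class TrimerSketch {

    // Emulates 1-indexed, inclusive coordinates: subSeq(s, 1, 3) returns
    // the first three characters of s.
    static String subSeq(String s, int start, int end) {
        return s.substring(start - 1, end);
    }

    // Collects every overlapping n-mer. For n = 3 the loop runs from 1 to
    // length - 2, the same bounds as the loop above.
    static List<String> nmers(String seq, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= seq.length() - n + 1; i++) {
            out.add(subSeq(seq, i, i + n - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The sequence from the original question.
        for (String trimer : nmers("ACTGACTGACTG", 3)) {
            System.out.println(trimer);
        }
    }
}
```

For the 12-base sequence in the original question this yields 10 trimers, starting with ACT and ending with CTG.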
 
cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199


> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org 
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of David Huen
> Sent: Friday, December 16, 2005 7:34 AM
> To: m.fortner at sbcglobal.net
> Cc: biojava-list
> Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x)
> 
> 
> On Dec 15 2005, Mark Fortner wrote:
> I think what you want is the SymbolListViews.orderNSymbolList method.
> 
> It will take a SymbolList and turn it into another where it is viewed in
> another compound alphabet of the required order.
> 
> 
> >I'm looking for the best way to iterate through all
> >nmers within a given sequence.  For example, given a
> >sequence that looks like this:
> >
> >ACTGACTGACTG
> >
> >If I extract all trimers from this I should get:
> >
> >ACT
> >CTG
> >TGA
> >GAC
> >ACT
> >CTG
> >TGA
> >GAC
> >ACT
> >CTG
> >
> >Is there an existing class that will allow me to
> >iterate through a sequence this way, or am I on my
> >own?
> >
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 
