[Bioperl-l] extract nonoverlapping subsequences from a whole genome

Tue Apr 10 20:22:15 UTC 2007

There is a script in the BioPerl scripts directory that does this,
with optional overlaps (split_seq.PLS). It's in /scripts/seq.
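
As for why the quoted code returns only the first piece: `$start=+100` assigns the value +100 rather than adding to it (the increment operator is `+=`), the `if` block runs only once per sequence where a loop is needed, and `subseq` is a method of the sequence object, not of the plain string returned by `seq()`. A minimal sketch of the windowing logic in plain Perl follows. It is not the code from split_seq.PLS; the helper name `chunk_sequence` and the toy sequence are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a sequence string into
# consecutive, non-overlapping windows of $size characters.
# Returns a list of [start, end, subsequence] records, using 1-based
# inclusive coordinates.
sub chunk_sequence {
    my ($sequence, $size) = @_;
    die "window size must be a positive integer\n" unless $size && $size > 0;

    my $length = length $sequence;
    my @chunks;
    for (my $start = 1; $start <= $length; $start += $size) {
        my $end = $start + $size - 1;
        $end = $length if $end > $length;    # the last window may be shorter

        # substr is 0-based, so shift the 1-based start by one
        push @chunks,
            [ $start, $end, substr($sequence, $start - 1, $end - $start + 1) ];
    }
    return @chunks;
}

# Toy example: a 25-base sequence cut into windows of 10
my @chunks = chunk_sequence('ACGTACGTACGTACGTACGTACGTA', 10);
printf "%d-%d\t%s\n", @$_ for @chunks;
```

With a BioPerl sequence object, the same loop can call `$seq->subseq($start, $end)`, which also takes 1-based inclusive coordinates, on each object returned by `next_seq`, with the window size set to 1000.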

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am a newbie trying out BioPerl for my research. I have
> a whole genome sequence of a pathogen. I am trying to break it into
> non-overlapping 1000 bp subsequences. For example, if my whole genome
> sequence is 400000 bp long, then I should break it into 400
> subsequences of 1000 bp each, and they should be non-overlapping
> but at the same time continuous. To be precise, my first substring
> would be from 1 to 1000 bp, the second from 1001 to 2000, etc.
> Could anyone help me?
> I tried the following code, but it gives me only the first substring
> and not the rest! I would appreciate it very much if someone could
> help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign