[Bioperl-l] extract nonoverlapping subsequences from a whole genome

gopu_36 gopu_36 at yahoo.com
Tue Apr 10 07:42:26 UTC 2007


Hi,
I am a newbie trying out BioPerl for my research. I have the whole genome
sequence of a pathogen, and I am trying to break it into non-overlapping
1000 bp subsequences. For example, if my whole genome sequence is 400000 bp
long, then I should break it into 400 subsequences of 1000 bp each; they
should be non-overlapping but at the same time contiguous. To be precise, my
first substring would be from 1 to 1000 bp, the second from 1001 to 2000,
and so on. Could anyone help me?
I tried the following code, but it gives me only the first substring and
none of the rest. I would very much appreciate it if someone could help me!
.........
.
.
my $start  = 1;
my $finish = 100;
my $inseq  = Bio::SeqIO->new(-file => "$in_file");
while ( my $seq = $inseq->next_seq ) {
    my $cleseq = $seq->seq();
    $seqlength = $seq->length();
    if ($finish < $seqlength) {
        print "The length of the sequence is $seqlength\n";
        my $ordseq = $cleseq->subseq($start, $finish);
        push(@seq_array, $ordseq);
        $start=+100;
        $finish=+100;
        $counter++;
        next;
    }
}
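
For comparison, here is a minimal sketch of the windowing I am after, written in plain Perl so it runs without BioPerl installed. The helper name nonoverlapping_windows and the 25-base test string are made up for illustration; they are not part of BioPerl or of my real data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a sequence string into
# consecutive, non-overlapping windows of $size characters.
# Returns a list of [start, end, subsequence] using 1-based coordinates.
sub nonoverlapping_windows {
    my ($sequence, $size) = @_;
    my $length = length $sequence;
    my @windows;
    for (my $start = 1; $start <= $length; $start += $size) {
        my $end = $start + $size - 1;
        $end = $length if $end > $length;    # the last window may be shorter
        # substr uses 0-based offsets, hence $start - 1
        push @windows,
            [ $start, $end, substr($sequence, $start - 1, $end - $start + 1) ];
    }
    return @windows;
}

# Demonstration on a made-up 25-base string with a window size of 10.
my @windows = nonoverlapping_windows('ACGTACGTACGTACGTACGTACGTA', 10);
printf "%d-%d\t%s\n", @$_ for @windows;
```

Inside the Bio::SeqIO loop, the same idea would presumably be called as nonoverlapping_windows($seq->seq, 1000), or the loop body could call $seq->subseq($start, $end) on the sequence object itself rather than on the string returned by $seq->seq. Note also that `$start += 100` adds to the variable, whereas `$start=+100` just assigns the value 100 again.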