[Bioperl-l] extract nonoverlapping subsequences from a whole genome

Tue Apr 10 20:10:35 UTC 2007

gopu_36 wrote:
> Hi,
> I am one of the newbee venturingout bioperl for my research purposes. I have
> a whole genome sequence of a pathogen. I am trying to break them into
> non-overlapping 1000bps subsequences.
[snip]
> I tried with the following code but it gives me only the first substring and
> rest are not! I would appreciate very much if someone could help me!
[snip]
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	             
>        } 
> }

Unless I've misunderstood, there are a few problems here.

I'm guessing $in_file is a file containing the entire genome sequence as 
a single sequence. This means your while loop will only loop once. To do 
what you want you then need another loop that acts on the single $seq 
object you're going to get. You don't need $cleseq, and in fact your 
script ought to crash on the $cleseq->subseq line because $cleseq is a 
string which has no subseq() method. $seq->subseq is what you want.

I didn't look at the remaining code.

Hope that helps,
Sendu.