[Bioperl-l] extract nonoverlapping subsequences from a whole genome
Sendu Bala
bix at sendu.me.uk
Tue Apr 10 20:10:35 UTC 2007
gopu_36 wrote:
> Hi,
> I am one of the newbee venturingout bioperl for my research purposes. I have
> a whole genome sequence of a pathogen. I am trying to break them into
> non-overlapping 1000bps subsequences.
[snip]
> I tried with the following code but it gives me only the first substring and
> rest are not! I would appreciate very much if someone could help me!
[snip]
> my $start =1;
> my $finish =100;
> my $inseq = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
>
> my $cleseq = $seq->seq();
>
> $seqlength = $seq->length();
> if ($finish<$seqlength){
> print "The length of the sequence is $seqlength\n";
> my $ordseq = $cleseq->subseq($start,$finish);
> push(@seq_array,$ordseq);
> $start=+100;
> $finish=+100;
> $counter++;
> next;
> }
> }
Unless I've misunderstood, there are a few problems here.
I'm guessing $in_file is a file containing the entire genome sequence as
a single sequence. This means your while loop will only loop once. To do
what you want you then need another loop that acts on the single $seq
object you're going to get. You don't need $cleseq, and in fact your
script ought to crash on the $cleseq->subseq line because $cleseq is a
string which has no subseq() method. $seq->subseq is what you want.
I didn't look at the remaining code.
Hope that helps,
Sendu.
More information about the Bioperl-l
mailing list