[Bioperl-l] extract nonoverlapping subsequences from a whole genome
Chris Fields
cjfields at uiuc.edu
Tue Apr 10 20:57:20 UTC 2007
Okay, I was bored! This is a little shorter than that script:
my $seqin = Bio::SeqIO->new(-format => 'fasta',
-file => shift);
my $seqout = Bio::SeqIO->new(-format => 'fasta',
-file => '>split.fas');
while( my $seq = $seqin->next_seq ) {
my $seqlength = $seq->length();
print STDERR "Length is $seqlength\n";
my $start = 1;
my $end = 100;
my $desc = $seq->description;
CHUNK:
while ($end <= $seqlength){
my $ordseq = $seq->trunc($start,$end);
$ordseq->description("$start-$end $desc");
$seqout->write_seq($ordseq);
last CHUNK if $end >= $seqlength;
$start += 100;
$end = ($end + 100 > $seqlength) ? $seqlength : $end + 100;
}
}
chris
On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:
>
> Hi,
> I am one of the newbee venturingout bioperl for my research
> purposes. I have
> a whole genome sequence of a pathogen. I am trying to break them into
> non-overlapping 1000bps subsequences. For example if my whole genome
> sequence is 400000 bps length, then I should be beak them into 4000
> subsequences of each 1000 bps and they should be non-overlapping
> but at the
> same time continous. To be precise, my first substring would be
> from 1 to
> 1000 bps, second substing would be from 1001 to 2000 etcc.. Could
> anyone
> help me.
> I tried with the following code but it gives me only the first
> substring and
> rest are not! I would appreciate very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
>
> my $cleseq = $seq->seq();
>
> $seqlength = $seq->length();
> if ($finish<$seqlength){
> print "The length of the sequence is $seqlength\n";
> my $ordseq = $cleseq->subseq($start,$finish);
> push(@seq_array,$ordseq);
> $start=+100;
> $finish=+100;
> $counter++;
> next;
> }
> }
> --
> View this message in context: http://www.nabble.com/extract-
> nonoverlapping-subsequences-from-a-whole-genome-
> tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
More information about the Bioperl-l
mailing list