[Bioperl-l] Select random sequences from a fasta file
Laurent MANCHON
lmanchon at univ-montp2.fr
Thu Mar 22 09:03:39 UTC 2012
On 21/03/2012 20:42, shalabh sharma wrote:
> Hi All,
> Is there a way to select random sequences from a multi-FASTA
> file? I am using some method (not that sophisticated).
> Is there any module in BioPerl that can do that?
>
> I have a FASTA file containing around 10 million reads, and I want to get
> a few thousand sequences out of it (randomly selected).
>
> Thanks
> Shalabh
>
Hello,
I have a piece of code that picks random lines from a file;
maybe you can adapt it to your problem:
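#!/usr/bin/perl
# Pick random lines from a file.
use strict;
use warnings;
use List::Util qw(shuffle);

my $GET_LINES = 10000;    # number of lines to pick

open( my $fh, '<', 'big_text_file.txt' )
    or die "Oh, fudge: $!\n";

# Record the byte offset at which each line starts.
my @line_starts;
my $pos = 0;
while ( defined( my $line = <$fh> ) ) {
    push @line_starts, $pos;
    $pos = tell $fh;
}
my $count = @line_starts;
print "Got $count lines\n";

# Never ask for more lines than the file contains.
$GET_LINES = $count if $GET_LINES > $count;

# Shuffle the offsets, keep the first $GET_LINES, and print those lines.
my @shuffled_starts = ( shuffle @line_starts )[ 0 .. $GET_LINES - 1 ];
for my $start (@shuffled_starts) {
    seek $fh, $start, 0
        or die "Unable to seek to line - $!\n";
    print scalar <$fh>;
}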
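To adapt this to FASTA, one option is to remember the offset of each header
line (a line starting with '>') instead of every line, and then print from a
chosen header up to the next one. Below is a minimal sketch of that idea. It
assumes every record starts with '>' at the beginning of a line, and the sub
names index_fasta_records, read_fasta_record and sample_fasta_records are made
up for illustration; they are not BioPerl functions.

```perl
#!/usr/bin/perl
# Sketch: pick random records (not lines) from a multi-FASTA file.
use strict;
use warnings;
use List::Util qw(shuffle);

# Return the byte offset of every header line (a line starting with '>').
sub index_fasta_records {
    my ($fh) = @_;
    my @starts;
    seek $fh, 0, 0 or die "Unable to rewind: $!\n";
    my $pos = 0;
    while ( defined( my $line = <$fh> ) ) {
        push @starts, $pos if $line =~ /^>/;
        $pos = tell $fh;
    }
    return @starts;
}

# Read one record: the header at $start plus every line up to the next '>'.
sub read_fasta_record {
    my ( $fh, $start ) = @_;
    seek $fh, $start, 0 or die "Unable to seek to record: $!\n";
    my $record = <$fh>;
    while ( defined( my $line = <$fh> ) ) {
        last if $line =~ /^>/;
        $record .= $line;
    }
    return $record;
}

# Return up to $n randomly chosen records, without replacement.
sub sample_fasta_records {
    my ( $fh, $n ) = @_;
    my @starts = shuffle( index_fasta_records($fh) );
    $#starts = $n - 1 if @starts > $n;
    return map { read_fasta_record( $fh, $_ ) } @starts;
}

# When run directly with two arguments: FASTA file name and record count.
if ( !caller && @ARGV == 2 ) {
    my ( $file, $n ) = @ARGV;
    open( my $fh, '<', $file ) or die "Cannot open $file: $!\n";
    print sample_fasta_records( $fh, $n );
    close $fh;
}
```

For example (file names are hypothetical):
perl sample_fasta.pl reads.fasta 5000 > subset.fasta
The script keeps one offset per record in memory, so memory use grows with the
number of reads, but the sequences themselves are only read for the records
that get picked.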
Regards,
Laurent