[Bioperl-l] Split genomes

Brian Osborne bosborne11 at verizon.net
Wed Jun 22 18:13:40 UTC 2011


Anna,

Start with this:

http://www.bioperl.org/wiki/HOWTO:Beginners

The script you showed had no Bioperl in it, so it was too long and too complicated. Use Bioperl, it's less work.

BIO



On Jun 22, 2011, at 1:39 PM, anna_sh wrote:

> 
> Thanks for the script. It was just the perfect fix.
> 
> 
> 
> anna_sh wrote:
>> 
>> Hey,
>> 
>> I want to split mitochondrial genome of certain mammals into ~250MB length
>> of fragments which i further use for running RepeatMasker on them. I have
>> posted below the script I use for the purpose, the issue is that it does
>> not split the genomes in the size i require. Can anybody please modify my
>> script so that it splits the genome in fragments of 200-275MB. Thanks a
>> lot !
>> 
>> 
>> 
>> 
>> 
>> Script (minus the shebang line) -->
>> 
>> 
>> # USE: file_split.fasta.pl fastafile
>> # USE: Splits a given fasta file into smaller files
>> # USE: -n splits into files of n numbered sequences
>> # USE: -f splits into n numbered files
>> # USE: -l splits into files of a total of l length
>> 
>> use strict;
>> use Getopt::Long;
>> 
>> my ($nseq, $nfiles, $seql);
>> 
>> GetOptions( "n:i" => \$nseq,
>> 	    "f:i" => \$nfiles,
>> 	    "l:i" => \$seql);
>> 
>> my $usage = "file_split.fasta.pl -n -f fasta file\n";
>> 	    
>> if (!($ARGV[0])) { die "$usage";} #checks to make sure that some form of
>> split variable has been given
>> 	    
>> my $infile = shift or die;
>> 
>> chomp $infile;
>> 
>> 
>> if ($infile) {
>> 	open (INITIAL, $infile) or die "cannot open $infile\n";
>> 		}  #take initial_input line 1
>> 		
>> my $fnum = 0;
>> 		
>> if ($nseq && !($nfiles) && !($seql)){
>> 		my $seqcount = 0;	
>> 		my $outfile = $infile.".".$fnum;
>> 		open (OUTFILE, ">$outfile");
>> 		
>> 		while(<INITIAL>) {
>> 				if (m/^>/) {$seqcount++;}
>> 				if ($seqcount > $nseq)
>> 						{	$seqcount = 1;
>> 							$fnum++;
>> 							close OUTFILE;
>> 							$outfile = $infile.".".$fnum;
>> 							open (OUTFILE, ">$outfile");
>> 				}
>> 				print OUTFILE "$_";
>> 			}
>> 		close OUTFILE;
>> 		close INITIAL;		
>> } #ends split on n-sequences loop
>> 	
>> if ($nfiles && !($nseq) && !($seql)){
>> 	my $totalseqs = totalcount($infile);
>> 	my $numseqs = int ($totalseqs/$nfiles);
>> 	my $outfile = $infile.".".$fnum;
>> 	my $seqcount = 0;	
>> 	open (OUTFILE, ">$outfile");	
>> 	
>> 	while(<INITIAL>) {
>> 			if (m/^>/) {$seqcount++;}
>> 			if (($seqcount > $numseqs) && ($fnum < ($nfiles-1)))
>> 					{	$seqcount = 1;
>> 						$fnum++;
>> 						close OUTFILE;
>> 						$outfile = $infile.".".$fnum;
>> 						open (OUTFILE, ">$outfile");
>> 				}
>> 				print OUTFILE "$_";
>> 			}
>> 		close OUTFILE;
>> 		close INITIAL;
>> }  #ends split into $nfile files loop
>> 
>> if ($seql && !($nfiles) && !($nseq)){
>> 	my $totallength = getlength($infile);
>> 	my $filetotal = int ($totallength/$seql);	
>> 	
>> 	my $outfile = $infile.".".$fnum;
>> 	open (OUTFILE, ">$outfile");	
>> 	my $sumlength = 0;
>> 	
>> 	while(<INITIAL>) {
>> 
>> 			if (!(m/^>/)) {$sumlength += length ($_) }
>> 			if (($sumlength > $seql) && (m/^>/))
>> 					{	$sumlength = 0;
>> 						$fnum++;
>> 						close OUTFILE;
>> 						$outfile = $infile.".".$fnum;
>> 						open (OUTFILE, ">$outfile");
>> 				}
>> 				print OUTFILE "$_";
>> 			}
>> 		close OUTFILE;
>> 		close INITIAL;
>> }
>> $fnum++;
>> print "Split $infile into $fnum files\n";	
>> 
>> sub totalcount {
>> 	open (FILE, "$_[0]") or die "cannot find $_[0]\n";
>> 	my $seqnum = 0;
>> 	while (<FILE>){
>> 		if (m/^>/) {$seqnum++;}
>> 	}
>> 	close FILE;
>> 	return $seqnum;
>> } #returns number of sequences in the file
>> 
>> sub getlength{
>> 	open (FILE, "$_[0]") or die "cannot find $_[0]\n";
>> 	my $totalbp = 0;
>> 	while (<FILE>){
>> 		if (!(m/^>/)){ $totalbp += length ($_);}
>> 	}
>> 	print "Total bp = $totalbp\n";
>> 	close FILE;
>> 	return $totalbp;	
>> } #returns the total basepairs in the file.
>> 
> 
> -- 
> View this message in context: http://old.nabble.com/Split-genomes-tp31886043p31905239.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list