[Bioperl-l] Can't figure out restriction analysis
Marek
deeepersound at googlemail.com
Sat Jan 8 15:05:28 UTC 2011
Good idea!
Meanwhile I managed to figure out the cut method which yields coordinates, but I'm waiting now for an hour to see completion of the job for one of the larger chromosomes - that is slow. On the other hand this job will run only once.
I think I have emboss on one of my machines,I will test whether it might be faster (I guess so as you most likely did not run a such time consuming job to generate above output).
Maxim
Von meinem iPod gesendet
Am Jan 8, 2011 um 1:14 PM schrieb Hans-Rudolf Hotz <hrh at fmi.ch>:
> Hi Maxim
>
> It is not a Bioperl solution, but have you looked at the emboss tool 'restrict' to get the coordinates?
>
>
> -bash-3.2$ restrict chr1.fa -enzymes hindiii -sitelen 6 stdout
> Report restriction enzyme cleavage sites in a nucleotide sequence
> ########################################
> # Program: restrict
> # Rundate: Sat 8 Jan 2011 13:08:10
> # Commandline: restrict
> # [-sequence] /work/gbioinfo/DB/genomes/hg19/chr1.fa
> # -enzymes hindiii
> # -sitelen 6
> # [-outfile] stdout
> # Report_format: table
> # Report_file: stdout
> ########################################
>
> #=======================================
> #
> # Sequence: chr1 from: 1 to: 249250621
> # HitCount: 64394
> #
> # Minimum cuts per enzyme: 1
> # Maximum cuts per enzyme: 2000000000
> # Minimum length of recognition site: 6
> # Blunt ends allowed
> # Sticky ends allowed
> # DNA is linear
> # Ambiguities allowed
> #
> #=======================================
>
> Start End Strand Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
> 16007 16012 + HindIII AAGCTT 16007 16011 . .
> 24571 24576 + HindIII AAGCTT 24571 24575 . .
> ///
> 249224235 249224240 + HindIII AAGCTT 249224235 249224239 . .
> 249230987 249230992 + HindIII AAGCTT 249230987 249230991 . .
> 249231350 249231355 + HindIII AAGCTT 249231350 249231354 . .
>
> #---------------------------------------
> #---------------------------------------
>
> #---------------------------------------
> # Total_sequences: 1
> # Total_length: 249250621
> # Reported_sequences: 1
> # Reported_hitcount: 64394
> #---------------------------------------
> -bash-3.2$
>
>
> for more details see:
> http://emboss.sourceforge.net/apps/cvs/emboss/apps/restrict.html
>
>
> Hope this helps.
> Regards, Hans
>
>
>
> On 01/07/2011 10:46 PM, Maxim wrote:
>> Hi,
>>
>> I'm desperately trying to annotate all HindIII restriction sites for
>> different genomes. The only solution I was able to figure out (after
>> complicated self-made approaches using awk and grep) is shown below. I'm not
>> a programmer, so please excuse the bad coding (strict, warnings etc, this is
>> dedicated to be a "standalone" script). Below code reads the 25 fasta files
>> for human genome one after another. Then restriction analysis is performed,
>> the only value I was able to get returned so far is the fragment length. Am
>> I right, that the fragment length values returned by the @fragments array
>> can be mapped onto the reference genome in a linear fashion, i.e. the
>> fragment length values are ordered according to their location on the
>> reference sequence? If so, I can of course get the positions (coordinates)
>> by simply adding the fragment length values.
>>
>> More questions:
>> I've seen the _make_cuts function, would this be more appropriate in order
>> to directly retrieve coordinates of cutting positions? Unfortunately I can't
>> get it work like
>> my $analysis = _make_cuts($seq->seq,'HindIII',0)
>> this returns:can't call method "_make_cuts" on an undefined value
>>
>> Or are both approaches just nonsense and I overlooked another obvious
>> function?
>> Maxim
>>
>>
>>
>> use Bio::Perl;
>> use Bio::SeqIO;
>> use Bio::Restriction::Analysis;
>> use Bio::Restriction::EnzymeCollection;
>> use Cwd;
>>
>> $dir = getcwd;
>>
>> $fasta_dir = "/data1/Genomes/HG18/";
>> opendir(DIR, $fasta_dir) or die "$!";
>> @chroms = grep {/chr/} readdir DIR;
>> print "found the following files:", join ("\n", map {$_} @chroms), "\n";
>> #foreach $chrom (@chroms) {print "$chrom\n";}
>>
>> foreach $file (@chroms)
>> {
>> $outname = $file;
>> @outname = split (/.fa/, $outname);
>> $outname = $dir . "/" . @outname[0] . "_fragments";
>> print "Output filename: $outname\n";
>> #exit;
>> open (OUT, ">$outname");
>> $filelink = $fasta_dir. "/" . $file;
>> my $seqio = Bio::SeqIO->new(-file => $filelink, '-format' => 'Fasta');
>> while(my $seq = $seqio->next_seq)
>> {
>> my $string = $seq->seq;
>> #print $string,"\n";
>>
>> my $analysis = Bio::Restriction::Analysis->new(-seq => $seq);
>> my @fragments = $analysis->fragments('HindIII');
>>
>> print OUT join("\n", map {length $_} @fragments), "\n";
>> }
>> close (OUT);
>> print "chromosome file: $file is done!\n";
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list