[Bioperl-l] Can't figure out restriction analysis

Sat Jan 8 12:14:24 UTC 2011

Hi Maxim

It is not a Bioperl solution, but have you looked at the emboss tool 
'restrict' to get the coordinates?

-bash-3.2$ restrict chr1.fa -enzymes hindiii -sitelen 6 stdout
Report restriction enzyme cleavage sites in a nucleotide sequence
########################################
# Program: restrict
# Rundate: Sat  8 Jan 2011 13:08:10
# Commandline: restrict
#    [-sequence] /work/gbioinfo/DB/genomes/hg19/chr1.fa
#    -enzymes hindiii
#    -sitelen 6
#    [-outfile] stdout
# Report_format: table
# Report_file: stdout
########################################

#=======================================
#
# Sequence: chr1     from: 1   to: 249250621
# HitCount: 64394
#
# Minimum cuts per enzyme: 1
# Maximum cuts per enzyme: 2000000000
# Minimum length of recognition site: 6
# Blunt ends allowed
# Sticky ends allowed
# DNA is linear
# Ambiguities allowed
#
#=======================================

   Start     End  Strand Enzyme_name Restriction_site    5prime 
3prime 5primerev 3primerev
   16007   16012       + HindIII     AAGCTT               16007 
16011         .         .
   24571   24576       + HindIII     AAGCTT               24571 
24575         .         .
///
249224235 249224240       + HindIII     AAGCTT           249224235 
249224239         .         .
249230987 249230992       + HindIII     AAGCTT           249230987 
249230991         .         .
249231350 249231355       + HindIII     AAGCTT           249231350 
249231354         .         .

#---------------------------------------
#---------------------------------------

#---------------------------------------
# Total_sequences: 1
# Total_length: 249250621
# Reported_sequences: 1
# Reported_hitcount: 64394
#---------------------------------------
-bash-3.2$

for more details see:
http://emboss.sourceforge.net/apps/cvs/emboss/apps/restrict.html

Hope this helps.
Regards, Hans

On 01/07/2011 10:46 PM, Maxim wrote:
> Hi,
>
> I'm desperately trying to annotate all HindIII restriction sites for
> different genomes. The only solution I was able to figure out (after
> complicated self-made approaches using awk and grep) is shown below. I'm not
> a programmer, so please excuse the bad coding (strict, warnings etc, this is
> dedicated to be a "standalone" script). Below code reads the 25 fasta files
> for human genome one after another. Then restriction analysis is performed,
> the only value I was able to get returned so far is the fragment length. Am
> I right, that the fragment length values returned by the @fragments array
> can be mapped onto the reference genome in a linear fashion, i.e. the
> fragment length values are ordered according to their location on the
> reference sequence? If so, I can of course get the positions (coordinates)
> by simply adding the fragment length values.
>
> More questions:
> I've seen the _make_cuts function, would this be more appropriate in order
> to directly retrieve coordinates of cutting positions? Unfortunately I can't
> get it work like
> my $analysis = _make_cuts($seq->seq,'HindIII',0)
> this returns:can't call method "_make_cuts" on an undefined value
>
> Or are both approaches just nonsense and I overlooked another obvious
> function?
> Maxim
>
>
>
> use Bio::Perl;
> use Bio::SeqIO;
> use Bio::Restriction::Analysis;
> use Bio::Restriction::EnzymeCollection;
> use Cwd;
>
> $dir = getcwd;
>
> $fasta_dir = "/data1/Genomes/HG18/";
> opendir(DIR, $fasta_dir) or die "$!";
> @chroms =  grep {/chr/}  readdir DIR;
> print "found the following files:", join ("\n",  map {$_} @chroms), "\n";
> #foreach $chrom (@chroms) {print "$chrom\n";}
>
> foreach $file (@chroms)
> {
> $outname = $file;
> @outname = split (/.fa/, $outname);
> $outname = $dir . "/" . @outname[0] . "_fragments";
> print "Output filename: $outname\n";
> #exit;
> open (OUT, ">$outname");
> $filelink = $fasta_dir. "/" . $file;
> my $seqio = Bio::SeqIO->new(-file =>  $filelink, '-format' =>  'Fasta');
> while(my $seq = $seqio->next_seq)
> {
>      my $string = $seq->seq;
>      #print $string,"\n";
>
>      my $analysis = Bio::Restriction::Analysis->new(-seq =>  $seq);
>      my  @fragments =  $analysis->fragments('HindIII');
>
>      print OUT join("\n", map {length $_} @fragments), "\n";
> }
> close (OUT);
> print "chromosome file: $file is done!\n";
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l