[Bioperl-l] Can't figure out restriction analysis
Hans-Rudolf Hotz
hrh at fmi.ch
Sat Jan 8 12:14:24 UTC 2011
Hi Maxim
It is not a Bioperl solution, but have you looked at the emboss tool
'restrict' to get the coordinates?
-bash-3.2$ restrict chr1.fa -enzymes hindiii -sitelen 6 stdout
Report restriction enzyme cleavage sites in a nucleotide sequence
########################################
# Program: restrict
# Rundate: Sat 8 Jan 2011 13:08:10
# Commandline: restrict
# [-sequence] /work/gbioinfo/DB/genomes/hg19/chr1.fa
# -enzymes hindiii
# -sitelen 6
# [-outfile] stdout
# Report_format: table
# Report_file: stdout
########################################
#=======================================
#
# Sequence: chr1 from: 1 to: 249250621
# HitCount: 64394
#
# Minimum cuts per enzyme: 1
# Maximum cuts per enzyme: 2000000000
# Minimum length of recognition site: 6
# Blunt ends allowed
# Sticky ends allowed
# DNA is linear
# Ambiguities allowed
#
#=======================================
Start End Strand Enzyme_name Restriction_site 5prime
3prime 5primerev 3primerev
16007 16012 + HindIII AAGCTT 16007
16011 . .
24571 24576 + HindIII AAGCTT 24571
24575 . .
///
249224235 249224240 + HindIII AAGCTT 249224235
249224239 . .
249230987 249230992 + HindIII AAGCTT 249230987
249230991 . .
249231350 249231355 + HindIII AAGCTT 249231350
249231354 . .
#---------------------------------------
#---------------------------------------
#---------------------------------------
# Total_sequences: 1
# Total_length: 249250621
# Reported_sequences: 1
# Reported_hitcount: 64394
#---------------------------------------
-bash-3.2$
for more details see:
http://emboss.sourceforge.net/apps/cvs/emboss/apps/restrict.html
Hope this helps.
Regards, Hans
On 01/07/2011 10:46 PM, Maxim wrote:
> Hi,
>
> I'm desperately trying to annotate all HindIII restriction sites for
> different genomes. The only solution I was able to figure out (after
> complicated self-made approaches using awk and grep) is shown below. I'm not
> a programmer, so please excuse the bad coding (strict, warnings etc, this is
> dedicated to be a "standalone" script). Below code reads the 25 fasta files
> for human genome one after another. Then restriction analysis is performed,
> the only value I was able to get returned so far is the fragment length. Am
> I right, that the fragment length values returned by the @fragments array
> can be mapped onto the reference genome in a linear fashion, i.e. the
> fragment length values are ordered according to their location on the
> reference sequence? If so, I can of course get the positions (coordinates)
> by simply adding the fragment length values.
>
> More questions:
> I've seen the _make_cuts function, would this be more appropriate in order
> to directly retrieve coordinates of cutting positions? Unfortunately I can't
> get it work like
> my $analysis = _make_cuts($seq->seq,'HindIII',0)
> this returns:can't call method "_make_cuts" on an undefined value
>
> Or are both approaches just nonsense and I overlooked another obvious
> function?
> Maxim
>
>
>
> use Bio::Perl;
> use Bio::SeqIO;
> use Bio::Restriction::Analysis;
> use Bio::Restriction::EnzymeCollection;
> use Cwd;
>
> $dir = getcwd;
>
> $fasta_dir = "/data1/Genomes/HG18/";
> opendir(DIR, $fasta_dir) or die "$!";
> @chroms = grep {/chr/} readdir DIR;
> print "found the following files:", join ("\n", map {$_} @chroms), "\n";
> #foreach $chrom (@chroms) {print "$chrom\n";}
>
> foreach $file (@chroms)
> {
> $outname = $file;
> @outname = split (/.fa/, $outname);
> $outname = $dir . "/" . @outname[0] . "_fragments";
> print "Output filename: $outname\n";
> #exit;
> open (OUT, ">$outname");
> $filelink = $fasta_dir. "/" . $file;
> my $seqio = Bio::SeqIO->new(-file => $filelink, '-format' => 'Fasta');
> while(my $seq = $seqio->next_seq)
> {
> my $string = $seq->seq;
> #print $string,"\n";
>
> my $analysis = Bio::Restriction::Analysis->new(-seq => $seq);
> my @fragments = $analysis->fragments('HindIII');
>
> print OUT join("\n", map {length $_} @fragments), "\n";
> }
> close (OUT);
> print "chromosome file: $file is done!\n";
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list