[Bioperl-l] Re: [Bioperl-guts-l] Bio::Restriction modifications
Rob Edwards
redwards at utmem.edu
Wed Dec 3 08:16:13 EST 2003
Peter,
It sounds like we used the same approach to fix the same problems -
great minds :)
However, we both fixed extra things and so we need to merge.
Here
> Bio::Restriction::Analysis
>
> Changed the site finding algorithm in '_new_cuts'. The new
> algorithm
> uses pos to find the recognition site regex and finds the actual cut
> point
> using $enz->cut. The cut positions are put in an array and substr is
> used
> to get digestion fragments.
This is the best way to do it, it allows you to do a lot more with the
sequences. I also used this approach for ambiguous enzymes. For
non-ambiguous enzymes it is a lot more efficient to use an index based
string search rather than a regexp, and I added an internal method to
deal with this.
> Fixed the origin bug for circular target sequences. If the sequence
> is
> circular the first 40 bases are appended to the end of the sequence.
> Sites
> that span the origin are now detected. First and last fragments are
> reattached as before.
I added a method to EnzymeCollection to return the longest cut site in
the collection and then added that length of sequence. This will
correctly deal with any site > 40 bp. Using position instead of
fragments sure makes this a lot easier, huh?
> Fixed the non-palindromes bug. Non-palindromic sites are detected
> in
> the reverse orientation by searching the reverse complement of the
> target
> sequence, using $enz->complementary_cut and then subtracting from the
> sequence length.
I added a new method to Enzyme.pm to return the complementary cut site.
I don't particularly like the way I did this. I'd like Analysis to
correctly use the Enzyme->cut and Enzyme->complementary_cut which it
doesn't at the moment. I think this is a lot better than using the "^"
in the site routine.
> Fixed a bug where a site would be detected even if the
> complementary cut
> was off the end of the target sequence.
I am not sure if I deal with this properly. I think it should be
handled properly with circularization?
> Added a method 'fragment_maps($enzyme_name)'. This method returns an
> array of hashes where each hash corresponds to a restriction fragment.
> The
> hash keys are 'start', 'end' and 'seq'. 'seq' holds the sequence of the
> fragment, while 'start' and 'end' hold the positions of the first and
> last
> bases of the fragment in the target sequence. This will allow
> restriction
> fragments to be turned into features.
This is cool.
> Added internal methods '_find_cut', '_multicut' and ' _digest'.
> '_find_cut' figures out the cut point given the position of the
> recognition
> site and checks whether the cut point is off the end of the molecule.
> This
> has to be done twice for non-palindromic enzymes, so it makes sense to
> make
> it a separate method.
> '_digest' gets the restriction fragments from the list of cut
> points. I
> split this off of '_new_cuts' for the sake of readability, particularly
> after we start dealing with overlapping sites.
> '_multicut' checks whether the "other" cut of a dual cut enzyme is
> off
> the end of the sequence. Could probably be folded back into
> '_find_cut'.
I did something similar, but probably used about 5x as many methods
(making it much less clear I am sure).
> Bio::Restriction::Enzyme
>
> Changed the enzyme name correction code so only a single trailing
> '1'
> gets changed to an 'I'. This should prevent name mangling of enzymes
> like
> Alw21I.
>
> Changed '$cut && $self->cut($cut);' to 'defined $cut &&
> $self->cut($cut);' (same for complementary cut) in the constructor.
> This
> allows $self->cut to be zero.
>
> Some minor changes in methods 'cut' and 'site'.
These should be committed.
> Bio::Restriction::IO::base
>
> Changed method '_make_multicuts' so that the second enzyme is not
> added
> to the collection and the first enzyme is not added to the second
> enzyme's
> "others" array, only the other way around. The idea is to prevent dual
> cut
> sites from being reported twice. This will break scripts that used this
> behavior to work around the dual cut bug, but you shouldn't have to do
> that
> any more.
>
> Similar changes made to '_make_multisites'.
These should be committed too.
I'll try and merge this code with what I wrote and add it to cvs in a
day or so. Once I get a near working version, I'll let you know,
perhaps you can help debug the changes.
Take a look at the latest versions in cvs and let me know what you
think.
Rob
More information about the Bioperl-l
mailing list