[Bioperl-l] Nucleotide analysis modules - bugzilla # 1422

Heikki Lehvaslaiho heikki at ebi.ac.uk
Tue Jul 1 15:52:48 EDT 2003


Rob,

I've started going through these modules starting from restriction
analysis.

I am beginning to accept Hilmar's view that restriction analysis is too
important a piece of functionality to bury that deep in the Bio namespace.
I think Bio::Restriction would be a better home, with the modules under it:

Bio::Restriction::Analysis
Bio::Restriction::Enzyme
Bio::Restriction::EnzymeCollection


My main concern about the way your classes work is how enzymes are
handled. The methods that return lists of enzymes currently return enzyme
names. I think they should return EnzymeCollection objects.

It would have the advantage that you can then easily build custom
EnzymeCollections and analyse the sequences using only those enzymes.
Also, you should be able to build Enzyme objects in memory and add them
into a collection.


Below is some sample code that creates the first collection from
rebase, then derives a subset by calling methods that build a new
collection. One can then add a few hand-picked enzymes and finally
run the analysis using those enzymes.

# $rebase is an EnzymeCollection holding all enzymes from rebase

$six = $rebase->cutters(6);
$amershamsix = $six->company('Amersham'); # perl regexp
# same thing in one line
$amershamsix = $rebase->cutters(6)->company('Amersham');

# assuming HinvII and HaeII are not included in amershamsix but
# that we want them in our list

$hand_picked = $rebase->enzymes(['HaeII','HinvII']);
# $hand_picked is yet another EnzymeCollection
$one_more = $rebase->enzyme('AatI');
# $one_more is not EnzymeCollection but Enzyme

$amershamsix->add_enzymes($hand_picked, $one_more);

# the method loops over its arguments and handles both Collections and
# Enzymes depending on their class. An Enzyme is added only if it is
# not already in the recipient collection.

# methods to store your collection of enzymes are also needed ...
...
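
The collection behaviour described above could look roughly like the
following. This is only a self-contained sketch: the Sketch::Enzyme and
Sketch::EnzymeCollection packages are hypothetical stand-ins, not
existing bioperl classes, and the company assignments in the sample
data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed Enzyme class.
package Sketch::Enzyme;

sub new {
    my ($class, %args) = @_;          # expects name, site, company
    return bless {%args}, $class;
}
sub name    { $_[0]{name} }
sub site    { $_[0]{site} }
sub company { $_[0]{company} }

# Hypothetical stand-in for the proposed EnzymeCollection class.
package Sketch::EnzymeCollection;

sub new {
    my ($class, @enzymes) = @_;
    my $self = bless { enzymes => {} }, $class;
    return $self->add_enzymes(@enzymes);
}

# Accepts Enzymes and Collections alike; an enzyme whose name is
# already in the recipient collection is skipped.
sub add_enzymes {
    my ($self, @items) = @_;
    for my $item (@items) {
        my @new = $item->isa('Sketch::EnzymeCollection')
                ? $item->each_enzyme
                : ($item);
        for my $enz (@new) {
            next if exists $self->{enzymes}{ $enz->name };
            $self->{enzymes}{ $enz->name } = $enz;
        }
    }
    return $self;
}

# Enzymes in name order.
sub each_enzyme {
    my $self = shift;
    return map { $self->{enzymes}{$_} } sort keys %{ $self->{enzymes} };
}
sub names { return map { $_->name } $_[0]->each_enzyme }

# Single enzyme by name (an Enzyme, not a collection).
sub enzyme { return $_[0]{enzymes}{ $_[1] } }

# Several enzymes by name, returned as a new collection.
sub enzymes {
    my ($self, $names) = @_;
    return Sketch::EnzymeCollection->new(
        grep { defined } map { $self->{enzymes}{$_} } @$names);
}

# Filters return new collections, so the calls can be chained.
sub cutters {
    my ($self, $len) = @_;
    return Sketch::EnzymeCollection->new(
        grep { length($_->site) == $len } $self->each_enzyme);
}
sub company {
    my ($self, $re) = @_;
    return Sketch::EnzymeCollection->new(
        grep { $_->company =~ /$re/ } $self->each_enzyme);
}

package main;

# Made-up company data, for illustration only.
my $rebase = Sketch::EnzymeCollection->new(
    Sketch::Enzyme->new(name => 'EcoRI',   site => 'GAATTC', company => 'Amersham'),
    Sketch::Enzyme->new(name => 'BamHI',   site => 'GGATCC', company => 'Amersham'),
    Sketch::Enzyme->new(name => 'HindIII', site => 'AAGCTT', company => 'OtherCo'),
    Sketch::Enzyme->new(name => 'HaeIII',  site => 'GGCC',   company => 'Amersham'),
    Sketch::Enzyme->new(name => 'AluI',    site => 'AGCT',   company => 'OtherCo'),
);

my $amershamsix = $rebase->cutters(6)->company('Amersham');
my $hand_picked = $rebase->enzymes(['HindIII', 'EcoRI']);   # EcoRI is a duplicate
my $one_more    = $rebase->enzyme('AluI');

$amershamsix->add_enzymes($hand_picked, $one_more);
print join(', ', $amershamsix->names), "\n";
```

Keying the internal hash on the enzyme name is what makes the
"add only if not already present" rule cheap to enforce.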


Similarly, I think the fragments returned by the fragments method
should be Bio::SeqI objects with two additional methods, start_enzyme()
and end_enzyme(), each returning an Enzyme object.

my $re_analysis = Bio::Restriction::Analysis->new;
$aati = $rebase->enzyme('AatI');
$two_enz = $rebase->enzymes(['HaeII','HinvII']);
@fragments = $re_analysis->fragments
    (-enzymes => $aati,
     -single => 1); # default
# or 

$re_analysis->enzymes($two_enz);
# double digest
@fragments = $re_analysis->fragments(-multi => 1);
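
To make the start_enzyme/end_enzyme idea concrete, here is a minimal
sketch of a digest on a linear sequence. It is not the proposed
Bio::Restriction::Analysis API: digest() is a hypothetical helper,
fragments are plain hashes rather than Bio::SeqI objects, enzymes are
referred to by name rather than as Enzyme objects, and only exact
(non-degenerate) sites on the top strand are considered.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a linear sequence with one or more enzymes.
# Each enzyme is a hash with a name, a recognition site and the 0-based
# offset of the cut within that site.
sub digest {
    my ($seq, @enzymes) = @_;

    # Collect every cut position together with the enzyme that made it.
    my @cuts;
    for my $enz (@enzymes) {
        my $pos = 0;
        while (($pos = index($seq, $enz->{site}, $pos)) >= 0) {
            push @cuts, { at => $pos + $enz->{cut}, enzyme => $enz->{name} };
            $pos++;
        }
    }
    @cuts = sort { $a->{at} <=> $b->{at} } @cuts;

    # Walk the cuts in order; the end of the sequence acts as a final
    # "cut" with no enzyme attached.
    my @fragments;
    my ($start, $start_enzyme) = (0, undef);
    for my $cut (@cuts, { at => length($seq), enzyme => undef }) {
        next if $cut->{at} == $start;    # skip zero-length fragments
        push @fragments, {
            seq          => substr($seq, $start, $cut->{at} - $start),
            start_enzyme => $start_enzyme,
            end_enzyme   => $cut->{enzyme},
        };
        ($start, $start_enzyme) = ($cut->{at}, $cut->{enzyme});
    }
    return @fragments;
}

my $ecori = { name => 'EcoRI', site => 'GAATTC', cut => 1 };   # G^AATTC
my $bamhi = { name => 'BamHI', site => 'GGATCC', cut => 1 };   # G^GATCC
my $seq   = 'AAGAATTCTTGGATCCAA';

my @single = digest($seq, $ecori);            # single digest
my @double = digest($seq, $ecori, $bamhi);    # double digest

for my $frag (@double) {
    printf "%-10s %s .. %s\n", $frag->{seq},
        $frag->{start_enzyme} // 'seq start',
        $frag->{end_enzyme}   // 'seq end';
}
```

A fragment at either end of a linear sequence has no enzyme on that
side, which is why start_enzyme and end_enzyme can be undefined here.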



The rest of your modules seem to fit under the general concept of
cloning. It would be nice to develop the ideas in your Clone module
further and see what else can be done.

We could create a Bio::Clone or even a Bio::TestTube (! ;-) ) namespace
and add modules there that deal with PCR and DNA cloning experiments.
For example, there could be a Bio::Seq::Plasmid class which would know
which fragments it was put together from, and maybe someone will write
plasmid drawing code to display it.





Finally,  here are some notes about the documentation and code style:

- The code should really be indented by four spaces to make it  
  legible.

- Since these modules are now becoming part of bioperl, I took away
  the numerous references to the bioperl project.

- The modules represent classes, not objects (in the POD NAME section).

- The synopsis needs to be verbatim, i.e. to keep the formatting and to
  make it runnable.

- Description should not be indented.

- All lines should ideally be wrapped < 80 characters. This is
  especially important for verbatim POD paragraphs, e.g. synopsis and
  method docs, which are not wrapped by pod formatters.

- Try viewing the files with pod2text and pod2html.

  -- I've wrapped some of the verbatim lines, but more need to
     be cleaned up.

- Comments for method docs need not be verbatim.

- When reading from a file in a test, use Bio::Root::IO to be portable.

- We do not declare a global $VERSION variable in modules;
  it is defined for the whole of bioperl in Bio::Root::Version.

- What is the global $ID for?
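
A bare-bones module skeleton that follows the POD notes above might
look like this. My::Example is a made-up name used only for
illustration, and the layout is one possible reading of the notes, not
an official bioperl template.

```perl
package My::Example;    # hypothetical module, for illustration only
use strict;
use warnings;

=head1 NAME

My::Example - Class for demonstrating the POD layout

=head1 SYNOPSIS

    # Indented, so pod formatters treat it as verbatim and keep
    # the layout; lines are kept under 80 characters by hand.
    use My::Example;
    my $obj = My::Example->new();

=head1 DESCRIPTION

Ordinary description text starts in the first column, so that pod
formatters are free to wrap it.

=head2 new

Creates a new, empty My::Example object. Method comments like this
one are ordinary paragraphs and do not need to be verbatim.

=cut

sub new {
    my $class = shift;
    return bless {}, $class;
}

1;
```

Running pod2text or pod2html on such a file quickly shows whether the
synopsis kept its formatting and the description was wrapped.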



Sorry about the long mail,

	-Heikki




On Sat, 2003-06-21 at 16:52, Rob Edwards wrote:
> I have written a bunch of nucleotide analysis and manipulation modules that 
> I'd like someone to commit to the CVS for me.
> 
> The modules are:
>     * Bio/Seq/PrimedSeq.pm - a sequence object containing two primers
>     * Bio/SeqFeature/Primer.pm - a representation of a single primer
>     * Bio/Tools/Analysis/Nucleotide/Clone.pm - "clone" DNA and move sequence 
> features around
>     * Bio/Tools/Analysis/Nucleotide/PCRSimulation.pm - simulate PCR
>     * Bio/Tools/Analysis/Nucleotide/Primer3.pm - analyze primer3 output 
>     * Bio/Tools/Analysis/Nucleotide/Restriction/Enzyme.pm - a representation 
> of a single restriction enzyme
>     * Bio/Tools/Analysis/Nucleotide/Restriction/EnzymeCollection.pm - a 
> representation of a bunch of enzymes
>     * Bio/Tools/Analysis/Nucleotide/RestrictionAnalysis.pm - analyze 
> restriction sites in a DNA sequence
>     * Bio/Tools/Run/Primer3.pm - run the Primer3 program. 
> 
> Some of these are based on a lot of work by others, and I have enhanced them. 
> In particular Bio/Tools/Analysis/Nucleotide/RestrictionAnalysis.pm should 
> replace Bio/Tools/RestrictionEnzyme.pm. The new module contains all that 
> functionality and has been extended to deal properly with redundant sequences 
> and with Bio/Tools/Analysis/Nucleotide/Restriction/EnzymeCollection.pm rebase 
> files can be parsed.
> 
> I tried to write the pod to be compliant with the Pod Gods, and ran 
> maintenance/pod.pl -d Bio -v -b on these scripts - that should give me any 
> errors, right?
> 
> In addition to the modules and test scripts there are a couple of examples 
> that will let you run primer3 to design primers, cut DNA with restriction 
> enzymes, and perform PCR and clone genes.
> 
> Some of these modules address the bugs in bugzilla 1422: 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1422
> 
> The modules etc are available at 
> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz
> 
> If someone could commit them I'd be grateful.
> 
> Thanks
> 
> Rob
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


