[Biopython-dev] Restriction analysis package.
fsms at users.sourceforge.net
Thu May 27 07:20:44 EDT 2004
Sorry about the delay; I had to go to a job interview and did not have
easy access to the net.
> Thanks for putting this together. The code looks very useful and I'd
> definitely like to see it work towards being included in Biopython,
> if that's what you'd like. A few comments on it:
> 1. First, if you'd like to include this in Biopython the code would
> have to be willing to license the code under the Biopython license.
> I see different references to the GPL and Python license within your
> package. I'm not at all the type of person who argues about
> licensing issues, but we just need to keep the Biopython
> distribution under one license.
Obviously, I will put the code under the Biopython license. I put it under
the Python license for the time being, knowing that some people don't like to read
> 2. The way this is organized right now puts two different types of
> functionality together -- building the enzyme dictionary by
> downloading and parsing Rebase, and the actual enzyme dictionary
> itself. For Biopython, the public functionality you'd want to expose
> would be the enzyme dictionary and the useful functions you have
> within that. The downloading and parsing work would be something
> that you, or another developer, would do on a monthly or whatever
> basis to keep the enzyme dictionary up to date within Biopython.
> Thus I'd propose organizing the code like:
> Bio/Restriction/__init__.py --> The current Restriction.py
> Bio/Restriction/Restriction_Dictionary.py --> the dictionary
> Bio/Restriction/_Update/ --> The Update, RanaConfig and
> RestrictionCompiler code to do the updates and regenerate the
Yes and no. I agree with the organisation of the code, and
I would indeed update the dictionary in Biopython, but I think it is
important for the end user to be able to update the dictionary on their own
without downloading the full distribution, so this is also a public
Something I would like to implement as well when I have the time is a
in the ranacompiler.py script to pre-select the supplier(s). If users
get their enzymes from suppliers A and B, they may not want to search the
sequences for enzymes from supplier C.
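
As a rough illustration of the supplier pre-selection idea (this is not code
from the package; the dictionary layout, the supplier codes and the function
name select_by_supplier are all made up for the example):

```python
# Hypothetical data layout: enzyme name -> record with a set of supplier codes.
# None of these names come from the package itself.
ENZYMES = {
    "EcoRI": {"site": "GAATTC", "suppliers": {"A", "B"}},
    "SmaI": {"site": "CCCGGG", "suppliers": {"C"}},
    "BamHI": {"site": "GGATCC", "suppliers": {"B", "C"}},
}


def select_by_supplier(enzymes, wanted):
    """Keep only enzymes sold by at least one of the wanted suppliers."""
    wanted = set(wanted)
    return {
        name: record
        for name, record in enzymes.items()
        if record["suppliers"] & wanted
    }


# A user who buys from suppliers A and B never sees supplier-C-only enzymes.
my_enzymes = select_by_supplier(ENZYMES, ["A", "B"])
print(sorted(my_enzymes))
```

Filtering once, before any sequence is searched, keeps the search itself
unchanged; whether the real script would filter at compile time or at search
time is left open here.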
> ranacompiler.py should exist in somewhere like Scripts/restriction
> to be run, instead of in site-packages.
Yes, the organisation of the package I submitted was quick and dirty, just to
assess whether you were interested. I will write a setup.py for the package
which will allow the modification.
> 3. Going along with reorganizing the code base, I'd propose changing
> the updating scripts a bit. Storing databases and things into
> site-packages is generally not a good idea, since that is meant
> for Python code, and also requires the user to mess around with
> either running scripts as root or changing permissions -- more work
> than is really necessary. What I'd do is store the Database and
> Updates information into, say, the current directory where the user
> runs the scripts. Additionally, the Restriction_Dictionary.py would
> be generated there. Then, when the updates are done everything gets
> run and you have a new Restriction_Dictionary.py to copy over and
> check into CVS.
Well, as before, I am also in favour of putting the databases, scripts and
so on in a directory other than site-packages; that was a shortcut.
But I am not sure I understand what you propose.
The first question is who we want to do the update:
1) It could be done in a centralised way, i.e. in Biopython, so that
people get the update when they update from CVS. This means they must
use CVS for their installation, and people getting Biopython from a
release rather than from CVS will not get frequent updates of the
enzyme dictionary.
This might not be a problem, since Rebase does not change that quickly
for the most common enzymes.
2) Another way is to propose an admin scheme. The administrator of the
box is in charge of keeping the enzyme dictionary up to date. Then we
must provide a script to do that easily. We can then install all the
data into a directory, something like /var/Biopython/Restriction/
In this case, the script would be run as root. When the updates are done,
the enzyme dictionary is installed into site-packages, since it is a Python
script after all.
3) The third way is to propose a scheme where everybody can do the update.
The directory in which everything is stored is then a
The enzyme dictionary is kept in and run from the user's home directory.
There is no permission problem, since the enzyme dictionary will be
accessible to the user. Each user will run their own
personal version of the enzyme dictionary, which will be kept in their own
This means Restriction is installed centrally in
the enzyme dictionary is not installed when Biopython is installed. The
first time a new user runs the package, they have to update the dictionary.
The script Restriction_Dictionary.py is never installed into site-packages.
4) The fourth solution is the current directory scheme. I am personally
keen on this one. My worry with this scheme is that, on machines
that are used by several people, it will ultimately end up
installing the same information several times in different places.
That could well be OK on *nix, which restricts what a user can do,
but on Windows...
This will end up in a mess, and the scripts are more likely to break
if you have several installations of the enzyme dictionary.
Another solution here is to use temporary files.
My personal preference would be a mix of the first and second solutions,
but I am open to discussing it further.
Doesn't Biopython have a centralised way to store data?
Something like /var/Biopython. I am sure that this package is not the only one
which could benefit from such a facility.
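
The schemes above differ mainly in where the generated dictionary lives. A
minimal sketch of how a lookup could try a per-user copy first and a
system-wide copy second, assuming the file is called Restriction_Dictionary.py
as elsewhere in this thread. The per-user directory name and the function
find_dictionary are hypothetical; /var/Biopython/Restriction is the directory
suggested in option 2.

```python
from pathlib import Path

# Assumed locations, for illustration only. The per-user directory name is
# hypothetical; the system directory is the one suggested in this message.
SYSTEM_DIR = Path("/var/Biopython/Restriction")
USER_DIR = Path.home() / ".biopython" / "Restriction"
DICT_NAME = "Restriction_Dictionary.py"


def find_dictionary(search_dirs=(USER_DIR, SYSTEM_DIR)):
    """Return the first existing dictionary file, or None if none is found."""
    for directory in search_dirs:
        candidate = Path(directory) / DICT_NAME
        if candidate.is_file():
            return candidate
    return None


path = find_dictionary()
if path is None:
    print("No enzyme dictionary found; an update would be needed first.")
else:
    print(f"Using enzyme dictionary at {path}")
```

With a fixed search order there is only one place per user and one per
machine to look, which avoids the scattered copies that the current directory
scheme could produce.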
> Hopefully these make some sense. I really like the catalyse and
> search functionality on the enzyme classes -- it's a nice interface
> design and it would be great to have in Biopython.
> Please do let me know what you think about the licensing and change
> proposals and we can keep moving forward towards getting this in
> Biopython. Thanks again for the work so far!
I have had some time when I was away to test a bit further the
I have a class to add which allows analysis (i.e. where you can specify
only blunt cutters, or enzymes which cut twice...).
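
To give an idea of the kind of filtering meant here, a small sketch follows.
It is not the class itself: the result layout (enzyme name mapped to a list
of cut positions), the BLUNT set and both function names are assumptions made
for the example, and the cut positions are invented.

```python
# Hypothetical search results: enzyme name -> cut positions found in a sequence.
RESULTS = {
    "EcoRI": [120, 845],
    "SmaI": [300],
    "EcoRV": [51, 977],
    "BamHI": [],
}
# Assumed lookup of which enzymes leave blunt ends (SmaI and EcoRV do).
BLUNT = {"SmaI", "EcoRV"}


def only_blunt(results, blunt=BLUNT):
    """Keep only the enzymes that produce blunt ends."""
    return {name: cuts for name, cuts in results.items() if name in blunt}


def with_n_cuts(results, n):
    """Keep only the enzymes that cut exactly n times."""
    return {name: cuts for name, cuts in results.items() if len(cuts) == n}


# Filters compose: blunt cutters that cut exactly twice.
blunt_twice = with_n_cuts(only_blunt(RESULTS), 2)
print(blunt_twice)
```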
I will make the changes over the weekend (i.e. put it under the Biopython
license, for a start). The rest will need a bit more time.