[Bioperl-l] RestrictionEnzyme.pm

Hilmar Lapp hlapp@gmx.net
Thu, 25 Jan 2001 10:38:59 -0800


Forwarding this to bring it to the right audience. Please note that
although some module PODs still say that bioperl-guts-l is for
technical discussion, it in fact is not. The guts list is for CVS
messages and similar traffic most people never want to hear about.

	Hilmar

-------- Original Message --------
Subject: RE: [Bioperl-guts-l] RestrictionEnzyme.pm
Date: Wed, 24 Jan 2001 14:00:30 +0100
From: "Paul-Christophe Varoutas"
<paul-christophe.varoutas@curie.fr>
To: <bioperl-guts-l@bioperl.org>



You *are* right about not writing to the Bio/Tools directory; I guess I was
rather sleepy when I wrote my previous mail %-P. And working as root all the
time on both my Win2000 and Linux machines doesn't help either ;-).

It is a good idea to incorporate the RE list update in the object
constructor, and combining Hilmar's options (1) and (2) seems great because
it is a flexible solution and should suit most needs. For the URL retrieval,
I guess http will be more suitable. I will contact REBASE to make sure that
the URL we settle on will remain stable.
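
To make the combined option concrete, here is a minimal sketch of a single
parameter that may be either a file name or a URL. It is written in Python
only to keep it short (the module itself is Perl), and the names is_url and
open_enzyme_source are hypothetical, not part of RestrictionEnzyme.pm. It
assumes that anything with an http, https or ftp scheme is a URL and that
everything else is a local path.

```python
import io
import urllib.request
from urllib.parse import urlparse

# Schemes treated as remote sources (an assumption of this sketch).
URL_SCHEMES = {"http", "https", "ftp"}


def is_url(source):
    """Return True if source carries one of the URL schemes above."""
    return urlparse(source).scheme.lower() in URL_SCHEMES


def open_enzyme_source(source):
    """Return a readable text stream for a local path or a URL."""
    if is_url(source):
        # Fetch the whole list; network errors propagate to the caller.
        raw = urllib.request.urlopen(source).read()
        return io.StringIO(raw.decode("ascii", errors="replace"))
    return open(source, "r")
```

Checking the scheme against an explicit set, rather than just looking for a
colon, keeps Windows paths such as C:\data\enzymes.txt on the file branch.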

This solution raises a small question: that of multiple occurrences. The
fact that we are using hashes will take care of eliminating multiple
occurrences of an enzyme (one from the hard-coded collection, one from the
file / URL). A minor issue is to decide whether we just "let perl do the
work", or whether we verify entries while replacements are done and/or
define how they are done. We can make the assumption that, say, AatII always
has the same recognition site. If I make an issue out of this, it is because
I don't know yet how this module is being used, and especially whether it is
only used for what it was initially designed for. Do you know if there are
users out there using this module in an unorthodox way, defining enzyme
names/recognition sequences that don't exist, which could create
conflicts/unusual behavior?
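
To illustrate the difference between "letting perl do the work" and
verifying replacements, here is a minimal sketch of a merge that records
every enzyme whose recognition site differs between the two sources. It is
in Python for brevity; merge_enzymes and its on_conflict argument are
hypothetical names, and the three policies are only one possible choice.

```python
def merge_enzymes(builtin, loaded, on_conflict="keep"):
    """Merge loaded enzymes into a copy of the built-in collection.

    on_conflict: "keep" keeps the built-in site, "replace" takes the
    loaded one, "error" raises on the first differing site.
    Returns (merged, conflicts); conflicts holds (name, old, new) tuples.
    """
    if on_conflict not in ("keep", "replace", "error"):
        raise ValueError("unknown policy: %r" % on_conflict)
    merged = dict(builtin)
    conflicts = []
    for name, site in loaded.items():
        old = merged.get(name)
        if old is None or old == site:
            # New enzyme, or an identical duplicate: nothing to decide.
            merged[name] = site
            continue
        conflicts.append((name, old, site))
        if on_conflict == "error":
            raise ValueError("%s: %s conflicts with %s" % (name, old, site))
        if on_conflict == "replace":
            merged[name] = site
    return merged, conflicts
```

A plain hash assignment corresponds to the "replace" policy without the
conflict list, so a silently redefined enzyme would go unnoticed.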

Another issue is enzymes that cut asymmetrically. For the moment the other
RestrictionEnzyme methods don't know how to deal with them (as far as I
understand), so the code will just ignore them while parsing the RE list
file.
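
As a rough sketch of that filtering step (in Python for brevity), the parser
below keeps only enzymes whose recognition site is its own reverse
complement and has a single cut marker inside the site. The "name,site" line
layout with "/" as the cut marker is a made-up stand-in for illustration,
not the actual DNAStrider format, and the test for "asymmetric" is my
assumption about what the other methods cannot handle.

```python
# IUPAC nucleotide codes and their complements.
_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(bases):
    """Return the reverse complement of an IUPAC nucleotide string."""
    return bases.translate(_COMPLEMENT)[::-1]


def is_symmetric(site):
    """True for a palindromic site with exactly one '/' cut marker."""
    if site.count("/") != 1 or "(" in site:
        # No cut marker, several markers, or a cut outside the site.
        return False
    bases = site.replace("/", "")
    return bases == reverse_complement(bases)


def parse_enzymes(lines):
    """Parse 'name,site' lines; return (kept dict, skipped names)."""
    kept, skipped = {}, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, site = (field.strip() for field in line.split(",", 1))
        site = site.upper()
        if is_symmetric(site):
            kept[name] = site
        else:
            skipped.append(name)
    return kept, skipped
```

Returning the skipped names, instead of dropping them silently, would let
the constructor report which enzymes were ignored.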

One remaining question is about the RE list file format: is the DNAStrider
format OK for everybody, or is there another suggestion? An alternative
would be to contact REBASE and ask them to add a new 'bioperl' format to
their database, and then define a format that minimizes parsing and best
suits our needs. On their web site they say:

"As REBASE expands, new data formats are provided. Requests for specialized
formats are welcome, as we are prepared to support each major sequence
analysis package".

(The URL is: http://rebase.neb.com/rebase/rebase.serv.html  )

So what do you think about this idea?



> Do you already have a CVS write account?

I have already successfully done an anonymous CVS checkout from my home PC
(under Win2000 and Linux), but I don't have a write account yet. I will
contact Ewan / Chris about that.


Paul-Christophe







> You normally can't write to Bio/Tools as a user (under Unix), and
> a user client shouldn't attempt to do so under any circumstances.
> Regarding the ability to update the list of known REs, I see the
> following options.
> 1) Accept an additional (named!) parameter at initialization that
> denotes a file (in DNAStrider format?) containing the enzymes to
> be known in addition to a collection of hard-coded enzymes.
> 2) Same as before, but the parameter denotes a URL from where to
> obtain this file.
> 3) Put all hard-coded enzymes into a file that resides at a known
> place within the Bio/ directory tree, and read (parse) that upon
> initialization of RestrictionEnzyme.pm. An update would mean
> updating that file.
>
> I'm not sure option 3) would have compelling advantages to the
> present layout. Options 1) and 2) are certainly worthwhile to
> pursue and in essence are almost identical, the only difference
> being how to open the stream containing the enzyme data. So, one
> could try to combine both into one parameter, and have the code
> figure out whether it's a file or a http/ftp URL.
>
> 	Hilmar
>
> Do you already have a CVS write account?
>
> > - if the enzyme list is saved in a separate file, I will also modify the
> > initialisation of the %RE hash, with code that reads and parses the
> > enzyme list file.
> >
> > If this sounds OK to you, I will write it this weekend and submit it. Of
> > course if you had something completely different in mind please say it,
> > I will try to adapt to it.
> >
> > Paul-Christophe
> >
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
>

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-guts-l