[Biopython-dev] 'testseq' function update #3

Adil Iqbal aiqbal85 at gmail.com
Thu Jun 1 21:18:07 UTC 2017


Hello again!

I've detailed the changes below. You can view the updated version of my code
here:
https://github.com/Adil-Iqbal/Personal-Projects/blob/master/Test%20Sequence/testseq.py

I believe I have found a satisfactory solution based on Andrew's suggestion.

Before I talk about the solution, I'd like to quickly recap what I
discussed in an earlier email: once the global instance of the
random.Random class is seeded, it cannot be reverted to its original
behavior (which is to re-seed itself with every function call). Instead,
the Random class seeds based on the system date/time, which means it can
only generate one new sequence every few milliseconds. That is undesirable,
since I would like the function to be able to produce a unique sequence
with every function call if the user wants one. In cases involving
for-loops, the function would fail to produce unique sequences if seeded
using the system date/time.

I tried using a global variable named "shuffle_seed" to seed the RNG, which
I would increment with every call, but that would require Biopython to
use memory to track how many times the user ran the function. That was not
ideal, since the user's code should be allowed to proceed as independently
as possible.
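
For reference, a minimal sketch of the counter idea described above. The
function name, its parameters, and the "ACGT" alphabet are assumptions made
for illustration; only the "shuffle_seed" name comes from this email, and
this is not the code in the linked repository.

```python
import random

shuffle_seed = 0  # module-level counter that persists between calls


def counter_seeded_sequence(size=30, alphabet="ACGT"):
    """Return a sequence from a generator seeded by an incrementing counter.

    Hypothetical helper: the name, parameters, and alphabet are illustrative.
    """
    global shuffle_seed
    shuffle_seed += 1  # the module has to remember how often it was called
    rng = random.Random(shuffle_seed)
    return "".join(rng.choice(alphabet) for _ in range(size))
```

Each call gets a new seed, but the counter is state that the module has to
carry around for the lifetime of the user's program.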

I then tried to implement Andrew's most recent suggestion verbatim, which
was to instantiate the random.Random class outside of the function
definition and seed it within the function only if the seed was passed as
an argument. The benefit of this method was that I was only creating one
instance of the Random class and not having to track the user's function
calls. Unfortunately, once the global instance was seeded, it began to
generate sequences based on date/time again.
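
A rough sketch of that single-instance layout, for comparison. The function
name, its parameters, and the "ACGT" alphabet are assumptions and do not come
from the linked code. What the sketch shows for certain is only that a call
made without a seed continues from whatever state the last seeded call left
behind.

```python
import random

_shared_rng = random.Random()  # one instance, created when the module loads


def single_instance_sequence(size=30, rand_seed=None, alphabet="ACGT"):
    """Seed the shared generator only when a seed is passed in.

    Hypothetical helper: the name, parameters, and alphabet are illustrative.
    """
    if rand_seed is not None:
        _shared_rng.seed(rand_seed)
    # Without a seed, the generator simply continues from its current state.
    return "".join(_shared_rng.choice(alphabet) for _ in range(size))
```

Calling it with rand_seed=1 and then without a seed, twice over, gives the
same pair of sequences both times, so the unseeded call is tied to the
earlier seeded one.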

The solution that works well is that I created two global instances of the
random.Random class called "seeded_instance" and "anchor_instance." The
seeded instance is seeded with the "rand_seed" argument on every function
call. If a seed is not declared, the anchor_instance is invoked to assign a
random value to "rand_seed." Since anchor_instance is never seeded, it
retains the original desired behavior that I was after.
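
A minimal sketch of the two-instance pattern described above. The names
"seeded_instance", "anchor_instance", and "rand_seed" come from this email;
the function name, the "size" and "alphabet" parameters, and the seed range
are assumptions for illustration, not the code in the linked repository.

```python
import random

# Both generators are created once, when the module is loaded.
seeded_instance = random.Random()  # re-seeded on every call
anchor_instance = random.Random()  # never explicitly seeded


def testseq_sketch(size=30, rand_seed=None, alphabet="ACGT"):
    """Return a pseudo-random sequence, reproducible when rand_seed is given.

    Hypothetical stand-in for the real function; parameters are illustrative.
    """
    if rand_seed is None:
        # No seed supplied: draw one from the generator that is never seeded.
        rand_seed = anchor_instance.randint(0, 2**32 - 1)
    seeded_instance.seed(rand_seed)
    return "".join(seeded_instance.choice(alphabet) for _ in range(size))
```

With this layout, testseq_sketch(20, rand_seed=7) returns the same sequence
every time, while repeated calls without a seed, for example inside a
for-loop, draw a new seed on each call.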

This method is efficient, in that only two instances of the Random class
are required, and both are created when the module is loaded. It's also
entirely independent of the user's code AND the rest of Biopython, since
each instance keeps its own state. Most importantly, it works perfectly.
You may not be able to see it in my GitHub code, but I have been running
doctests on my local copy of the code and everything checks out.

I'm open to any other suggestions. Are there any other things that I should
do?  Would this code be ready for a pull request?

Best,
Adil Iqbal