[Biopython-dev] test_GASelection hangs

Bruce Southey bsouthey at gmail.com
Mon Nov 17 20:03:54 UTC 2008


Peter wrote:
> On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>> Hi,
>> I was just running the test under a very fresh cvs version and under
>> Python2.3 the test was hanging with test_GASelection. Of course, there was
>> no problem after killing it and rerunning the test. I think this also
>> pertains to bug 2651 so I thought I would ask if there was a way to examine
>> this further before doing anything else.  I understand that this is a problem
>> with randomization involved, but it does indicate a more subtle problem is
>> present.  I would really like to track down the source of the problem.
>>
>> Does anyone have any ideas on how I could try to examine this further?
>>     
>
> If you have installed CVS (or indeed any recent version of Biopython,
> as the GA stuff hasn't changed recently IIRC), then in the Tests
> directory you can just run:
>
> $ python test_GASelection.py
>
> You'll find sometimes it gets stuck.  I tried modifying the file so
> that the end reads as follows:
>
> if __name__ == "__main__":
>     #sys.exit(run_tests(sys.argv))
>
>     ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
>                  RouletteWheelSelectionTest]
>
>     runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
>     test_loader = unittest.TestLoader()
>     test_loader.testMethodPrefix = 't_'
>
>     test=ALL_TESTS[1] #Edit me: 0, 1 or 2
>     cur_suite = test_loader.loadTestsFromTestCase(test)
>     count = 0
>     while True :
>         count += 1
>         print "#"*50, count
>         runner.run(cur_suite)
>
> On my machine, DiversitySelectionTest and RouletteWheelSelectionTest
> seem safe - the tests just run and run until you interrupt them with
> ctrl+c.
>
> However, this clearly gets stuck in TournamentSelectionTest - so we've
> narrowed this down a bit.  Reading that bit of code, there is an
> apparent risk of an infinite loop if by chance org_1 happens to be the
> worst organism in the population.  Perhaps adding a simple counter to
> break out of the loop if after 1000 tries org_1 is still the worst -
> but I'm not sure what to do then.
>
> Peter
>
>   
Hi,
I ran the test multiple times using a bash loop and I think I tracked 
down this specific problem to the test code itself, specifically 
the function TournamentSelectionTest.t_select_best(). I think this is what 
Peter noticed.

This is how I understand things, which I hope is sufficiently correct to 
explain the problem.

The test simulates a genome that has 3 locations with the 4 bases coded 
as '0', '1', '2', and '3' for an 'organism'.  (Note that the 3 locations are 
hard coded into the random_genome function.) The fitness of an organism 
is just the coded values read as an integer, so the first position is 
hundreds, the second is tens and the last is ones.
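
To make the fitness scheme concrete, here is a minimal sketch. The 
function random_genome is named after the one mentioned above, and 
genome_fitness is a hypothetical stand-in for the test's fitness 
function; both bodies are assumptions about the behaviour described, 
not a copy of the test file.

```python
import random

GENOME_LENGTH = 3   # assumed: three positions, as described above
ALPHABET = "0123"   # assumed: the four bases coded as digits


def random_genome(rng=random):
    """Return a random genome string such as '203'."""
    return "".join(rng.choice(ALPHABET) for _ in range(GENOME_LENGTH))


def genome_fitness(genome):
    """Read the coded genome as a decimal integer, e.g. '203' gives 203."""
    return int(genome)
```

Under this reading the lowest possible fitness is 0, for the genome 
'000', and the highest is 333.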

In TournamentSelectionTest.t_select_best, a second organism is 
simulated that must have a strictly lower fitness than the first. The 
problem comes when the simulated genome of the first organism is '000', 
because its fitness is zero. The loop only exits when the line:
            if org_2.fitness < org_1.fitness:
is true, but with org_1 at zero it will always be false. The loop 
therefore becomes infinite and, given that there are only three 
locations, this should happen rather frequently.
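
A rough sketch of why this hangs, combined with the retry counter Peter 
suggested. All names here (fitness, find_worse, max_tries) are 
hypothetical and not taken from the test file. With 4 codes at 3 
positions there are 64 possible genomes, so if genomes are drawn 
uniformly (an assumption), org_1 would be '000' in roughly 1 run out 
of 64.

```python
import itertools
import random

ALPHABET = "0123"
# Every possible 3-position genome: 4 ** 3 = 64 strings.
ALL_GENOMES = ["".join(p) for p in itertools.product(ALPHABET, repeat=3)]


def fitness(genome):
    """Assumed fitness: the genome digits read as an integer."""
    return int(genome)


def find_worse(org_1_genome, max_tries=1000, rng=random):
    """Draw genomes until one scores strictly below org_1.

    Returns the genome found, or None once max_tries draws have failed,
    instead of looping forever.
    """
    for _ in range(max_tries):
        candidate = rng.choice(ALL_GENOMES)
        if fitness(candidate) < fitness(org_1_genome):
            return candidate
    return None


# No genome can score below 0, so an unbounded loop never ends for '000'.
impossible = [g for g in ALL_GENOMES if fitness(g) < fitness("000")]
```

Here find_worse("000") gives up and returns None, which still leaves 
open the question of what the test should do in that case.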

Is it sufficient to use the condition '<='?
Alternatively, is there some way to fix the genome of the first organism 
rather than using a random one?
For example, instead of calling random_organism(), declare it as, say:
org_1=Organism('100', test_fitness)
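
To compare the two options, the sketch below counts which genomes each 
condition would accept as org_2. StubOrganism and digits_fitness are 
hypothetical stand-ins for the real Organism class and fitness 
function, assumed to behave as described in this message.

```python
import itertools

ALPHABET = "0123"
ALL_GENOMES = ["".join(p) for p in itertools.product(ALPHABET, repeat=3)]


def digits_fitness(genome):
    """Assumed fitness: the genome digits read as an integer."""
    return int(genome)


class StubOrganism(object):
    """Hypothetical stand-in holding a genome and its fitness."""

    def __init__(self, genome, fitness_calculator):
        self.genome = genome
        self.fitness = fitness_calculator(genome)


# Option 2: a fixed first organism that is never the worst possible.
org_1 = StubOrganism("100", digits_fitness)
strictly_worse = [g for g in ALL_GENOMES if digits_fitness(g) < org_1.fitness]

# Option 1: '<=' with org_1 at '000' only ever accepts another '000'.
ties_with_zero = [g for g in ALL_GENOMES
                  if digits_fitness(g) <= digits_fitness("000")]
```

With a fixed '100', the 16 genomes starting with '0' all qualify, so 
the loop should end quickly. With '<=' the loop does end for '000', but 
only on a tie, and I am not sure the test can then say which organism 
is the best.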


Bruce



