[Bioperl-l] nested joins

Hilmar Lapp hlapp at gnf.org
Sat Apr 23 21:41:31 EDT 2005


If I understand things somewhat correctly, then the following regexp is 
used to deal with nested joins (bug#1674):

    $re = qr{
              \(
              (?:
                 (?> [^()]+ )    # Non-parens without backtracking
               |
                 (??{ $re })     # Group with matching parens
              )*
              \)
            }x;

This uses two advanced perlre features, the independent subexpression 
(?>...) and the postponed subexpression (??{ ... }), which, despite 
being perfectly well documented in perl 5.6.0, behave (match) 
differently in perl 5.6.0 than in later versions. (The irony is that 
the expression itself appears verbatim in perlre as an example, already 
in 5.6.0!)
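
To make concrete what the pattern is meant to accept, here is a minimal 
sketch that anchors it at both ends and tries it on a few strings. The 
is_balanced helper and the sample strings are made up for illustration 
and are not part of BioPerl; on an affected perl the results may differ.

```perl
use strict;
use warnings;

# The pattern from above: one parenthesized group whose body is any mix
# of non-paren runs and nested parenthesized groups.
my $re;
$re = qr{
          \(
          (?:
             (?> [^()]+ )    # Non-parens without backtracking
           |
             (??{ $re })     # Group with matching parens
          )*
          \)
        }x;

# Hypothetical helper: true if the whole string is one balanced group.
sub is_balanced {
    my ($str) = @_;
    return $str =~ /\A$re\z/ ? 1 : 0;
}

for my $s ('(11..21)', '(11..21,join(100..300))', '(11..21,join(100..300)') {
    printf "%-26s %s\n", $s, is_balanced($s) ? 'balanced' : 'not balanced';
}
```

The last sample string is missing its closing paren, so on a perl where 
the pattern works as documented it should be the only one rejected.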

I have tested 5.6.1 on Linux and the expression matches correctly 
there. Maybe this is also a platform issue, but the only platform I 
have that still uses 5.6.0 is Mac OS X 10.2.

I've included a scriptlet at the end with which people can test on 
their platform.

This difference in behaviour is most likely the reason why the 
LocationFactory test fails on 5.6.0 but succeeds on later versions of 
perl.

We have a few options:

	a) Require perl 5.6.1 in the Makefile.PL, and abandon support for 
5.6.0.
	b) Remove support for nested joins in location strings.
	c) Branch in the respective piece of code depending on perl version 
and don't use the regex construct above if the perl version is 5.6.0 or 
earlier, with the understanding that nested joins are not supported in 
perl 5.6.0.
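
Option c) could look roughly like the sketch below. This is only an 
illustration under assumptions: group_pattern and split_operator are 
hypothetical helpers, not existing BioPerl code, and the flat fallback 
pattern is my own choice for "no nested joins". It relies on $], which 
holds the running perl's version as a number (5.006001 for 5.6.1).

```perl
use strict;
use warnings;

# Recursive pattern from this thread; reported to misbehave on perl 5.6.0.
my $nested;
$nested = qr{
    \(
    (?:
        (?> [^()]+ )        # Non-parens without backtracking
      |
        (??{ $nested })     # Group with matching parens
    )*
    \)
}x;

# Plain pattern that only accepts a group with no parens inside it.
my $flat = qr{ \( [^()]* \) }x;

# Hypothetical helper: choose the pattern for a given perl version number
# (in the format of $], e.g. 5.006001 for 5.6.1).
sub group_pattern {
    my ($perl_version) = @_;
    return $perl_version >= 5.006001 ? $nested : $flat;
}

# Hypothetical helper: split "join(...)" into operator and argument group.
# Returns an empty list if the string does not match the chosen pattern.
sub split_operator {
    my ($loc, $group) = @_;
    return $loc =~ /\A(join|order|bond)($group)\z/i ? (lc($1), $2) : ();
}

my $group = group_pattern($]);
for my $loc ('join(11..21,100..300)', 'join(11..21,join(100..300,400..500))') {
    my ($op, $args) = split_operator($loc, $group);
    print defined($op) ? "$loc: $op $args\n" : "$loc: not supported on perl $]\n";
}
```

With the flat pattern, a nested join simply fails to match, so the 
caller can report it as unsupported instead of getting a wrong parse.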

(BTW this construct is not supported at all in versions 5.005 and 
lower, so the current requirement of 5.005 in Makefile.PL should 
certainly be revised.)

I'm a bit ambivalent on this, as nested joins shouldn't really exist 
and, unless I'm mistaken, only existed in GenBank temporarily; 
allegedly they have since been fixed by NCBI staff. So I'm a bit 
worried that we're incurring issues and spending effort on how best to 
solve a non-existent problem.

OTOH, it appears that the only two tests failing in 5.6.0 are the ones 
for nested locations, so maybe no code changes are necessary in order 
to properly support all location strings in 5.6.0 except nested joins? 
If this is true, the easiest solution would be to skip those two tests 
if perl is 5.6.0 or earlier.
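
Skipping the two tests could be sketched as follows. This assumes 
Test::More and its SKIP block, which is not necessarily what the 
LocationFactory test uses today, and parses_ok is a hypothetical 
stand-in for the real parser call.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical stand-in for the real location parser: true if the string
# is a single join/order/bond with balanced parentheses.
my $re;
$re = qr{ \( (?: (?> [^()]+ ) | (??{ $re }) )* \) }x;
sub parses_ok { return $_[0] =~ /\A(?:join|order|bond)$re\z/i ? 1 : 0 }

# Flat locations are tested on every perl.
ok( parses_ok('join(11..21,100..300)'), 'flat join' );

SKIP: {
    # $] is 5.006 on perl 5.6.0 and 5.006001 on perl 5.6.1.
    skip 'nested joins need perl 5.6.1 or later', 2 if $] < 5.006001;

    ok( parses_ok('join(11..21,join(100..300,400..500))'), 'nested join' );
    ok( parses_ok('join(11..21,join(100..300,complement(150..230)))'),
        'nested join with complement' );
}
```

The skipped tests are still counted in the plan, so the total stays the 
same on every perl version.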

Any opinions, comments, or pieces of advice appreciated.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

To verify behaviour, use the following scriptlet on your platform:

my $re;
$re = qr{
           \(
             (?:
                 (?> [^()]+ )    # Non-parens without backtracking
               |
                 (??{ $re })     # Group with matching parens
             )*
           \)
         }x;
my $oparg = 'join(11..21,join(100..300,complement(150..230)))';
while( $oparg =~ s/(join|order|bond)$re//ig ) {
         print "match: \$oparg ='$oparg', \$\& = '$&'\n";
}

When run through perl -w it outputs

Use of uninitialized value in substitution (s///) at re.pl line 12.

under perl 5.6.0 (which is wrong) and

match: $oparg ='', $& = 'join(11..21,join(100..300,complement(150..230)))'

under perl 5.6.1+ (which is correct).


