[Bioperl-l] nested joins
Hilmar Lapp
hlapp at gnf.org
Sat Apr 23 21:41:31 EDT 2005
If I understand things somewhat correctly, then the following regexp is
used to deal with nested joins (bug#1674):
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;
This uses two advanced perlre features, the independent subexpression
(?>...) and the postponed regular subexpression (??{...}), which,
despite being perfectly well documented in perl 5.6.0, behave (match)
differently between perl 5.6.0 and later versions. (The irony is that
the expression itself appears verbatim in perlre as an example, already
in 5.6.0!)
I have tested 5.6.1 on linux and the expression matches correctly
there. Maybe this is also a platform issue, but I don't have any other
platform than Mac OSX 10.2 that still uses 5.6.0.
I've included a scriptlet at the end with which people can test on
their platform.
This difference in behaviour is most likely the reason why the
LocationFactory test fails on 5.6.0 but succeeds on later versions of
perl.
We have a few options:
a) Require perl 5.6.1 in the Makefile.PL, and abandon support for
5.6.0.
b) Remove support for nested joins in location strings.
c) Branch in the respective piece of code depending on perl version
and don't use the regex construct above if perl version is 5.6.0 or
less, with the understanding that nested joins are not supported in
perl 5.6.0.
(BTW this regex construct is not supported at all in versions 5.005 and
lower, so the requirement of 5.005 in Makefile.PL should certainly be
revised.)
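To make option c) concrete, here is a minimal sketch of what such a
branch might look like. It is not taken from the BioPerl source: the
constant NESTED_JOINS_OK, the helper split_operator, and the flat
fallback pattern are all hypothetical names and choices made for
illustration. The only assumption about perl itself is that $] reports
5.006001 for perl 5.6.1.

```perl
use strict;
use warnings;

# $] is the running perl's version as a number (5.006001 means 5.6.1).
# NESTED_JOINS_OK is a hypothetical name, not taken from the BioPerl code.
use constant NESTED_JOINS_OK => ( $] >= 5.006001 ) ? 1 : 0;

my $re;
if (NESTED_JOINS_OK) {
    # Recursive pattern from the message: balanced, nested parentheses.
    $re = qr{
        \(
        (?:
            (?> [^()]+ )    # Non-parens without backtracking
          |
            (??{ $re })     # Group with matching parens
        )*
        \)
    }x;
}
else {
    # Assumed flat fallback: a single level of parentheses only, so
    # nested joins are knowingly unsupported on perl 5.6.0 and lower.
    $re = qr{ \( [^()]* \) }x;
}

# Hypothetical helper: remove the first join/order/bond operator and its
# argument list from a location string and return both pieces.
sub split_operator {
    my ($loc) = @_;
    if ( $loc =~ s/((?:join|order|bond)$re)//i ) {
        return ( $1, $loc );
    }
    return ( undef, $loc );
}

my ( $op, $rest ) =
    split_operator('join(11..21,join(100..300,complement(150..230)))');
printf "operator: '%s', remainder: '%s'\n",
    ( defined $op ? $op : '(none)' ), $rest;
```

With the fallback pattern a nested join simply fails to match, which
mirrors the "not supported in 5.6.0" understanding stated in option c).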
I'm a bit ambivalent on this. Nested joins shouldn't really exist and,
unless I'm mistaken, only existed in GenBank temporarily; allegedly they
have since been fixed by NCBI staff. So I'm a bit worried that we're
incurring issues while spending effort on how best to solve a
non-existent problem.
OTOH, it appears that the only two tests failing in 5.6.0 are the
nested locations, so maybe no code changes are necessary in order to
properly support all location strings in 5.6.0 except nested joins? If
this is true the easiest solution would be to skip the two tests if
perl is 5.6.0 or lower.
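Skipping the two tests could be sketched roughly as follows with a
Test::More SKIP block. This is an illustration only: the two location
strings and the direct regex match are stand-ins for the actual
LocationFactory tests, which are not reproduced here, and the skip
message is made up.

```perl
use strict;
use warnings;
use Test::More tests => 2;

my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;

SKIP: {
    # $] is 5.006 for perl 5.6.0 and 5.006001 for perl 5.6.1.
    skip 'nested joins are not matched correctly on perl 5.6.0 or lower', 2
        if $] < 5.006001;

    # Stand-ins for the two nested-location tests (hypothetical inputs).
    for my $loc (
        'join(11..21,join(100..300,complement(150..230)))',
        'order(1..5,join(7..9,order(11..12)))',
      )
    {
        ok( $loc =~ /\A(?:join|order|bond)$re\z/i,
            "nested location: $loc" );
    }
}
```

The planned test count stays the same whether or not the block is
skipped, so the test script's plan does not need to depend on the perl
version.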
Any opinions, comments, or pieces of advice appreciated.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
To verify behaviour, use the following scriptlet on your platform:
my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;
my $oparg = 'join(11..21,join(100..300,complement(150..230)))';
while( $oparg =~ s/(join|order|bond)$re//ig ) {
    print "match: \$oparg ='$oparg', \$\& = '$&'\n";
}
When run through perl -w it outputs
Use of uninitialized value in substitution (s///) at re.pl line 12.
under perl 5.6.0 (which is wrong) and
match: $oparg ='', $& = 'join(11..21,join(100..300,complement(150..230)))'
under perl 5.6.1+ (which is correct).