[Bioperl-l] Re: nested joins
Jason Stajich
jason.stajich at duke.edu
Sun Apr 24 10:21:56 EDT 2005
This won't solve your problem, but I've now fixed bug #1765 for nested
joins as well, all with regexps, so this closes the bug:
http://bugzilla.open-bio.org/show_bug.cgi?id=1765
I think the RE solution is a little more elegant so I've stayed with
it. It does require re-ordering the sub-locations based on the input
string since the RE pulls out the groups first and then the non-joined
sections second.
Here is the code which captures the sections (the $re is the same as the
one Hilmar lists below):
    # let's capture and remove all the sections which are groups
    # (no /g here: one group is removed per pass, so $& is recorded
    # for every group rather than only for the last one matched)
    while( $oparg =~ s/(join|order|bond)$re//i ) {
        push @sections, $&;
    }
    push @sections, split(/,/,$oparg) if length($oparg);
    # because we don't necessarily process the string in-order
    # as we are pulling the data from the string out for
    # groups first, then pulling out data, comma delimited
    # I am re-sorting the sections based on their position
    # in the original string, using the index function to figure
    # out their position in the string
    # --jason
    # resort based on input order, schwartzian style!
    @sections = map { shift @$_ } sort { $a->[1] <=> $b->[1] }
                map { [$_, index($oparg_orig, $_)] }
                @sections;
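
For anyone who wants to try the extract-then-resort idea outside of
LocationFactory, here is a minimal standalone sketch. The location string
is made up for illustration, and the sketch captures with $1 instead of
$& and filters out empty pieces after the split; those are choices made
for this example, not necessarily what is in CVS.

```perl
use strict;
use warnings;

# Recursive pattern from the quoted message: matches one balanced
# parenthesised group, including any groups nested inside it.
my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # non-parens without backtracking
      |
        (??{ $re })     # nested group with matching parens
    )*
    \)
}x;

# Hypothetical location string mixing plain ranges and a nested join.
my $oparg_orig = '1..10,join(11..21,join(100..300,complement(150..230))),400..500';
my $oparg      = $oparg_orig;

my @sections;

# Pull out the grouped sections first, one per pass ...
while ( $oparg =~ s/((?:join|order|bond)$re)//i ) {
    push @sections, $1;
}

# ... then whatever comma-delimited pieces remain (skipping empties
# left behind where a group was removed).
push @sections, grep { length } split /,/, $oparg;

# Restore input order: pair each piece with its offset in the original
# string, sort on the offset, then unwrap the pieces again.
@sections = map  { $_->[0] }
            sort { $a->[1] <=> $b->[1] }
            map  { [ $_, index( $oparg_orig, $_ ) ] }
            @sections;

print "$_\n" for @sections;
```

On a perl where the recursive pattern behaves (5.6.1 or later, per the
report below) this should print the three sections in their original
order: 1..10, then the nested join, then 400..500. Note that index()
returns the first occurrence, so a location string containing the same
sub-location twice would need a smarter lookup.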
-jason
On Apr 23, 2005, at 9:41 PM, Hilmar Lapp wrote:
> If I understand things somewhat correctly, then the following regexp
> is used to deal with nested joins (bug#1674):
>
> $re = qr{
> \(
> (?:
> (?> [^()]+ ) # Non-parens without backtracking
> |
> (??{ $re }) # Group with matching parens
> )*
> \)
> }x;
>
> This uses 2 advanced perlre features which, despite being perfectly
> well documented in perl 5.6.0, behave (match) differently between
> perl 5.6.0 and later versions. (The irony seems to be that the
> expression itself appears verbatim in perlre as an example - already
> in 5.6.0!)
>
> I have tested 5.6.1 on linux and the expression matches correctly
> there. Maybe this is also a platform issue, but I don't have any other
> platform than Mac OSX 10.2 that still uses 5.6.0.
>
> I've included a scriptlet at the end with which people can test on
> their platform.
>
> This difference in behaviour is most likely the reason why the
> LocationFactory test fails on 5.6.0 but succeeds on later versions of
> perl.
>
> There's a couple of options we have:
>
> a) Require perl 5.6.1 in the Makefile.PL, and abandon support for
> 5.6.0.
> b) Remove support for nested joins in location strings.
> c) Branch in the respective piece of code depending on perl version
> and don't use the regex construct above if perl version is 5.6.0 or
> less, with the understanding that nested joins are not supported in
> perl 5.6.0.
>
> (BTW this is not supported at all in versions 5.005 and lower, so the
> requirement of 5.005 in Makefile.PL should certainly be revised.)
>
> I'm a bit ambivalent on this as nested joins shouldn't really exist
> and unless I'm mistaken only existed in Genbank temporarily as
> allegedly they have been fixed now by NCBI staff. So, I'm a bit
> worried that we're incurring issues while spending efforts on how to
> best solve a non-existent problem.
>
> OTOH, it appears that the only two tests failing in 5.6.0 are the
> nested locations, so maybe no code changes are necessary in order to
> properly support all location strings in 5.6.0 except nested joins? If
> this is true the easiest solution would be to skip the two tests if
> perl is 5.6.0 or lower.
>
> Any opinions, comments, or pieces of advice appreciated.
>
> -hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
> To verify behaviour, use the following scriptlet on your platform:
>
> my $re;
> $re = qr{
> \(
> (?:
> (?> [^()]+ ) # Non-parens without backtracking
> |
> (??{ $re }) # Group with matching parens
> )*
> \)
> }x;
> my $oparg = 'join(11..21,join(100..300,complement(150..230)))';
> while( $oparg =~ s/(join|order|bond)$re//ig ) {
> print "match: \$oparg ='$oparg', \$\& = '$&'\n";
> }
>
> When run through perl -w it outputs
>
> Use of uninitialized value in substitution (s///) at re.pl line 12.
>
> under perl 5.6.0 (which is wrong) and
>
> match: $oparg ='', $& =
> 'join(11..21,join(100..300,complement(150..230)))'
>
> under perl 5.6.1+ (which is correct).
>