[Bioperl-l] Re: nested joins

Jason Stajich jason.stajich at duke.edu
Sun Apr 24 10:21:56 EDT 2005


This won't solve your problem, but I've now fixed bug #1765 for nested 
joins as well, all with regexps, so this closes the bug:
http://bugzilla.open-bio.org/show_bug.cgi?id=1765

I think the RE solution is a little more elegant so I've stayed with 
it.  It does require re-ordering the sub-locations based on the input 
string since the RE pulls out the groups first and then the non-joined 
sections second.

Here is the code which captures the sections (the $re is the same as the 
one Hilmar lists below):
	    # let's capture and remove all the sections which are groups;
	    # no /g here, so each pass removes exactly one group and $&
	    # holds that group (with /g only the last match would be kept)
	    while( $oparg =~ s/(join|order|bond)$re//i ) {
		push @sections, $&;
	    }
	    push @sections, split(/,/,$oparg) if length($oparg);
	    # because we don't necessarily process the string in-order
	    # as we are pulling the data from the string out for
	    # groups first, then pulling out data, comma delimited
	    # I am re-sorting the sections based on their position
	    # in the original string, using the index function to figure
	    # out their position in the string
	    # --jason
	    # resort based on input order, schwartzian style!
	    @sections = map { shift @$_ } sort { $a->[1] <=> $b->[1] }
	                map { [$_, index($oparg_orig, $_)] } @sections;
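
For anyone who wants to try the capture-then-reorder idea outside of the module, here is a minimal standalone sketch. It is not the BioPerl code itself: the helper name split_location_sections and the sample location string are made up for illustration, it captures each group in $1 instead of $&, and it drops empty fields left behind by the removed groups.

```perl
use strict;
use warnings;

# Recursive pattern for one balanced parenthesised group
# (same form as the $re quoted further down in this thread).
my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # non-parens without backtracking
      |
        (??{ $re })     # nested group with matching parens
    )*
    \)
}x;

# Hypothetical helper, for illustration only.
sub split_location_sections {
    my ($oparg) = @_;
    my $oparg_orig = $oparg;    # keep the input to look up positions later
    my @sections;

    # Pass 1: remove grouped sections one at a time, capturing each in $1.
    while ( $oparg =~ s/((?:join|order|bond)$re)//i ) {
        push @sections, $1;
    }

    # Pass 2: whatever is left is comma delimited; skip empty fields.
    push @sections, grep { length } split /,/, $oparg;

    # Schwartzian transform: pair each section with its offset in the
    # original string, sort by offset, then unwrap the section again.
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           map  { [ $_, index( $oparg_orig, $_ ) ] } @sections;
}

my @sections =
    split_location_sections('1..10,join(20..30,join(40..50,60..70)),80..90');
print join( ' | ', @sections ), "\n";
# prints: 1..10 | join(20..30,join(40..50,60..70)) | 80..90
```

Note that index() returns the first occurrence, so a simple section that also appears as a substring earlier in the input could sort out of place; the sketch does not try to handle that.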


-jason

On Apr 23, 2005, at 9:41 PM, Hilmar Lapp wrote:

> If I understand things somewhat correctly, then the following regexp 
> is used to deal with nested joins (bug#1674):
>
> 	    $re = qr{
>              \(
>              (?:
>                 (?> [^()]+ )    # Non-parens without backtracking
>               |
>                 (??{ $re })     # Group with matching parens
>              )*
>              \)
>             }x;
>
> This uses 2 advanced perlre features which, despite being perfectly 
> well documented in perl 5.6.0, behave (match) differently between 
> perl 5.6.0 and later versions. (The irony seems to be that the 
> expression itself appears verbatim in perlre as an example - already 
> in 5.6.0!)
>
> I have tested 5.6.1 on linux and the expression matches correctly 
> there. Maybe this is also a platform issue, but I don't have any other 
> platform than Mac OSX 10.2 that still uses 5.6.0.
>
> I've included a scriptlet at the end with which people can test on 
> their platform.
>
> This difference in behaviour is most likely the reason why the 
> LocationFactory test fails on 5.6.0 but succeeds on later versions of 
> perl.
>
> There's a couple of options we have:
>
> 	a) Require perl 5.6.1 in the Makefile.PL, and abandon support for
> 	   5.6.0.
> 	b) Remove support for nested joins in location strings.
> 	c) Branch in the respective piece of code depending on perl version
> 	   and don't use the regex construct above if perl version is 5.6.0
> 	   or less, with the understanding that nested joins are not
> 	   supported in perl 5.6.0.
>
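
A version branch along the lines of option c) could be sketched roughly as follows. This is only an illustration under assumptions: the helper name supports_nested_joins is hypothetical, and the 5.6.1 threshold is taken from the test results reported in this thread, not from any wider survey of perl versions.

```perl
use strict;
use warnings;

# $] holds the running perl's version as a number,
# e.g. 5.006 for 5.6.0 and 5.006001 for 5.6.1.
# Hypothetical helper: accepts an explicit version for testing,
# and falls back to the running perl's $] otherwise.
sub supports_nested_joins {
    my ($version) = @_;
    $version = $] unless defined $version;
    return $version >= 5.006001 ? 1 : 0;
}

if ( supports_nested_joins() ) {
    print "perl $]: using the recursive regexp, nested joins supported\n";
}
else {
    print "perl $]: skipping the recursive regexp, nested joins not supported\n";
}
```

The same check could guard the two nested-location tests instead of the parsing code, if skipping them on 5.6.0 turns out to be enough.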
> (BTW this is not supported at all in versions 5.005 and lower, so 
> requiring only 5.005 in Makefile.PL should certainly be revised.)
>
> I'm a bit ambivalent on this as nested joins shouldn't really exist 
> and unless I'm mistaken only existed in Genbank temporarily as 
> allegedly they have been fixed now by NCBI staff. So, I'm a bit 
> worried that we're incurring issues while spending efforts on how to 
> best solve a non-existent problem.
>
> OTOH, it appears that the only two tests failing in 5.6.0 are the 
> nested locations, so maybe no code changes are necessary in order to 
> properly support all location strings in 5.6.0 except nested joins? If 
> this is true the easiest solution would be to skip the two tests if 
> perl is 5.6.0 or lower.
>
> Any opinions, comments, or pieces of advice appreciated.
>
> 	-hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
> To verify behaviour, use the following scriptlet on your platform:
>
> my $re;
> $re = qr{
>           \(
>             (?:
>                 (?> [^()]+ )    # Non-parens without backtracking
>               |
>                 (??{ $re })     # Group with matching parens
>             )*
>           \)
>         }x;
> my $oparg = 'join(11..21,join(100..300,complement(150..230)))';
> while( $oparg =~ s/(join|order|bond)$re//ig ) {
>         print "match: \$oparg ='$oparg', \$\& = '$&'\n";
> }
>
> When run through perl -w it outputs
>
> Use of uninitialized value in substitution (s///) at re.pl line 12.
>
> under perl 5.6.0 (which is wrong) and
>
> match: $oparg ='', $& = 'join(11..21,join(100..300,complement(150..230)))'
>
> under perl 5.6.1+ (which is correct).
>


