http://qs321.pair.com?node_id=1018439

samwyse has asked for the wisdom of the Perl Monks concerning the following question:

I'm using XML::Simple, and I want to use the ForceArray option to the XMLin method. ForceArray accepts three types of values: 1 (to force all nested elements to be represented as arrays), a compiled regular expression (any matching element names will be forced into an array representation), or a list of element names and/or compiled regular expressions. In my situation, I want all element names starting with an uppercase letter to be forced into an array representation, except those whose names end with either '_Flags' or '_Info'. The following works just fine if I want just one of the exceptions: qr/^[A-Z].*(?<!_Flags)$/. Unfortunately, the obvious extension, qr/^[A-Z].*(?<!_(?:Flags|Info))$/, fails with the error "Variable length lookbehind not implemented in regex". I've searched for ways around this, but none of the usual workarounds fits this situation. I've also tried several other regular expressions, listed in the test script below, but none of them works.

#!/bin/perl

%desired = (
    Port         => 1,
    Port_Aliases => 1,
    Port_Flags   => undef,
    Port_Info    => undef,
    name         => undef,
);

foreach $r (
    q/^[A-Z].*(?<!_Flags)$/,
    q/^[A-Z].*(?!_Flags$)/,
    q/^[A-Z].*(?(?=_Flags)\s$|$)/,
    q/^[A-Z].*((?<!_Flags)|(?<!_Info))$/,
    q/^[A-Z].*((?!_Flags$)|($!_Info$))$/,
    q/^[A-Z].*((?>_(Flags|Info))\s|)$/,
    q/^[A-Z].*(?<!_(?:Flags|Info))$/,
) {
    print "\ntesting /$r/\n";
    foreach (sort keys %desired) {
        $expected = $desired{$_};
        $result   = /$r/x;
        $msg      = ($result == $expected) ? 'ok' : 'ng';
        print "'$_'\t$msg, got '$result', expected '$expected'\n";
    }
}

Does anyone have any ideas? Setting ForceArray to a list of names isn't really an option, as there are several hundred possibilities. Thanks!

Re: Variable length lookbehind not implemented in regex
by AnomalousMonk (Archbishop) on Feb 12, 2013 at 20:31 UTC
Re: Variable length lookbehind not implemented in regex
by jethro (Monsignor) on Feb 12, 2013 at 23:30 UTC

    How about negating the question? That removes the need for negative matching:

    $result = (not /^([^A-Z].*|[A-Z].*(_Flags|_Info))$/);
Re: Variable length lookbehind not implemented in regex
by Anonymous Monk on Feb 12, 2013 at 20:25 UTC

    Why does it need to be a look-behind? Why not capture up to the last underscore, do a lookahead for _Flags or _Info, and then continue capturing?
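
One way to read the lookahead suggestion above is as a single negative lookahead placed right after the leading capital. The following is only a sketch of that reading, not necessarily what the poster had in mind; the variable name $force is made up for illustration, and the element names are the ones from the question's test script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: after the leading capital, refuse to match if the rest of the
# name ends in "_Flags" or "_Info". A lookahead has no fixed-width
# restriction, so the alternation is allowed here.
# ($force is a hypothetical name, not taken from the thread.)
my $force = qr/^[A-Z](?!.*_(?:Flags|Info)$)/;

for my $name (qw(Port Port_Aliases Port_Flags Port_Info name)) {
    printf "%-13s %s\n", $name, $name =~ $force ? 'array' : 'scalar';
}
```

Because the result is an ordinary compiled regular expression, it should be usable as the ForceArray value in the way the question describes.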

Re: Variable length lookbehind not implemented in regex
by ikegami (Patriarch) on Feb 14, 2013 at 08:00 UTC

    except those whose name ends with either '_Flags' or '_Info'.

    qr/^[A-Z].*\z(?<!_Flags)(?<!_Info)/s
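
The pattern above avoids the error because each lookbehind is fixed-width on its own (six and five characters), and both are asserted at the same position, the end of the string. A minimal sketch that checks it against the expectations in the question's %desired table follows; the variable name $force and the use of 0 in place of undef are choices made here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two separate fixed-width lookbehinds, both asserted at end of string.
# ($force is a hypothetical name, not taken from the thread.)
my $force = qr/^[A-Z].*\z(?<!_Flags)(?<!_Info)/s;

# Expected results from the question's table (0 stands in for undef).
my %desired = (
    Port         => 1,
    Port_Aliases => 1,
    Port_Flags   => 0,
    Port_Info    => 0,
    name         => 0,
);

for my $name (sort keys %desired) {
    my $got = $name =~ $force ? 1 : 0;
    printf "%-13s %s\n", $name, $got == $desired{$name} ? 'ok' : 'ng';
}
```

Names shorter than a lookbehind's width, such as "Port", pass that lookbehind trivially, since there are not enough preceding characters for it to match.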