http://qs321.pair.com?node_id=11133061


in reply to Regexp substitution on variable-length ranges with embedded code?

I agree "pure" regex isn't the way to go, but...

Win8 Strawberry 5.30.3.1 (64) Wed 05/26/2021 9:07:19
C:\@Work\Perl\monks
>perl
use 5.018;  # need lexicals in regexes, regex extensions

use strict;
use warnings;

my @Test = (
    '43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:9; 65:1:4; 65:1:18',
    '987:23:45; 987:23:46; 65:1:17; 65:1:19',
    );

for my $data (@Test) {
    print "'$data' \n";

    my $rx_base = qr{ (?> \d+ : \d+ :) }xms;
    my $rx_tail = qr{ (?> \d+)         }xms;
    my $rx_sep  = qr{ (?> ;? \s*)      }xms;

    my @run;
    $data =~ s{
        ($rx_base) ($rx_tail) (?{ push @run, $^N })
        (?: $rx_sep  \1 ($rx_tail) (?{ push @run, $^N })
            (?(?{ $run[-1] - $run[-2] != 1 }) (*F))
            )+
        }
        {$1$2-$3}xmsg;
    print "'$data' \n\n";
    }
^Z
'43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:9; 65:1:4; 65:1:18'
'43:1:1-6; 27:3:7-9; 65:1:4; 65:1:18'

'987:23:45; 987:23:46; 65:1:17; 65:1:19'
'987:23:45-46; 65:1:17; 65:1:19'

(I think this could be scaled back to pre-5.10 regexes if necessary.)

Update: Here's another version that I think is a bit nicer. It avoids "absolute" capture group variables and backreferences. It is also not push-y, using plain scalars that are self-initializing.

Win8 Strawberry 5.30.3.1 (64) Tue 06/01/2021 11:31:49
C:\@Work\Perl\monks
>perl
use 5.018;  # need lexicals in regexes, regex extensions

use strict;
use warnings;

my @Test = (
    '43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:9; 65:1:4; 65:1:18',
    '987:23:45; 987:23:46; 65:1:17; 65:1:19',
    );

for my $data (@Test) {
    print "'$data' \n";

    my $rx_base = qr{ (?> \d+ : \d+ :) }xms;
    my $rx_tail = qr{ (?> \d+)         }xms;
    my $rx_sep  = qr{ (?> \s* ; \s*)   }xms;

    my ($start, $prev, $end);
    $data =~ s{
        ($rx_base) \K ($rx_tail) (?{ $start = $end = $^N })
        (?: $rx_sep  \g-2 ($rx_tail) (?{ ($prev, $end) = ($end, $^N) })
            (?(?{ $end - $prev != 1 }) (*F))
            )+
        }
        {$start-$end}xmsg;
    print "'$data' \n\n";
    }
^Z
'43:1:1; 43:1:2; 43:1:3; 43:1:4; 43:1:5; 43:1:6; 27:3:7; 27:3:8; 27:3:9; 65:1:4; 65:1:18'
'43:1:1-6; 27:3:7-9; 65:1:4; 65:1:18'

'987:23:45; 987:23:46; 65:1:17; 65:1:19'
'987:23:45-46; 65:1:17; 65:1:19'
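
For anyone who hasn't met \K or relative backreferences, here is a minimal sketch of the two features in isolation. The input string and the variable names $s and $t are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only to show the two features.
my $s = 'ab:1, ab:2, cd:3';

# (\w+:) captures the prefix; \K then drops everything matched so far
# from the text that the substitution replaces, so the prefix is kept.
# \g-2 counts back two capture groups from where it appears, so here it
# refers to the prefix group without naming it as \1.
(my $t = $s) =~ s{ (\w+:) \K (\d+) , \s* \g-2 (\d+) }{$2-$3}xms;

print "$t\n";
```

The prefix 'ab:' survives because of \K, and only '1, ab:2' is replaced by '1-2'.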


Give a man a fish:  <%-{-{-{-<

Re^2: Regexp substitution on variable-length ranges with embedded code?
by Polyglot (Chaplain) on May 26, 2021 at 14:26 UTC
    Your solution works for me, with the addition of:
    use re 'eval';
    (I'm on Perl 5.12.)

    ...but I don't understand it. Specifically, this line is the most difficult one for me:

    (?(?{ $run[-1] - $run[-2] != 1 }) (*F))

    Is that double-eval'ed or executed? What is the '*F' referencing, and what is the '$^N' from the lines above? I've never seen this kind of regex before.

    Blessings,

    ~Polyglot~

      ... I don't understand ... this line ...

      (?(?{ $run[-1] - $run[-2] != 1 }) (*F))

      Is that double-eval'ed or executed? What is the '*F' referencing ...

      The embedded code is executed.

            +----------------------+------ embedded Perl code
            |                      |
            v                      v
      (?(?{ $run[-1] - $run[-2] != 1 }) (*F))

      This is the "(?(condition)yes-pattern)" conditional expression (see Extended Patterns in perlre). In this case, the condition is the true/false result of evaluating the embedded code. If it is true, the (*F) (a.k.a. (*FAIL)) backtracking control verb, added with Perl version 5.10, is executed: the match fails at that point and the regex engine backtracks to the most recent successfully matched substring, a sequence in which each captured tail value ($3) is exactly 1 greater than the one before it.
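
To see the conditional and (*F) working together outside the substitution, here is a small sketch. The helper name is_consecutive and its test strings are invented for this example; it assumes Perl 5.18 or later so that the lexicals inside the code blocks behave properly within a sub.

```perl
use 5.018;   # lexicals in regex code blocks behave sanely from 5.18 on
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns 1 if $str is a
# whitespace-separated list of integers that each increase by exactly 1.
# Each number is recorded via $^N; when the step from the previous number
# is not 1, the condition is true and (*F) forces the match to fail.
sub is_consecutive {
    my ($str) = @_;
    my ($prev, $cur);
    return $str =~ m{
        \A (\d++) (?{ $cur = $^N })
        (?: \s++ (\d++) (?{ ($prev, $cur) = ($cur, $^N) })
            (?(?{ $cur - $prev != 1 }) (*F))
        )*
        \z
    }xms ? 1 : 0;
}

print is_consecutive('4 5 6'), "\n";   # prints 1
print is_consecutive('4 5 7'), "\n";   # prints 0
```

The possessive quantifiers and the \A ... \z anchors leave the engine no alternative path after (*F) fires, so a single bad step fails the whole match.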

      ... what is the '$^N' ...

      The $^N Perl special variable (see Variables related to regular expressions in perlvar) returns the value of the most recently closed capture group.
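
A quick way to watch $^N change is to record it after each group closes. This is only an illustrative sketch; the date string and the @seen array are assumptions, not part of the original code.

```perl
use 5.018;
use strict;
use warnings;

my @seen;

# After each capture group closes, $^N holds the text that group just
# matched, so each code block records a different value.
'2021-05-26' =~ m{
    (\d+) (?{ push @seen, $^N }) -
    (\d+) (?{ push @seen, $^N }) -
    (\d+) (?{ push @seen, $^N })
}xms;

print "@seen\n";   # prints: 2021 05 26
```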


      Give a man a fish:  <%-{-{-{-<