note
ZZamboni
OK, it took me a few minutes to completely understand how and
why this works. Here is my dissected version of the regex:
<code>
s/(\d+)            # first number (capture group #1)
  (?:              # non-capturing group
    ,              # followed by a comma
    (              # capture group #2
      (??{$++1})   # match previous number + 1
    )              # end capture group #2
  )+               # end non-capturing group, repeat
 /$1-$+/gx;        # substitute the first number, a dash, and the
                   # last matched one
</code>
Group #1 matches the first number in a sequence of numbers.
Then <tt>(??{$+ + 1})</tt> is used to match "the last number
plus one" (<tt>$+</tt> stands for whatever was matched by
the last set of grouping parentheses). For the second number
in a sequence, the "last number"
is the one matched by group #1. But for subsequent numbers
(because of the <tt>+</tt>), the number matched most recently (that
is, whatever the <tt>(??{$++1})</tt> matched last time) becomes
the "last number". So the thing repeats until the "last number
plus one" part doesn't match anymore (that is, until a non-consecutive
number is found), and then the whole match is replaced with the
first number (group #1), a dash, and the last number matched.<p>
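To see the walkthrough above in action, here is a minimal sketch that runs the same regex (written on one line) over a sample list. The input string and the variable names <tt>$list</tt> and <tt>$ranges</tt> are made up for illustration and are not from the original post.
<code>
use strict;
use warnings;

# Hypothetical sample input: two runs of consecutive numbers
# (1,2,3 and 7,8,9) plus two isolated numbers (5 and 12).
my $list = "1,2,3,5,7,8,9,12";

# Copy the list, then collapse each run in the copy.
# $+ + 1 is the same expression as $++1, just spaced out.
(my $ranges = $list) =~ s/(\d+)(?:,((??{$+ + 1})))+/$1-$+/g;

print "$ranges\n";   # prints 1-3,5,7-9,12
</code>
The isolated numbers are left alone because the <tt>+</tt> requires at least one "comma plus next number" repetition before the substitution applies.<p>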
At first look, I thought the double parentheses around
<tt>(??{$++1})</tt> were unnecessary, but without them it does
not work, and here is why: <tt>$+</tt> contains what was
matched by the <i>last</i> completed set of capturing parentheses,
and <tt>(??{...})</tt> does not capture anything by itself, so
<tt>$+</tt> would keep pointing at group #1. Wrapping it in a
capturing group makes <tt>$+</tt> contain the number most recently
matched by the expression. Very clever!
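<p>As a quick check of that point, here is a small sketch that drops the extra capturing parentheses. The sample input and the name <tt>$broken</tt> are hypothetical, chosen only to show the failure.
<code>
use strict;
use warnings;

# Hypothetical sample input with one run of three numbers.
my $list = "1,2,3,5";

# Same regex, but (??{...}) is no longer wrapped in a capture group,
# so $+ never advances past what group #1 matched.
(my $broken = $list) =~ s/(\d+)(?:,(??{$+ + 1}))+/$1-$+/g;

print "$broken\n";   # prints 1-1,3,5
</code>
Here <tt>$+</tt> stays at "1", so the pattern keeps looking for "2", stops after <tt>1,2</tt>, and the replacement writes <tt>1-1</tt> instead of <tt>1-3</tt>.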
<p>--<A HREF="/index.pl?node=ZZamboni&lastnode_id=1072">ZZamboni</A>