http://qs321.pair.com?node_id=142419

lucky1 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to insert colons ':' into a string to represent a MAC address. I'm using join and split...

 print join(':', split /(..)/,'00a0c801adc6'),"\n<BR>";

But I get this as my output:

:00::a0::c8::01::ad::c6

Any idea why I get all the extra colons? My guess is that my regular expression is splitting (and joining) on a match at the start of every octet instead of just the end. How can I stop that from occurring? I've scoured "Programming Perl" and "Perl Cookbook" but can't find any help.
Thanks

Replies are listed 'Best First'.
Re: More Help with Regex
by chipmunk (Parson) on Jan 31, 2002 at 15:23 UTC
    In order to understand what is happening, you need to know how split works. split is used to split a string into fields based on a separator character or characters. For example, split(/,/, "Bob,17,M") returns the list ('Bob', '17', 'M') because the commas are treated as separators, and the text between them as fields. When the regex includes capturing parens, the captured separators are returned along with the fields, so split(/(,)/, "Bob,17,M") returns ('Bob', ',', '17', ',', 'M').
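To see the effect of the capturing parentheses, here is a minimal sketch using the same 'Bob,17,M' string. The variable names are only illustrative.

```perl
use strict;
use warnings;

# Without capturing parens: only the fields come back.
my @plain = split /,/, 'Bob,17,M';

# With capturing parens: each captured separator is returned as well.
my @captured = split /(,)/, 'Bob,17,M';

print scalar(@plain),    " items: @plain\n";
print scalar(@captured), " items: @captured\n";
```

The first list holds three items and the second five, because the two commas are returned alongside the fields.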

    split(/(..)/, '00a0c801adc6') treats each pair of characters as a separator, and the text between each pair of characters as a field. Of course, there is no text between the characters, so you get null strings for the fields: ('', '00', '', 'a0', '', 'c8', '', '01', '', 'ad', '', 'c6'). (The trailing empty fields are discarded.) When you join that list back together, you get extra colons because of all the null strings.
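The empty fields are easier to spot when each element is printed in brackets. The sketch below does that, and then filters the empty strings out with grep. The grep step is an illustration added here, not something proposed in the thread.

```perl
use strict;
use warnings;

my @parts = split /(..)/, '00a0c801adc6';

# Bracket each element so the empty fields are visible.
print join(' ', map { "[$_]" } @parts), "\n";

# Dropping the empty fields leaves only the captured pairs.
my @pairs = grep { length } @parts;
print join(':', @pairs), "\n";
```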

    When you want to grab groups of characters from a string without separators, instead of split you can use a plain old regex match with /g: print join(':', '00a0c801adc6' =~ /../g), "\n<BR>";
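Wrapped in a small subroutine, the /g match in list context might look like the sketch below. The name mac_with_colons is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: format a run of hex digits as colon-separated pairs.
sub mac_with_colons {
    my ($raw) = @_;
    # In list context, /../g returns every successive two-character match.
    my @pairs = $raw =~ /../g;
    return join ':', @pairs;
}

print mac_with_colons('00a0c801adc6'), "\n";
```

Note that /../g only ever matches whole pairs, so a leftover character at the end of an odd-length string would be silently dropped.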

Re: More Help with Regex
by particle (Vicar) on Jan 31, 2002 at 15:06 UTC
    try
    #!/usr/local/bin/perl -w
    use strict;
    $_ = '00a0c801adc6';
    s/([\w]{2})(?!$)/$1:/gio;
    print "$_\n<BR>";

    to explain the regex,

    s/
        ([\w]{2})   # matches a pair of letters/numbers
        (?!$)       # matches NOT end of string (zero-width negative lookahead assertion)
    /$1:/giox;      # replace pair with trailing colon
    all this assumes you don't have to untaint the mac address. otherwise, you'll want to do that before the s///.
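If untainting is needed, one possible approach is to validate the input with a strict capturing match first and only then insert the colons. This is a sketch under assumptions: the helper name format_mac and the requirement of exactly 12 hex digits are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: validate (and thereby untaint) a bare MAC address,
# then insert the colons. Returns undef unless the input is 12 hex digits.
sub format_mac {
    my ($input) = @_;
    # Data captured by a regex match is considered untainted, so the
    # pattern should be as strict as the data allows.
    my ($clean) = $input =~ /\A([0-9A-Fa-f]{12})\z/
        or return;
    # Same substitution as above: a colon after every pair except the last.
    (my $formatted = $clean) =~ s/(..)(?!$)/$1:/g;
    return $formatted;
}

print format_mac('00a0c801adc6'), "\n";
```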

    ~Particle

Re: More Help with Regex
by higle (Chaplain) on Jan 31, 2002 at 15:30 UTC
    To explain why your original try wasn't working, consider the following:
    split /(..)/,'00a0c801adc6';
    This returns a list of 12 elements, because split is, well, splitting the string: it returns what the regex captures, along with what's in between the matches (empty strings).

    Update: Whoops! Chipmunk was quicker on the draw.

    ------------------------
    perl -e 's=$;$/=$\;W=i;$\=$/;s;;XYW\\U$"\;\);sig,$_^=$[x5,print;'
Re: More Help with Regex
by particle (Vicar) on Jan 31, 2002 at 15:40 UTC
    Update: i added petral's and impossiblerobot's routines to the comparison...

    although my solution was complex and used regex mojo, chipmunk's is by far the faster of the two. impossiblerobot's first is fastest of all, because there's no real thinking involved. consider

    #!/usr/local/bin/perl -w
    use strict;
    use Benchmark;
    my $s = '00a0c801adc6';
    timethese(100000, {
        particle => sub { my $str = $s; $str =~ s/([\w]{2})(?!$)/$1:/gio; },
        chipmunk => sub { my $str = $s; $str = join(':', $str =~ /../g); },
        petral   => sub { my $str = $s; $str = join(':', split /(?=(?:..)+$)/, $str); },
        robot1   => sub { my $str = $s; $str = join(':', unpack 'a2a2a2a2a2a2', $str); },
        robot2   => sub { my $str = $s; $str = join(':', unpack('a2' x (length($str)/2), $str)); },
    });
    which results, on my 450MHz, 768MB ram, win98 box, in:
    C:\WINDOWS\Desktop>perl test_mac.pl
    Benchmark: timing 100000 iterations of chipmunk, particle, petral, robot1, robot2...
      chipmunk:  3 wallclock secs ( 2.52 usr +  0.00 sys =  2.52 CPU) @ 39682.54/s (n=100000)
      particle:  6 wallclock secs ( 6.48 usr +  0.00 sys =  6.48 CPU) @ 15432.10/s (n=100000)
        petral:  5 wallclock secs ( 6.05 usr +  0.00 sys =  6.05 CPU) @ 16528.93/s (n=100000)
        robot1:  2 wallclock secs ( 1.27 usr +  0.00 sys =  1.27 CPU) @ 78740.16/s (n=100000)
        robot2:  3 wallclock secs ( 3.02 usr +  0.00 sys =  3.02 CPU) @ 33112.58/s (n=100000)
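    this goes to show you just how much time regexes can add to your algorithm: the s/// version manages about 15,400 iterations per second, against roughly 78,700 for the fixed unpack template.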

    ~Particle

      Thanks for the times. But just to balance it out, yours could easily be considered the most straightforward:

      (cleaned up a little):     $str =~ s/..(?!$)/$&:/g;

      says "insert a colon between each pair of characters" at least as clearly as any of the others. Slight update: That should be s/..(?=..)/$&:/g, to make the point about readability.
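Both forms of the substitution can be compared side by side, as in the sketch below. It uses a capture group and $1 in place of $&, which is a swap made here because $& has historically carried a speed penalty in older perls. The variable names are illustrative.

```perl
use strict;
use warnings;

my $str = '00a0c801adc6';

# Variant 1: a pair that is not at the end of the string.
(my $not_at_end = $str) =~ s/(..)(?!$)/$1:/g;

# Variant 2: a pair that is followed by another pair.
(my $before_pair = $str) =~ s/(..)(?=..)/$1:/g;

print "$not_at_end\n$before_pair\n";
```

For an even-length string like this one, the two variants give the same result.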

        p
Re: More Help with Regex
by petral (Curate) on Jan 31, 2002 at 19:15 UTC
    Just to complete the record, this does what you want using split:
    print join(':', split /(?=(?:..)+$)/,'00a0c801adc6'),"\n<BR>";
    It avoids passing the separator by using non-capturing parens and a "zero-width" match. (And relies on there being an even number of characters.)
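The dependence on an even number of characters can be seen by running the same split on an odd-length string. A minimal sketch, in which the 'abcde' input is just an illustration:

```perl
use strict;
use warnings;

# Split at every position that is followed by a whole number of pairs
# running up to the end of the string.
my @even = split /(?=(?:..)+$)/, '00a0c801adc6';
print join(':', @even), "\n";

# With an odd number of characters the leftover character ends up at
# the front, because the pairs are counted back from the end.
my @odd = split /(?=(?:..)+$)/, 'abcde';
print join(':', @odd), "\n";
```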

      p
Re: More Help with Regex
by impossiblerobot (Deacon) on Jan 31, 2002 at 20:32 UTC
    If you know the length of the string, you can use unpack:
    print join(':', unpack('a2a2a2a2a2a2', '00a0c801adc6')),"\n<BR>";
    In fact, even if you don't, you could probably do something like this:
    my $string = '00a0c801adc6';
    print join(':', unpack('a2' x (length($string)/2), $string)),"\n<BR>";
    (Documentation on using unpack seems unusually sparse, so there might be an easier way to do this that I am unaware of.)
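On perls from 5.8 onward (which postdate this thread), pack and unpack templates accept parenthesized groups with a repeat count, so the length arithmetic can be dropped. A minimal sketch, assuming such a perl is available:

```perl
use strict;
use warnings;

my $string = '00a0c801adc6';

# '(a2)*' repeats the two-character group until the input runs out,
# so the length does not have to be computed by hand.
my @pairs = unpack '(a2)*', $string;
print join(':', @pairs), "\n";
```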

    Also ++petral. I knew there had to be a way to use split and extended patterns, but couldn't work it out myself.

    Impossible Robot