Re: More Help with Regex
by chipmunk (Parson) on Jan 31, 2002 at 15:23 UTC
|
In order to understand what is happening, you need to know how split works. split is used to split a string into fields based on a separator character or characters. For example, split(/,/, "Bob,17,M") returns the list ('Bob', '17', 'M') because the commas are treated as separators, and the text between them as fields.
When the regex includes capturing parens, the captured separators are returned along with the fields, so split(/(,)/, "Bob,17,M") returns ('Bob', ',', '17', ',', 'M').
split(/(..)/, '00a0c801adc6') treats each pair of characters as a separator, and the text between each pair of characters as a field. Of course, there is no text between the pairs, so you get null strings for the fields: ('', '00', '', 'a0', '', 'c8', '', '01', '', 'ad', '', 'c6'). (The trailing empty fields are discarded.) When you join that list back together, you get extra colons because of all the null strings.
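A small sketch of the three split calls described above, in case it helps to see the element counts side by side. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Plain separator: only the fields come back.
my @plain = split(/,/, "Bob,17,M");

# Capturing parens: each separator is returned between the fields.
my @with_seps = split(/(,)/, "Bob,17,M");

# Every pair of characters is a "separator", so the fields are empty strings.
my @pairs = split(/(..)/, '00a0c801adc6');

print scalar(@plain), " elements: @plain\n";
print scalar(@with_seps), " elements: @with_seps\n";
print scalar(@pairs), " elements, joined: ", join(':', @pairs), "\n";
```

The last line prints the joined result as :00::a0::c8::01::ad::c6, which is where the extra colons come from.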
When you want to grab groups of characters from a string without separators, instead of split you can use a plain old regex match with /g:
print join(':', '00a0c801ad' =~ /../g), "\n<BR>";
Re: More Help with Regex
by particle (Vicar) on Jan 31, 2002 at 15:06 UTC
|
#!/usr/local/bin/perl -w
use strict;
$_ = '00a0c801adc6';
s/([\w]{2})(?!$)/$1:/gio;
print "$_\n<BR>";
to explain the regex,
s/
([\w]{2}) # matches a pair of word characters (letters, digits or underscore)
(?!$) # matches NOT end of string (zero-width negative lookahead assertion)
/$1:/giox; # replace the pair with itself plus a trailing colon
all this assumes you don't have to untaint the mac address. otherwise, you'll want to do that before the s///.
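One possible way to do the untainting step, as a sketch only: it assumes the input should be exactly 12 hex digits with no separators, and the helper name untaint_mac is made up for this example. Capturing through a regex is what clears the taint flag under -T.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 12 hex digits if $raw looks like a bare
# MAC address, or undef otherwise. The captured value is what gets untainted.
sub untaint_mac {
    my ($raw) = @_;
    return unless defined $raw;
    my ($clean) = $raw =~ /\A([0-9A-Fa-f]{12})\z/;
    return $clean;
}

my $mac = untaint_mac('00a0c801adc6');
die "not a MAC address\n" unless defined $mac;

# Same substitution as above, applied to the validated copy.
(my $pretty = $mac) =~ s/(\w{2})(?!$)/$1:/g;
print "$pretty\n";
```

Anything that is not 12 hex digits comes back as undef, so the s/// never sees unexpected input.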
~Particle
Re: More Help with Regex
by higle (Chaplain) on Jan 31, 2002 at 15:30 UTC
|
To explain why your original try wasn't working, consider the following:
split /(..)/,'00a0c801adc6';
This returns a list of 12 elements, because split is, well, splitting the string: with capturing parens it returns what the regex matches, along with what's in between the matches (null strings).
Update: Whoops! Chipmunk was quicker on the draw.
------------------------
perl -e 's=$;$/=$\;W=i;$\=$/;s;;XYW\\U$"\;\);sig,$_^=$[x5,print;'
Re: More Help with Regex
by particle (Vicar) on Jan 31, 2002 at 15:40 UTC
|
Update: i added petral's and impossiblerobot's routines to the comparison...
although my solution was complex and used regex mojo, chipmunk's is by far the faster of the two. impossiblerobot's first is fastest of all, because there's no real thinking involved. consider
#!/usr/local/bin/perl -w
use strict;
use Benchmark;
my $s = '00a0c801adc6';
timethese(100000,{
    particle => sub {
        my $str = $s;
        $str =~ s/([\w]{2})(?!$)/$1:/gio;
    },
    chipmunk => sub {
        my $str = $s;
        $str = join(':', $str =~ /../g);
    },
    petral => sub {
        my $str = $s;
        $str = join(':', split /(?=(?:..)+$)/,$str);
    },
    robot1 => sub {
        my $str = $s;
        $str = join(':', unpack 'a2a2a2a2a2a2', $str);
    },
    robot2 => sub {
        my $str = $s;
        $str = join(':', unpack('a2' x (length($str)/2), $str));
    },
});
which results, on my 450MHz, 768MB ram, win98 box, in:
C:\WINDOWS\Desktop>perl test_mac.pl
Benchmark: timing 100000 iterations of chipmunk, particle, petral, robot1, robot2...
chipmunk: 3 wallclock secs ( 2.52 usr + 0.00 sys = 2.52 CPU) @ 39682.54/s (n=100000)
particle: 6 wallclock secs ( 6.48 usr + 0.00 sys = 6.48 CPU) @ 15432.10/s (n=100000)
petral: 5 wallclock secs ( 6.05 usr + 0.00 sys = 6.05 CPU) @ 16528.93/s (n=100000)
robot1: 2 wallclock secs ( 1.27 usr + 0.00 sys = 1.27 CPU) @ 78740.16/s (n=100000)
robot2: 3 wallclock secs ( 3.02 usr + 0.00 sys = 3.02 CPU) @ 33112.58/s (n=100000)
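this goes to show you just how much more time substitutions and lookaheads can add to your algorithm. note that chipmunk's plain /../g match still edges out robot2's computed unpack template.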
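Before reading too much into timings like these, it may be worth confirming that all five approaches produce the same string. A minimal sketch of such a check follows; the %impl hash is just scaffolding for this example, and each sub is rewritten to return the formatted string rather than assign it.

```perl
use strict;
use warnings;

my $s = '00a0c801adc6';

# The same five approaches as in the benchmark, each returning its result.
my %impl = (
    particle => sub { (my $str = $s) =~ s/([\w]{2})(?!$)/$1:/g; return $str },
    chipmunk => sub { return join(':', $s =~ /../g) },
    petral   => sub { return join(':', split /(?=(?:..)+$)/, $s) },
    robot1   => sub { return join(':', unpack 'a2a2a2a2a2a2', $s) },
    robot2   => sub { return join(':', unpack('a2' x (length($s)/2), $s)) },
);

for my $name (sort keys %impl) {
    printf "%-8s %s\n", $name, $impl{$name}->();
}
```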
~Particle
|
Thanks for the times. But just to balance it out, yours could easily be considered the most straightforward:
(cleaned up a little):   $str =~ s/..(?!$)/$&:/g;
says "insert a colon between each pair of characters" at least as clearly as any of the others.
Slight update: That should be s/..(?=..)/$&:/g, to make the point about readability.
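A quick sketch comparing the two forms; the sub names are invented for the example. On an even-length string they agree, and they only part ways when a stray character is left over at the end.

```perl
use strict;
use warnings;

# "a pair that is not at the end of the string"
sub colon_not_end  { (my $s = shift) =~ s/..(?!$)/$&:/g;  return $s }

# "a pair that is followed by another pair"
sub colon_followed { (my $s = shift) =~ s/..(?=..)/$&:/g; return $s }

for my $in ('00a0c801adc6', '00a0c') {
    printf "%-12s  %-17s  %s\n", $in, colon_not_end($in), colon_followed($in);
}
```

For '00a0c' the first form gives 00:a0:c and the second gives 00:a0c, since the second only adds a colon where a complete pair follows.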
  p
Re: More Help with Regex
by petral (Curate) on Jan 31, 2002 at 19:15 UTC
|
Just to complete the record, this does what you want using split:
print join(':', split /(?=(?:..)+$)/,'00a0c801adc6'),"\n<BR>";
It avoids passing the separator by using non-capturing parens and a "zero-width" match. (And relies on there being an even number of characters.)
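To make the even-length caveat concrete, here is a small sketch that runs the same pattern on an even-length and an odd-length string. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Split at every position from which a whole number of character pairs
# runs to the end of the string. The lookahead consumes nothing.
my $pairs_to_end = qr/(?=(?:..)+$)/;

my $even = join(':', split $pairs_to_end, '00a0c801adc6');
my $odd  = join(':', split $pairs_to_end, '00a0c');   # odd length

print "$even\n$odd\n";
```

The odd-length input comes out as 0:0a:0c, because the pairs are counted back from the end of the string rather than forward from the start.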
  p
Re: More Help with Regex
by impossiblerobot (Deacon) on Jan 31, 2002 at 20:32 UTC
|
If you know the length of the string, you can use unpack:
print join(':', unpack 'a2a2a2a2a2a2', '00a0c801adc6'),"\n<BR>";
In fact, even if you don't, you could probably do something like this:
my $string = '00a0c801adc6';
print join(':', unpack('a2' x (length($string)/2), $string)),"\n<BR>";
(Documentation on using unpack seems unusually sparse, so there might be an easier way to do this that I am unaware of.)
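One candidate for that easier way, offered as a sketch: pack and unpack templates accept parenthesized groups with a repeat count, which as far as I know arrived in perl 5.8, so this will not work on older perls.

```perl
use strict;
use warnings;

my $string = '00a0c801adc6';

# '(a2)*' repeats the two-character group for as long as data remains,
# so the template does not have to be built from the string's length.
my $formatted = join(':', unpack('(a2)*', $string));
print "$formatted\n";
```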
Also ++petral. I knew there had to be a way to use split and extended patterns, but couldn't work it out myself.
Impossible Robot