PerlMonks  

Using a regex to replace looping and splitting

by mwb613 (Beadle)
on Jan 24, 2018 at 00:08 UTC ( [id://1207789]=perlquestion )

mwb613 has asked for the wisdom of the Perl Monks concerning the following question:

Thanks in advance for looking!

I have a string returned from a DB (Redis) that is pipe-delimited, and the individual items within it are colon-delimited.

ex:

special:1001:area_code:617|special:1001:zip_code:02205|special:1001:dow:0|special:1001:tod:14

My goal is to convert these into a hash that looks like this (the first few sub-fields can be ignored):

{ area_code => 617, zip_code => 02205, dow => 0, tod => 14 }

I could write a loop no problem:

foreach my $little_string (split(/\|/, $big_string)) {
    my @little_string_parts = split(/:/, $little_string);
    $result_hashref->{$little_string_parts[2]} = $little_string_parts[3];
}

That uses two splits, though, and I'm thinking a grep would be more efficient. Is there a way to iteratively progress through a string with a regex and use the matches to fill a hash? The below could work, but it wouldn't account for the "big string" having a variable number of delimited members:

$big_string =~ /special:[0-9]{4}:(.*):(.*)\|special:[0-9]{4}:(.*):(.*)\|special:[0-9]{4}:(.*):(.*)\|special:[0-9]{4}:(.*):(.*)/;
$result_hashref->{$1} = $2;
$result_hashref->{$3} = $4;
$result_hashref->{$5} = $6;
$result_hashref->{$7} = $8;

Would a Map work here? I'm inexperienced with it but my thought is that it will only apply if we do a split on the initial string

Replies are listed 'Best First'.
Re: Using a regex to replace looping and splitting
by tybalt89 (Monsignor) on Jan 24, 2018 at 00:18 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1207789

    use strict;
    use warnings;

    use Data::Dump 'pp';

    $_ = 'special:1001:area_code:617|special:1001:zip_code:02205|special:1001:dow:0|special:1001:tod:14';

    my %hash = /([^|:]+):([^|:]+)(?:\z|\|)/g;

    pp \%hash;

      Thanks so much for the response!

      A few questions:

      1. Do you alias Dumper as pp for Pretty Print, as in Python?

      2. You use "=" rather than "=~" in the %hash assignment. How does that work in comparison?

      3. Would the regex assignment work just as well if you stuck the "big string" in between the hash assignment and the regex for example and skipped the $_ manipulation:

      my %hash = $big_string = /([^|:]+):([^|:]+)(?:\z|\|)/g;

      4. Can you elaborate on the Regex a little? I have questions but I'm not really sure where to start. The last clause includes an escaped "z" for example

        Hi,

        'Dumper' is exported by Data::Dumper. 'pp' is exported by Data::Dump.

        I think you are missing that the regex is applied to the implied '$_' variable. Written out in full, it would look like my %hash = $_ =~ /([^|:]+):([^|:]+)(?:\z|\|)/g; You can just drop the '$_ =~' in this case, as he did.

        For your third question, the proper form using the '$big_string' var (instead of '$_') would be my %hash = $big_string =~ /([^|:]+):([^|:]+)(?:\z|\|)/g;

        For your last question, the regex captures the 2 colon separated values that are immediately followed by the end of string, ('\z'), or by a pipe. He writes that using a non-capturing group, (?: ... ).
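To make the list-context behaviour discussed in this sub-thread concrete, here is a minimal sketch. The intermediate @pairs array is only there for illustration and is not part of tybalt89's code; the sample string is the one from the question.

```perl
use strict;
use warnings;

my $big_string = 'special:1001:area_code:617|special:1001:zip_code:02205'
               . '|special:1001:dow:0|special:1001:tod:14';

# In list context, a /g match returns the captured groups of every
# match as one flat list: (key1, value1, key2, value2, ...).
# (?:\z|\|) is a non-capturing group, so it adds nothing to that list.
my @pairs = $big_string =~ /([^|:]+):([^|:]+)(?:\z|\|)/g;

# Assigning the flat list to a hash pairs the items up as key => value.
my %hash = @pairs;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

Only the last two colon-separated fields of each item can match, because the second capture must be followed directly by a pipe or by the end of the string.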

Re: Using a regex to replace looping and splitting
by AnomalousMonk (Archbishop) on Jan 24, 2018 at 03:42 UTC

    Here's my take on this general problem:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $rx_intro = qr{ special : \d{4} : }xms;
     ;;
     my $rx_key = qr{ \w+ }xms;
     my $rx_val = qr{ \d+ }xms;
     ;;
     my $rx_sep   = qr{ : }xms;
     my $rx_delim = qr{ [|] }xms;
     ;;
     my $s = 'special:1001:area_code:617|special:1001:zip_code:02205|special:1001:dow:0|special:1001:tod:14';
     ;;
     my $hashref = {
       $s =~ m{ \G $rx_intro ($rx_key) $rx_sep ($rx_val) (?: $rx_delim | \z) }xmsg
       };
     dd $hashref;
    "
    { area_code => 617, dow => 0, tod => 14, zip_code => "02205" }

    By defining the patterns for the pieces of the string separately, it's easier to play with and adjust for variations in the data: might there be whitespace around the : or | delimiters; might 'special' sometimes be 'general'; might the pattern of a key be more complex than \w+; and so on?

    Also, defining patterns separately allows one to validate the entire string before trying to extract anything from it. (Working with known-valid data is always nice.) Once a string is known to be in a particular, valid format, it is often quite simple to extract data fields from it.
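The validate-then-extract idea might be sketched as follows. This is an illustration under assumptions, not AnomalousMonk's code: the names $rx_item, $rx_valid and parse_record are hypothetical, and the format accepted is simply the one implied by the sample string in the question.

```perl
use strict;
use warnings;

my $rx_intro = qr{ special : \d{4} : }xms;
my $rx_key   = qr{ \w+ }xms;
my $rx_val   = qr{ \d+ }xms;

# Hypothetical helper pattern: one complete "special:NNNN:key:value" item.
my $rx_item  = qr{ $rx_intro $rx_key : $rx_val }xms;

# Hypothetical whole-string pattern: one item, then any number of
# pipe-separated items, and nothing else.
my $rx_valid = qr{ \A $rx_item (?: [|] $rx_item )* \z }xms;

# Returns a hash reference for a well-formed record, or nothing otherwise.
sub parse_record {
    my ($s) = @_;
    return unless $s =~ $rx_valid;    # reject malformed input up front

    # The string is known to be valid, so extraction can stay simple.
    my %fields = $s =~ m{ $rx_intro ($rx_key) : ($rx_val) }xmsg;
    return \%fields;
}

my $good = parse_record('special:1001:area_code:617|special:1001:zip_code:02205');
my $bad  = parse_record('special:1001:area_code:617|oops');

print defined $good ? "good record parsed\n" : "good record rejected\n";
print defined $bad  ? "bad record parsed\n"  : "bad record rejected\n";
```

Because the key, value, and intro patterns are defined once, a change in the data format only has to be made in one place for both the validation and the extraction.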

    Update: See also "regex is not working as I intended" for a recent discussion of what seems a similar problem, although you will have to drill down a way before you get to a specification of the structure of the data that fireblood is working with.


    Give a man a fish:  <%-{-{-{-<

Re: Using a regex to replace looping and splitting
by johngg (Canon) on Jan 24, 2018 at 12:07 UTC

    You could do a single split on either a '|' or a ':', then process the resulting stream two items (key and value) at a time, filtering out the 'special' pairs. You could use the core List::Util->pairs() if on 5.20 or later, or List::MoreUtils->natatime() from CPAN if not, to process in pairs. I use groupsOf(), which I wrote when working in a restricted environment with early Perl versions and no chance of installing non-core modules.

    use strict;
    use warnings;

    use Data::Dumper;

    sub groupsOf (&$@);

    my $dbStr =
        q{special:1001:area_code:617|special:1001:zip_code:02205|special:1001:dow:0|special:1001:tod:14};

    my $rhRes = {
        groupsOf { $_[ 0 ] eq q{special} ? () : @_ } 2, split m{\||:}, $dbStr
    };

    print Data::Dumper->Dumpxs( [ $rhRes ], [ qw{ rhRes } ] );

    sub groupsOf (&$@)
    {
        my $rcToRun  = shift;
        my $groupsOf = shift;
        my $rcDoIt;
        $rcDoIt = sub
        {
            $rcToRun->(
                map shift, 1 .. ( @_ < $groupsOf ? @_ : $groupsOf )
                ),
                @_ ? &$rcDoIt : ();
        };
        &$rcDoIt;
    }

    The output.

    $rhRes = {
               'tod' => '14',
               'zip_code' => '02205',
               'dow' => '0',
               'area_code' => '617'
             };

    I hope this is of interest.

    Update: Here's a version using List::Util->pairs(), the output is essentially the same.

    use strict;
    use warnings;

    use List::Util qw{ pairs };
    use Data::Dumper;

    my $dbStr =
        q{special:1001:area_code:617|special:1001:zip_code:02205|special:1001:dow:0|special:1001:tod:14};

    my $rhRes = {
        map  { @{ $_ } }
        grep { $_->[ 0 ] ne q{special} }
        pairs split m{\||:}, $dbStr
    };

    print Data::Dumper->Dumpxs( [ $rhRes ], [ qw{ rhRes } ] );

    Cheers,

    JohnGG

Re: Using a regex to replace looping and splitting
by tybalt89 (Monsignor) on Jan 24, 2018 at 13:14 UTC

    "Would a Map work here?" -> yes.

    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1207789

    use strict;
    use warnings;

    use Data::Dumper;

    my $big_string = 'special:1001:area_code:617|special:1001:zip_code:02205|special:1001:dow:0|special:1001:tod:14';

    my %hash = map +(split /:/)[-2, -1], split /\|/, $big_string;

    print Dumper \%hash;

    Use a split to split the big string on vertical bars, then split each part by colon, and keep the last two sub-parts (which are the key and value), to pass on through the map to the hash.

Re: Using a regex to replace looping and splitting
by Laurent_R (Canon) on Jan 24, 2018 at 07:26 UTC
    Would a Map work here? I'm inexperienced with it but my thought is that it will only apply if we do a split on the initial string
    Your thought is right: a map needs a list or an array as input, and a simple split on the input data would not help getting the data which would be usable by a map. So, a map would not help much here. You're basically left with a regex to capture everything in one go as shown by tybalt89, or something that is equivalent to two splits and two nested loops.
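Between "everything in one go" and two nested splits there is also a middle option that matches the question's wording about iteratively progressing through the string: a /g match in scalar context driving a while loop. A minimal sketch, reusing tybalt89's pattern and the sample string from the question:

```perl
use strict;
use warnings;

my $big_string = 'special:1001:area_code:617|special:1001:zip_code:02205'
               . '|special:1001:dow:0|special:1001:tod:14';

my %hash;

# In scalar context, each /g match resumes where the previous one
# ended, so the loop copes with any number of pipe-delimited items.
while ( $big_string =~ /([^|:]+):([^|:]+)(?:\z|\|)/g ) {
    $hash{$1} = $2;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

The loop body is also a natural place for per-item checks or transformations that would be awkward in the single list assignment.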
Re: Using a regex to replace looping and splitting
by pwagyi (Monk) on Jan 24, 2018 at 04:15 UTC

    I don't really see any problem with your way. I myself try to avoid using regex too much. (I know split uses regex, but still)
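Whether the single regex is actually faster than the two splits is something to measure rather than assume. A minimal sketch using the core Benchmark module follows; the sub names by_splits and by_regex are hypothetical, and no particular result is claimed, since timings depend on the Perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $big_string = 'special:1001:area_code:617|special:1001:zip_code:02205'
               . '|special:1001:dow:0|special:1001:tod:14';

# The original approach: split on pipes, then split each item on colons.
sub by_splits {
    my %h;
    for my $item ( split /\|/, $big_string ) {
        my @parts = split /:/, $item;
        $h{ $parts[2] } = $parts[3];
    }
    return \%h;
}

# The single list-context regex from tybalt89's first reply.
sub by_regex {
    my %h = $big_string =~ /([^|:]+):([^|:]+)(?:\z|\|)/g;
    return \%h;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, { splits => \&by_splits, regex => \&by_regex } );
```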

Node Type: perlquestion [id://1207789]
Approved by beech