PerlMonks  

Restrict a match within a string

by SavannahLion (Pilgrim)
on Jan 26, 2017 at 21:08 UTC ( [id://1180396] : perlquestion )

SavannahLion has asked for the wisdom of the Perl Monks concerning the following question:

First, here is a code sample:

    # this is easy...
    # grab a string
    my $sa = 'random,junk no,one,cares about(I,Want,This,stuff) but, (not,this) stuff which&might have-random characters';

    # extract the string between the first set of ( and )
    my ($sb) = $sa =~ m/\((.+?)\)/;

    # split the string on ,
    my @ab = split(',', $sb);

    # join and buttress the string with ^
    my $sc = '^' . join('^', @ab) . '^';

Obviously, I can do other things, like using s/// instead of split and join to replace each , with ^. The key here is that I wonder whether it's possible to capture and split in one go during the match, effectively skipping the interim $sb and going straight to @ab.
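For reference, the s/// variant mentioned above might look like the following. This is only a minimal sketch: it reuses $sa and $sb from the sample, assumes the (not,this) part of the string was originally unbroken, and relies on the /r modifier (Perl 5.14 or later) to return the substituted copy instead of modifying $sb.

```perl
use strict;
use warnings;

my $sa = 'random,junk no,one,cares about(I,Want,This,stuff) but, (not,this) stuff which&might have-random characters';

# Capture the text inside the first ( ... ) pair, as in the sample.
my ($sb) = $sa =~ m/\((.+?)\)/;

# Replace every comma with ^ on a copy of $sb (/r returns the new string),
# then wrap the result in ^ on both sides.
my $sc = '^' . ($sb =~ s/,/^/gr) . '^';

print "$sc\n";   # ^I^Want^This^stuff^
```

This still needs the interim $sb, so it only trades split and join for a substitution; it does not capture and split in a single match.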

It seems like I should be able to have the regex engine start matching between the first ( and ), but I can't get it to create groups inside the first set of parentheses while ignoring the second set.

Looking at the Perl Docs it seems like the answer is teasing me, but I just don't see it. Any suggestions?

Replies are listed 'Best First'.
Re: Restrict a match within a string
by NetWallah (Canon) on Jan 26, 2017 at 23:26 UTC
    print join ("^", ($sa =~ m/\((.+?)\)/)[0] =~/(\w+)/g) #I^Want^This^stuff

            ...it is unhealthy to remain near things that are in the process of blowing up.     man page for WARP, by Larry Wall

Re: Restrict a match within a string (updated comments)
by LanX (Saint) on Jan 27, 2017 at 01:40 UTC
    I prefer readable code over concise code, but since you asked for it:

        # http://perlmonks.org/?node_id=1180396
        use strict;
        use warnings;

        my $sa = 'random,junk no,one,cares about(I,Want,This,stuff) but, (not,this) stuff which&might have-random characters';

        my $new = $sa =~ s/
            (?: ^ .*? \( )?    # swallow start till first opening paren
            ( [^,)]+ )         # match words (non-delimiters)
            ,?                 # swallow delimiter
            (?: \) .* $ )?     # swallow first closing paren till EOL
          / $1 #/xgr;

        print "#$new";
    # I # Want # This # stuff #

    NB: I wouldn't use this.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re: Restrict a match within a string
by Anonymous Monk on Jan 26, 2017 at 23:31 UTC

    Hi :)

    Can you describe in English words how the matching is supposed to work?

    And then simplify it even more until it barely sounds like real English, like a robot talking in one-word sentences?

    There is only one way to restrict a match: by matching stuff, specific stuff specifically, more stuff than your target stuff.

    So first start with the English, and make a nice list describing what's supposed to match...
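As an illustration of that approach, here is one possible reading of the question turned into a plain-word list and then into a pattern. It is a sketch, not the only way to do it: the \G anchor and the variable names are choices made for this example, and $sa is assumed to be the string from the question with (not,this) unbroken.

```perl
use strict;
use warnings;

my $sa = 'random,junk no,one,cares about(I,Want,This,stuff) but, (not,this) stuff which&might have-random characters';

# In plain words:
#   1. From the start of the string, skip up to and including the first "(".
#   2. Grab one run of characters that are neither "," nor ")".
#   3. For each later item, start exactly where the previous match
#      ended (that is what \G means) and require a "," first.
my @ab = $sa =~ m{
    (?: ^ [^(]* \(     # first item: skip to just past the first "("
      | \G ,           # later items: continue right after the previous one
    )
    ( [^,)]+ )         # the item itself
}xg;

print join('^', @ab), "\n";   # I^Want^This^stuff
```

Because neither ^ nor \G can match after the first closing paren, the match stops there and the second (...) group is never looked at. That is the "match more than your target" idea: the pattern consumes the lead-in and the delimiters, and only the items are captured.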

      Hi SavannahLion?

      I'm not sure if this anon Monk is you or somebody else?

      You already understand some key fundamentals of NetWallah's code, but you just don't know what you know. An attempt to clarify.

      In your code, my ($sb) = $sa =~ m/\((.+?)\)/; you knew that parentheses were required around $sb. They put the assignment into list context, so the first element of the list on the right is assigned to $sb. That is the contents of the first and only capture group. Without the parentheses, the match runs in scalar context and $sb winds up being 1 (true), which only says that the match succeeded. I think you figured that part out on your own.

      You could have written my $sb = ($sa =~ m/\((.+?)\)/)[0]; instead. That takes a list slice of the stuff on the right and assigns the zeroth element of that list to the simple scalar $sb on the left. This is functionally the same as your code.
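A short sketch of the three forms side by side may help. The variable names $in_scalar, $in_list and $by_slice are made up for this example, and $sa is assumed to be the string from the question.

```perl
use strict;
use warnings;

my $sa = 'random,junk no,one,cares about(I,Want,This,stuff) but, (not,this) stuff which&might have-random characters';

# Scalar context: the match only reports success (1) or failure.
my $in_scalar = $sa =~ m/\((.+?)\)/;

# List context: the match returns its captures; the first one lands in $in_list.
my ($in_list) = $sa =~ m/\((.+?)\)/;

# List slice: force list context, then pick element 0 of the returned list.
my $by_slice = ($sa =~ m/\((.+?)\)/)[0];

print "scalar context: $in_scalar\n";   # scalar context: 1
print "list context:   $in_list\n";     # list context:   I,Want,This,stuff
print "list slice:     $by_slice\n";    # list slice:     I,Want,This,stuff
```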

      So, the ($sa =~ m/\((.+?)\)/)[0] in NetWallah's code is exactly the same as your my ($sb) = $sa =~ m/\((.+?)\)/;. There is no explicitly named $sb variable, but the value is still there, just under some computer-generated name that you don't know.

      The next part, =~/(\w+)/g, is the same as if you had written my (@ab) = $sb =~/(\w+)/g. You used a split instead of the global match, my @ab = split(',',$sb);, but these statements amount to about the same thing. In NetWallah's version there is no explicitly named array @ab, but it is there under some computer-generated name that you don't know. That array is fed into the join.
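To make the comparison concrete, here is a small sketch that runs the split version, the global-match version, and the chained form without a named $sb. The array names @by_split, @by_match and @one_go are invented for this example, and $sa is assumed to be the string from the question.

```perl
use strict;
use warnings;

my $sa = 'random,junk no,one,cares about(I,Want,This,stuff) but, (not,this) stuff which&might have-random characters';

# Two-step version with a named intermediate string.
my ($sb) = $sa =~ m/\((.+?)\)/;
my @by_split = split /,/, $sb;        # cut the string at each comma
my @by_match = $sb =~ /(\w+)/g;       # collect each run of word characters

# Chained version: the list slice plays the role of $sb, unnamed.
my @one_go = ($sa =~ m/\((.+?)\)/)[0] =~ /(\w+)/g;

print join('^', @by_split), "\n";   # I^Want^This^stuff
print join('^', @by_match), "\n";   # I^Want^This^stuff
print join('^', @one_go),   "\n";   # I^Want^This^stuff
```

The two approaches only agree because the fields here consist of word characters. With a field such as Want-This, split /,/ would keep it as one element, while /(\w+)/g would return Want and This separately.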

      So, now I come to my point...

      Your code and the more cryptic-looking code are about the same and will have similar performance. I actually think your code as written is fine. It takes a few more lines, but it is easy to understand. Giving $sb and @ab explicit names doesn't really cost anything. With Perl, fewer lines of code does not necessarily mean "faster" code. I've seen some devilishly tricky lines of code that actually run much slower than a more obvious approach would. My recommendation is to go for clarity as the first priority. Will you or somebody else understand the code a couple of years from now?