PerlMonks  

Whats your favorite nonstandard regex quote char?

by demerphq (Chancellor)
on Apr 22, 2003 at 15:25 UTC ( [id://252285]=perlmeditation )

This node is inspired by the following (excerpt from the) comment by BrowserUk in Re: Re: Re: looking for a regex

s_\{.*?\}_{$sub[$n++]}_g;

I'm not at all sure about using _ as the delimiter, but everything else I tried was altogether too messy.

Assuming that the traditional '/' is not available, I would have written that as one of the following:

# For the sake of the discussion let's ignore the fact that the
# curlies didn't need to be escaped.
s!{.*?}!{$sub[$n++]}!g;
s:{.*?}:{$sub[$n++]}:g;

And I certainly wouldn't have done what I know is popular amongst some monks:

s#{.*?}#{$sub[$n++]}#g;

Because I find it makes the code hard to understand at a glance, especially in an editor that doesn't know that the # in this context isn't a comment begin marker.

So what is the opinion of the monks at large on this ever so trivial a subject? Which alternate regex delimiter do you favour? And what are the arguments behind your opinion (if any)?


---
demerphq

<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

Replies are listed 'Best First'.
Re: Whats your favorite nonstandard regex quote char?
by broquaint (Abbot) on Apr 22, 2003 at 15:36 UTC
    So what is the opinion of the monks at large on this ever so trivial a subject? Which alternate regex delimiter do you favour? And what are the arguments behind your opinion (if any)?
    My quote character of choice when substituting is the curly brace pair, e.g.
    s{ ( \w+ ) [ ] ( \w+ ) }($2 $1)xg;
    And for anything but the simplest of regexes (that can't be fielded off to the likes of index()) the /x modifier is a must. I also use parentheses to delimit the replacement part of a substitution.

    My reasoning is that curly braces look like a block and parentheses look like execution (or at least that's how it mnemonically maps in my head).
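    A minimal runnable sketch of that style, assuming a made-up input string (the variable name and the data are purely illustrative):

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for illustration.
my $s = 'hello world';

# Braces delimit the pattern, parentheses delimit the replacement.
# Under /x plain whitespace in the pattern is ignored, so the literal
# space between the two words is written as the character class [ ].
# /x does not apply to the replacement, which is an ordinary
# interpolated string.
$s =~ s{ ( \w+ ) [ ] ( \w+ ) }($2 $1)xg;

print "$s\n";   # prints "world hello"
```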
    HTH

    _________
    broquaint

Re: Whats your favorite nonstandard regex quote char?
by The Mad Hatter (Priest) on Apr 22, 2003 at 15:34 UTC
    I tend to favor either ! or |. Occasionally I'll use ' as well. As for a reason, I dunno...it looks good.
    s'{.*?}'{$sub[$n++]}'g;
    s|{.*?}|{$sub[$n++]}|g;

      A word of caution re: using ' as a regex delimiter: if the regex includes vars that you want interpolated, the regex won't do what you want when you use 's, as in this case for instance.

      $s ='25 {fred and barney} text 2.36 12.0 {bam bam} text {pebbles}';
      $n=0;
      $s =~ s'{.*?}'{$sub[$n++]}'g;
      print $s;

      25 {$sub[$n++]} text 2.36 12.0 {$sub[$n++]} text {$sub[$n++]}

      Examine what is said, not who speaks.
      1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
      2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
      3) Any sufficiently advanced technology is indistinguishable from magic.
      Arthur C. Clarke.
        Ooh! Thank you for pointing out that caveat. It will no doubt someday save me hours of debugging time. ;-)
      ', sure, but |? (Ok, that sentence needs work:))

      I'd shy away from metacharacters; at first glance, it's hard to tell you're not starting an alternative rather than starting the right side.
      --
      Mike

        It just looks clean to me, especially since it spans the height of the whole line. In my editor, it stands out quite nicely, so that isn't a problem for me. If it does start to get confusing though, I will use a different character.
Re: Whats your favorite nonstandard regex quote char?
by Mr. Muskrat (Canon) on Apr 22, 2003 at 15:39 UTC

    It all depends on what I'm doing... If it's a script that I'll use more than once, I tend to lean towards readability. If it's an obfuscation, I go for confusion of course.

    For what should be obvious reasons, these are some of my personal favorites:
    s@{.*?}@{$sub[$n++]}@g;
    s${.*?}${$sub[$n++]}$g;
    s%{.*?}%{$sub[$n++]}%g;
    s={.*?}={$sub[$n++]}=g;

      For shame! Where is

      s;{.*?};{$sub[$n++]};g;

      :-)


      ---
      demerphq

        's' is a fun delimiter, especially with a /s modifier and some variables named 's' for good measure. :)
        ss{.*?}s{$s[$s++]}ss;

        -Matt

        I wanted to leave some for the imagination :)

      Yes I can see those being favorites for obfuscation - in general work it's likely better to avoid the use of overly meaningful characters. I'd include sigils at the top of that list, then other things that are likely to confuse (like the pound character). I waver between the various bracketed characters - curly braces, square braces, chevrons and in last place parentheses. Which one I actually use is predicated on what the content of the expression is and whether it conflicts with my delimiters.

        Which one I actually use is predicated on what the content of the expression is and whether it conflicts with my delimiters.

        Naturally. But assuming that the only character that is out of bounds is the /, which would you use and why?


        ---
        demerphq

      s\{.*?}\{$sub[$n++]}\g;
      *evil grin*

      Makeshifts last the longest.

Re: Whats your favorite nonstandard regex quote char?
by LAI (Hermit) on Apr 22, 2003 at 16:20 UTC

    Like The Mad Hatter, I favour the pipe.

    s|{.*?}|{$sub[$n++]}|g;

    It has good visibility, is simple, yaddax3. I toyed around with using #, and it works well as a visual cue, because it registers as 'dark' among visually 'lighter' characters. But accidental obfuscation, as well as inadequate editor syntax highlighting, made me drop that.

    I'm not especially fond of ,.`"', because I like for the whole height of the line to be covered. Makes the character look more like a delimiter. Paired delimiters such as []{}()<> do that, but I can't get used to them. They make a lot of sense, and can make certain regexps nice and tidy... but even on multiline things I like my pipe:

    s| {.*?} | {$sub[$n++]} |gx;

    LAI

    __END__
      Paired delimiters such as []{}()<> do that, but I can't get used to them.

      If it is the pairing you can't get used to, you could always just use the right side of the pair. :) e.g.

      $s =~ s>{.*?} >{$sub[$n++]}>gx;
      $s =~ s){.*?} ){$sub[$n++]})gx;

      Personally I tend to use ! because of the rarity of having to match a ! in a string, or of having negative look-ahead/behind assertions. I do like the way the pipe looks but tend to avoid it because it is the character for alternation.

      -enlil

        Yeah, the half-pair is fun. Personally I would have split up that re as:

        $s =~ s>{.*?} >{$sub[$n++]} >gx;

        ...but TIMTOWTDI. And as for the pipe being the character for alternation... you're right of course. But not many characters are used infrequently enough that they make good delimiters. So I just use whichever I feel looks good until there's a conflict with the content of the regex.

        LAI

        __END__

      I like using paired delimiters in this context, usually curlies.

      Speaking of quote delimiters, there's a lovely piece of code in Perl_yylex() (in toke.c) to find the matching balanced delimiter of a quoted string:

      if (term && (tmps = strchr("([{< )]}> )]}>",term))) term = tmps[5];

      Pop quiz: why does this work? Why are the closing delimiters repeated twice?
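
      For anyone who wants to poke at the quiz, here is a rough Perl sketch of the same lookup. The helper name closing_delimiter is made up for illustration; only the lookup string and the offset of 5 come from the C line above.

```perl
use strict;
use warnings;

# Same lookup string as in the C snippet above.
my $table = '([{< )]}> )]}>';

# Hypothetical helper: return the closing delimiter for $term by looking
# five characters to the right of its first occurrence in the table.
# Characters not in the table close with themselves, like s!...!...!.
sub closing_delimiter {
    my ($term) = @_;
    my $pos = index($table, $term);
    return $pos < 0 ? $term : substr($table, $pos + 5, 1);
}

# Print the mapping for the four openers, the four closers, and one
# character that is not in the table at all.
printf "%s -> %s\n", $_, closing_delimiter($_) for qw/ ( [ { < ) ] } > ! /;
```

      Running it shows what each character maps to; comparing the results for the openers with those for the closers is a good starting point for the quiz.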

      _____________________________________________
      Come to YAPC::Europe 2003 in Paris, 23-25 July 2003.

Re: Whats your favorite nonstandard regex quote char?
by Abigail-II (Bishop) on Apr 22, 2003 at 20:36 UTC
    I prefer my delimiters to be tall and skinny, just like /. | is tall and skinny, but that leads often to a clash with the special regex character. So, I usually use !, which only gives a problem if you want to use (?!) or (?<!), but they are uncommon enough for ! to be useful. And I often use the balanced delimiters, the four sets of braces ({ } being my favourites). ! I seldom use for matching, only for substitutions, or any of the q* operators, but I prefer / or { } for them.

    Only in one liners, vi or IRC, I sometimes use a period when doing substitution, but never in code that's stored in a file.

    I don't like to use #, @ or similar characters as delimiters. I know they are popular, but they are too black to my taste, and draw the eye away from the important regex towards the unimportant delimiters.

    Abigail

Re: Whats your favorite nonstandard regex quote char?
by jmcnamara (Monsignor) on Apr 22, 2003 at 16:09 UTC

    The following is also possible (if not entirely legible):
    s g{.*?}g{$sub[$n++]}gg;

    --
    John.

Re: Whats your favorite nonstandard regex quote char?
by BrowserUk (Patriarch) on Apr 22, 2003 at 21:30 UTC

    I almost universally use s[...][...]g; for regexes, the exception being when I'm using the /e modifier, in which case I tend to use s[...]{....}ge; as the curlies give a visual indication that the right-hand side is active rather than passive. In this specific case, the same regex in my test code is coded as

    s[[{].*?[}]][{$sub[$n++]}]g;

    which (I think) looks fine under the syntax highlighting in my editor, but looked altogether too confusing when rendered in the b&w of the preview page. So I looked for some way of rendering it more clearly in this environment. I tried various options and settled on

    s_\{.*?\}_{$sub[$n++]}_g;

    as the least bad.

    As you pointed out, perl is very clever about deciding whether a metachar is or is not being used as a metachar, but I still tend to escape them as my brain/eyes are less adept at the art.

    I've recently started using [metachar] (unashamedly stolen from some of Abigail-II's posts) instead of \metachar for escaping metachars in regexes as it allows me to use a consistent method for single and multiple alternatives, and I'm finding that consistency is the key to easy visual parsing of code.

    I use m[...] almost universally in preference to /.../, and map{...}; grep{...}; even when the blocking isn't strictly required for similar reasons. I find the visual consistency and ease of extensibility far outweigh the minor performance penalty.

    I've never yet posted a set of personal style rules as I'm still formulating mine. Very little of my style (or lack thereof) is yet fixed in stone, I tend to see things that other people are doing that seem particularly clear/neat/concise/cool and try them for a while and see what sticks. Those that don't bug me to type, hinder my reading or cause any problems in other ways tend to stay.

    There are some which I use in my own code that I religiously remove from the code when posting. E.g.

    .... #! This is a comment

    Try as hard as I might to tailor my syntax highlighter definition, it still confuses everything on a line after

    $#array

    with a comment and highlights it accordingly. So I changed the comment card spec to being #!. This means that all the comments and the shebang lines are displayed in a muted green, which suits me. However, I started removing (mostly) the ! from the comments when posting, as a private msg from someone said that every time he looked at the code I posted with them in, he saw multiple shebang lines and freaked.

    I guess I should get around to setting up PerlTidy to filter code to my preferences on input and back to something "more normal" on output, but that probably wouldn't help much as I tend to c&p directly from the editor. I also have a set of macros that do most of the transformations--tab width, curly positioning etc--already in my editor. To use perltidy I would have to save the code out to a file, load it in another editor to view it in the "more normal" form and c&p from there, which would just be a pain.



      What editor do you use? I had the same problem with Vim, where it would comment the rest of the line after $#array. I fixed it by changing the perlComment definition to this:

      "$\@<!#.*"
      which only matches a # not preceded by a $.
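
      The same "not preceded by a $" idea can be sketched in Perl itself with a negative look-behind. This is only an illustration under obvious assumptions: the sample line is made up, and the pattern makes no attempt to handle a # inside strings or regexes.

```perl
use strict;
use warnings;

# Hypothetical line of code containing both $#array and a real comment.
my $line = 'my $last = $#array; # index of the last element';

# (?<!\$) is a negative look-behind: the # must not follow a dollar sign.
# This deliberately ignores # inside strings or regexes; it only shows
# the look-behind idea.
my ($comment) = $line =~ /((?<!\$)#.*)/;

print "$comment\n";   # prints "# index of the last element"
```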

      kelan


      Perl6 Grammar Student

Re: Whats your favorite nonstandard regex quote char?
by demerphq (Chancellor) on Apr 22, 2003 at 16:41 UTC

    <Shameless_Plug>

    Incidentally for anyone who cares, you can let Text::Quote figure out the best quote char for you if you like. :-)

    </Shameless_Plug>


    ---
    demerphq

Re: Whats your favorite nonstandard regex quote char?
by Juerd (Abbot) on Apr 22, 2003 at 18:58 UTC

    So what is the opinion of the monks at large on this ever so trivial a subject? Which alternate regex delimiter do you favour? And what are the arguments behind your opinion (if any)?

    I prefer []. Except for the RHS with s///e, because I prefer {} there.

    [] is VERY easy to read; B::Deparse (not Data::Dumper, as I first wrote) uses it a lot, and it will make even more sense when we have Perl 6's regexes, where [] is used for grouping.

    {} often delimits code. It makes sense with s///e because the RHS is code, and creates scope, etc etc.

    s[\{.*?\}][{$sub[$n++]}]gx;
    I escaped the {} to be on the safe side when something is added in front of it in a future version.
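
    A small sketch of the [] plus {} combination with /e, using made-up data rather than anything from this thread:

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for illustration.
my $s = '2 apples and 3 pears';

# Square brackets delimit the pattern; the curlies hold the code that
# /e evaluates for every match, so the right-hand side reads as a block.
$s =~ s[(\d+)]{ $1 * 2 }ge;

print "$s\n";   # prints "4 apples and 6 pears"
```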

    Also note that syntax highlighting helps more than choosing the right delimiter.

    Juerd
    - http://juerd.nl/
    - spamcollector_perlmonks@juerd.nl (do not use).
    

      I don't understand your reference to Data::Dumper in this context. What's that all about?


      ---
      demerphq


        I don't understand your reference to Data::Dumper in this context. What's that all about?

        Neither do I, but that is because I meant B::Deparse, not Data::Dumper. Sorry :)

        B::Deparse uses q['] instead of '\'', for example. I like that.

        Juerd
        - http://juerd.nl/
        - spamcollector_perlmonks@juerd.nl (do not use).
        

Re: Whats your favorite nonstandard regex quote char?
by belg4mit (Prior) on Apr 22, 2003 at 16:31 UTC
Re: Whats your favorite nonstandard regex quote char?
by PodMaster (Abbot) on Apr 23, 2003 at 07:28 UTC
    I don't think about it much anymore, and mostly use // or {}.
    Once upon a time I did fancy the following ( ¡ = 0161, ¿ = 168, º = 167 -- using ALT+DOWN -> NUMPAD -> ALT+UP)
    s¡¡¡g s¿¿¿g sºººg
    I really fancy  qw[ ] though.


    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
    ** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Whats your favorite nonstandard regex quote char?
by John M. Dlugosz (Monsignor) on Apr 22, 2003 at 19:12 UTC
    How about ¸ (U+00B8) which looks like a fancy comma?

Re: Whats your favorite nonstandard regex quote char?
by parv (Parson) on Apr 23, 2003 at 05:19 UTC

    I used to use s###, but lately i find that that makes the following characters harder to read. So i use s!!! in short/simple ones; s{}// or s{}() otherwise.

    I absolutely cannot take any of [;,.] as all three seem to enjoy intermingling w/ rest of the text.

Re: Whats your favorite nonstandard regex quote char?
by Aristotle (Chancellor) on Apr 26, 2003 at 02:54 UTC
    Nearly universally the bang for me, when the standard slash won't do. If that still doesn't help, it means the pattern is convoluted and calls for /x, which works best with paired delimiters (curlies for me). I tend to do something like
    s{ foo }{ bar }x
    which nicely exposes the delimiters to skimming eyes. Add more whitespace if necessary. I'm not keen on splitting the pattern across lines, and prefer to avoid it if it isn't really necessary.

    Makeshifts last the longest.

Node Type: perlmeditation [id://252285]
Approved by davis
Front-paged by grinder