PerlMonks  

No \G for s///g ?

by tye (Sage)
on Mar 21, 2003 at 20:45 UTC

tye has asked for the wisdom of the Perl Monks concerning the following question:

In Re: Wrap while ignoring certain sequences (from CB) I ended up with this code:

    my $len= 79;
    my $esc= '\e';
    my $eseq= qr[$esc[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$eseq)*[^$esc\n]];
    my $nonsp= qr[(?:$eseq)*[^$esc\s]];
    s[(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s][$1\n]g;
    s[(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;

But it has two bugs (described later). What I think I should use is:

    s[(?:\G|^)((?:$char){1,$len}(?:$eseq)*)\s][$1\n]gm;
    #    ^^^^                                        ^
    #    vv
    s[(?:\G|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;

You see, the first substitution should only take place immediately after a newline. The ^ (and m option) take care of starting right after a newline that was already in the string. The \G should take care of starting right after a newline that was just inserted by a substitution (wait for it).

Note that you can use neither ^ nor look-behind assertions to detect text that was inserted after the s///g started. This isn't documented (that I've seen), but I've tested it (and it makes sense).
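
A minimal sketch of the look-behind half of that claim. The toy string and the '-' marker are made up for illustration and are not part of the wrapping code. During a single s///g pass the look-behind still sees the original 'a' in front of 'b'; only a second pass sees the inserted text.

```perl
use strict;
use warnings;

# Toy input (hypothetical): in the ORIGINAL string, 'b' is preceded by 'a'.
my $one_pass = "ab";

# First alternative turns 'a' into '--'. The second alternative could fire
# in the same pass only if look-behind saw that freshly inserted '-'.
$one_pass =~ s/a|(?<=-)b/--/g;

# Run the same substitution again: now the '-' really is in the string,
# so the look-behind succeeds and 'b' is replaced as well.
my $two_passes = $one_pass;
$two_passes =~ s/a|(?<=-)b/--/g;

print "after one pass:   $one_pass\n";    # after one pass:   --b
print "after two passes: $two_passes\n";  # after two passes: ----
```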

\G should say "start where I left off last time" -- except double checking the documentation, it appears that this is only supported for m//g, not s///g. Is there a good reason for this that I'm missing? Update: Appears to be just a documentation issue (perhaps even an issue just with my reading of docs). See my reply below.

So the first substitution in the first block of code above could be better because it wastes time trying to split each short line starting at each whitespace in the line. The (?<=\s) works by matching the space that we just turned into a newline.

The second substitution in the first block of code is broken because a single word that spans more than two lines will only be wrapped once. I want to allow substitutions that start either right after I've wrapped a long word, or at the start of the next word. But without \G, I don't see any way to allow starting right after a substitution that doesn't also allow starting anywhere (which creates another bug as discussed on my previous node).
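
The long-word problem can be reproduced without any escape sequences. The following is a simplified sketch, not the wrapping code itself: $len is set to 3, the word is made up, and \w stands in for $nonsp. The first substitution copies the (?:^|(?<=\s)) prefix and breaks the word only once; the second anchors with \G at the very start of the pattern and breaks it every $len characters.

```perl
use strict;
use warnings;

my $len  = 3;              # tiny width, chosen only for the demo
my $word = "abcdefghij";   # one long "word" with no whitespace

# Anchored at start-of-string or after whitespace: after the first break,
# the next chunk is preceded by 'c' in the original string, so no
# further match is possible.
(my $once = $word) =~ s/(?:^|(?<=\s))(\w{$len})(?=\w)/$1\n/g;

# Anchored with \G: each attempt may start exactly where the previous
# match ended, so the word keeps getting split.
(my $all = $word) =~ s/\G(\w{$len})(?=\w)/$1\n/g;

# Show the line breaks as '|' to keep the output on one line each.
print "without \\G: ", join("|", split /\n/, $once), "\n";  # abc|defghij
print "with \\G:    ", join("|", split /\n/, $all),  "\n";  # abc|def|ghi|j
```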

Anyone see a way I can work around this missing feature? Should I report this as a "bug"?

BTW, playing with this code is made easier by setting $len much lower and making $esc a regular character.

                - tye

Replies are listed 'Best First'.
Re: No \G for s///g ?
by blakem (Monsignor) on Mar 21, 2003 at 21:13 UTC
    \G ... is only supported for m//g, not s///g

    I didn't dig into your regex, but I think \G is supported for s///g;

    $ perl -le '$_="abc def"; print; s/\G\w/X/g; print'
    abc def
    XXX def

    -Blake

      Thanks! I thought I'd used that before, but perlop has several examples of m/\G.../g, no examples of s/\G...//g, and no mention of \G in the s/// section, which together made me think maybe that was the problem (as I was posting; writing up a problem always brings new solutions to mind, ya know).

      I verified that this works on my version of Perl as well.

      I'd been meaning to post this for a week and finally had a few minutes while waiting for stuff to compile. This response got me to take some time I didn't have to write up some test cases. I must have had an unseen bug in the code when I was testing before because it works fine now.

      Thanks, blakem.

                      - tye

        Try setting $len to 5. You'll see that the 'tight' version fails to wrap correctly under certain circumstances, like this:

        Wrapping: @[0;7mCoruscate@[0m says this is a test of the line wrapping code
        Tight code1: @[0;7mCoruscate@[0m says this is a test of the line wrapping code
        Tight code: @[0;7mCorus cate@[0m says this is a test of the line wrapp ing code

        which suggested to me that the regexes be executed in opposite order

        Wrapping: @[0;7mCoruscate@[0m says this is a test of the line wrapping code
        Tight code1: @[0;7mCorus cate@[0m says this is a test of the line wrapp ing code
        Tight code: @[0;7mCorus cate@[0m says this is a test of the line wrapp ing code

        which seems to do the trick. I believe the first regex gets stymied by lines that have to be split many times at a space, as well as being split many times inside a word.


        ---
        demerphq


Node Type: perlquestion [id://245000]
Approved by gmax
Front-paged by gmax