http://qs321.pair.com?node_id=179624


in reply to Parsing with Perl 6

The only thing I regret is that matching non greediness has not been made the default. Larry is discussing this in perl6-language. His concern is that greediness is disconcerting for beginners, but he later goes on to say that non-greediness is too. Strangely, his favorite argument, "Huffman coding", is not invoked here. Personally, I add the "?" modifier by default and suppress it when it would break my regexp... as in the example given by Larry:
my ($num) = /(\d*)/; # greediness needed
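Larry's example can be checked with a small Perl 5 sketch (the sample string is made up for illustration). The greedy /(\d*)/ captures the leading digits. Its non-greedy counterpart /(\d*?)/ is already satisfied by zero digits, so it captures an empty string.

```perl
use strict;
use warnings;

my $text = "42 apples";

# Greedy: \d* takes as many digits as it can at the first position.
my ($greedy) = $text =~ /(\d*)/;

# Non-greedy: \d*? takes as few as it can, and zero digits already satisfies it.
my ($lazy) = $text =~ /(\d*?)/;

print "greedy: '$greedy'\n";   # greedy: '42'
print "lazy:   '$lazy'\n";     # lazy:   ''
```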

So, because non-greediness is prevalent in my own use, I think it should be made the default. I have not checked other people's code to see whether they use non-greediness more than greediness, though. I also don't know whether changing the default would affect the ratio of greedy to non-greedy usage, claimed to be 10 to 1 for Perl 5.8 in an answer to this node. Anyway, this ratio seems to show that the Huffman coding criterion would lead me to a different grammar design decision than most people. Also, I don't think that the new features introduced in Perl 6 regexes would change this pattern.

-- stefp -- check out TeXmacs wiki

Re: Perl6: too bad non-greediness is not made the default
by Aristotle (Chancellor) on Jul 05, 2002 at 14:37 UTC

    I try to write my regexen backtracking-free, as, I believe, anyone should. And in that case greediness is very desirable and useful most of the time. Non-greediness basically means that you match more broadly than you really need to. It works because you "forwardtrack": you gobble the string one submatch at a time. It is better to match more narrowly and greedily, since a greedy match will gobble up a lot of the string in one fell swoop and do less superfluous searching. In simple cases the regex optimizer is smart enough to simplify a non-greedy match into a Boyer-Moore search, but when you're working with a complex regex you really want to match narrowly and greedily.
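A minimal Perl 5 sketch of the "narrow and greedy" style, assuming a made-up semicolon-separated record. Both patterns extract the same field, but the character class [^;] cannot run past the delimiter. By contrast, .*? has to be extended one character at a time until the ';' is found. The sketch only shows that the two agree; it does not measure speed.

```perl
use strict;
use warnings;

my $record = "name=alice;age=30;city=paris";

# Broad and non-greedy: . could match anything, so the engine extends the
# match one character at a time, retrying ';' after each step.
my ($broad) = $record =~ /age=(.*?);/;

# Narrow and greedy: [^;] can never run past the delimiter, so the class
# is consumed in one sweep and the pattern ends right there.
my ($narrow) = $record =~ /age=([^;]*)/;

print "broad:  $broad\n";    # broad:  30
print "narrow: $narrow\n";   # narrow: 30
```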

    Regexen are a tricky art.

    Makeshifts last the longest.

      I beg to differ. My general principle is to be lax in what I receive and strict in what I emit. So I try to write my regexen as lax as possible and to "synchronize" my match on characters/sequences I am sure will be present in the input. This means that I often use .*? in my regexen. Using the greedy counterpart .* would lead to a lot of backtracking, growing geometrically with the number of such wildcards in the pattern. Also, I build my regexen incrementally, testing them on samples, and non-greedy matches are less of a nuisance when examining the results.
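The classic case where .*? matters, sketched in Perl 5 with made-up sample strings. A greedy .* overshoots to the last quote, while .*? synchronizes on the first one. The second part chains several .*? in a row, each anchored on a literal the input is assumed to contain, in the spirit of being lax in what one receives.

```perl
use strict;
use warnings;

my $tag = 'a="1" b="2"';

# Greedy: .* runs to the end of the string and backtracks only as far
# as the LAST quote, so the capture spans both attributes.
my ($greedy) = $tag =~ /"(.*)"/;

# Non-greedy: .*? stops at the FIRST closing quote it can synchronize on.
my ($lazy) = $tag =~ /"(.*?)"/;

print "greedy: $greedy\n";   # greedy: 1" b="2
print "lazy:   $lazy\n";     # lazy:   1

# Several .*? in a row, each synchronized on a literal (" - ", " [", "]").
my $line = 'host1 - alice [05/Jul/2002] "GET /"';
my ($host, $user, $date) = $line =~ /^(.*?) - (.*?) \[(.*?)\]/;
print "$host|$user|$date\n";   # host1|alice|05/Jul/2002
```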

      Regexen are a tricky art, and I like to abuse it at (*) the risk of being called demented. (*) And I beg to disagree with Felix Gallo: France is the land of semiology, not of semiotics... and of hair splitting too, BTW. And, in a related field, the main foray of tagmemics into France is collateral to the introduction of American camelids, but it has yet to appear widely in French. "tagmemique" is indeed a French neologism that googlewhacks, at least until the present node references it.

      -- stefp (who likes to split hairs) -- check out TeXmacs wiki

        Each to his own, I guess. :)

        Makeshifts last the longest.

Re: Perl6: too bad non-greediness is not made the default
by Juerd (Abbot) on Jul 06, 2002 at 14:22 UTC

    The only thing I regret is that matching non greediness has not been made the default.

    Non-greediness is only useful if something follows the non-greedy part. For example, if non-greediness were the default, /(\d*)/ would not be useful at all: it would always succeed, matching and capturing an empty string. And /(\d+)/ would be equivalent to /(\d)/, because a non-greedy expression doesn't take more than the absolute minimum.
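A quick Perl 5 check of the \d+ claim and of the "something following" point, using an invented sample string. With nothing after it, (\d+?) captures a single digit just like (\d). Once a literal follows, the non-greedy quantifier has to keep extending until that literal can match.

```perl
use strict;
use warnings;

my $s = "12345px";

# With nothing after it, a non-greedy + stops after the first digit,
# exactly like a bare \d.
my ($lazy_plus) = $s =~ /(\d+?)/;
my ($single)    = $s =~ /(\d)/;

# With something after it, the non-greedy quantifier is forced to keep
# extending until "px" can match.
my ($anchored)  = $s =~ /(\d+?)px/;

print "$lazy_plus $single $anchored\n";   # 1 1 12345
```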

    If you use non-greedy quantifiers now, think about the efficiency that is gained by rewriting for greediness. Suppose you have /"(.*?)"/: it can be written as /"([^"]*)"/, which is much more efficient. With backtracking disabled, as jryan did in his example (with the new : metacharacter), it's even more efficient when the subject string is not well formed.
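Perl 5 has no : metacharacter, so this sketch swaps in a possessive quantifier (*+, available from Perl 5.10; (?>[^"]*) is the atomic-group spelling for older perls) as the closest Perl 5 way to forbid backtracking. The extract helper and the sample strings are hypothetical. The sketch shows that the three patterns agree on well-formed input and all fail on an unterminated string; it does not benchmark them. Running the matches under use re 'debug' is one way to compare how much work each one does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what each of the three patterns captures.
sub extract {
    my ($in) = @_;
    my ($lazy)       = $in =~ /"(.*?)"/;     # broad, non-greedy
    my ($narrow)     = $in =~ /"([^"]*)"/;   # narrow, greedy
    my ($possessive) = $in =~ /"([^"]*+)"/;  # narrow, greedy, never gives back (5.10+)
    return map { defined $_ ? $_ : 'no match' } $lazy, $narrow, $possessive;
}

for my $in ('say "hello" now', 'say "hello') {   # well-formed, then unterminated
    print "$in => ", join(' / ', extract($in)), "\n";
}
```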

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      I guess TIMTOWTDI. It is a question of emphasis: yours is on performance, mine is on readability and incremental building.

      If I combine my idea of (ab)?using .* with the concept that one can hack Perl 6 to make one's own language, I would define a modifier that turns runs of consecutive non-line-breaking spaces into an implicit .*? subregex, with ? keeping its traditional meaning of non-greediness. I am not sure it would be such a good idea, though. It would be a good exercise anyway.
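A rough Perl 5 approximation of that idea, not a Perl 6 grammar modifier. The hypothetical lax_regex helper treats each run of spaces in a template as a gap, matches everything else literally via quotemeta, and joins the pieces with .*?. The gaps are made capturing here only so the demo has something to print; that is an assumption, not part of the proposal.

```perl
use strict;
use warnings;

# Hypothetical helper (not a real Perl 6 feature): each run of spaces in the
# template becomes a non-greedy (.*?) gap; all other text is matched literally.
sub lax_regex {
    my ($template) = @_;
    my @literals = split / +/, $template, -1;        # -1 keeps a trailing gap
    my $pattern  = join '(.*?)', map { quotemeta($_) } @literals;
    return qr/$pattern/s;                            # /s lets a gap span newlines
}

my ($bold) = '<p>Some <b>bold</b> text</p>' =~ lax_regex('<b> </b>');
print "$bold\n";   # bold
```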

      -- stefp -- check out TeXmacs wiki

Re: Perl6: too bad non-greediness is not made the default
by Anonymous Monk on Jul 05, 2002 at 20:00 UTC
    I do not think your high use of non-greediness is anywhere near the norm. A rough and ready regex tromp through the .pl and .pm files in the 5.8.0 distribution shows greedy usage to exceed non-greedy usage by around 10 to 1. So the Huffman argument would appear to be against you on this one.
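Roughly how such a count could be done, as a deliberately crude Perl 5 heuristic. The count_quantifiers name and the sample are hypothetical, and this is not the script behind the 10 to 1 figure. It only looks at patterns written after =~ or !~ with / delimiters, and it counts * and + quantifiers depending on whether or not a ? follows them.

```perl
use strict;
use warnings;

# Crude heuristic: only patterns written as  =~ /.../, =~ m/.../ or =~ s/.../
# (or with !~) are examined; other delimiters, qr//, ? and {n,m} are ignored.
sub count_quantifiers {
    my ($source) = @_;
    my ($greedy, $lazy) = (0, 0);
    while ($source =~ m{[=!]~\s*(?:m|s)?/((?:\\.|[^/\\])*)/}g) {
        my $body = $1;
        # * or + followed by ? is non-greedy; an escaped \* or \+ is skipped,
        # and possessive *+ / ++ is counted as neither.
        $lazy   += () = $body =~ /(?<![\\*+])[*+]\?/g;
        $greedy += () = $body =~ /(?<![\\*+])[*+](?![?+])/g;
    }
    return ($greedy, $lazy);
}

# Tiny inline sample; pass file names on the command line to scan real code.
my $sample = <<'END';
if ($line =~ /^(\w+)\s*=\s*(.*?)$/) { print "$1\n" }
$str =~ s/\s+$//;
print "ok\n" if $x =~ m/a.*b/;
END

my ($g, $l) = count_quantifiers($sample);
print "sample: greedy $g, non-greedy $l\n";   # sample: greedy 5, non-greedy 1

my ($total_g, $total_l) = (0, 0);
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $src = do { local $/; <$fh> };
    close $fh;
    my ($fg, $fl) = count_quantifiers($src);
    $total_g += $fg;
    $total_l += $fl;
}
print "files: greedy $total_g, non-greedy $total_l\n" if @ARGV;
```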
      But does greedy usage exceed non-greedy usage because people wanted to be greedy, or because they didn't care and omitting the ? is easier?