http://qs321.pair.com?node_id=121071


in reply to Re: Re:{2} Getting impossible things right (behaviour of keys)
in thread Getting impossible things right (behaviour of keys)

If this is going to be at all robust (umm and work as desired, sorry Blakem) I would change the sort to the following:
my $regex=join '|', map {substr $_,2} sort {$a cmp $b} map {pack "SA*",length($_),quotemeta($_)} keys %sufdata;
Your code doesn't actually sort the words by length. (Yes, I _am_ deliberately storing the length before I quotemeta it.)

:-)

Update
Thanks to Amoe I reexamined this and realized I had missed an opportunity for laziness, that geeky virtue:

my $regex=join '|', map {substr $_,2} sort map {pack "SA*",length($_),quotemeta($_)} keys %sufdata;
Although IIRC perl will optimize the first into the second anyway, it does save about 10 chars or so.
Oh, also for the curious: this is a more modern form of the Schwartzian Transform, which is a very cool trick. Unfortunately I can't remember the name of this version, nor the link to the excellent document I read about it. Hopefully someone who does will post a reply.

Update2
Tilly kindly supplied the link (see replies to this post). However the name I had in mind is the GRT or Guttman Rosler Transform.
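
For anyone who wants to try the GRT on its own, here is a minimal sketch of the same pack, sort, strip sequence. The contents of %sufdata are made up for illustration, since the real hash is not shown in this thread. The sketch also swaps the "S" template for "n" (big-endian 16-bit), on the assumption that the order should not depend on the machine's byte order; "S" is native-endian, so on a little-endian box it would only sort correctly for lengths below 256.

```perl
use strict;
use warnings;

# Hypothetical sample data; the thread's real %sufdata is not shown.
my %sufdata = map { $_ => 1 } qw(a.b cba ba dcba a);

# GRT: put a fixed-width sort key in front of each payload, sort the
# packed strings with the default string sort, then strip the key off.
# The key is the length of the original word, taken before quotemeta.
my @by_length =
    map  { substr $_, 2 }                           # drop the 2-byte key
    sort
    map  { pack "nA*", length($_), quotemeta($_) }  # key followed by payload
    keys %sufdata;

my $regex = join '|', @by_length;
print "$regex\n";
```

Note that this gives the shortest words first; putting a reverse in front of the sort would give longest first if that is what the alternation needs.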

DeMerphq / Yves
--
Have you registered your Name Space?

Re (tilly) 6: Getting impossible things right (behaviour of keys)
by tilly (Archbishop) on Oct 24, 2001 at 15:34 UTC
    I think the phrase you want is, "packed default".

    It is discussed in this paper on efficient sorting in Perl.

Re: Re: Re: Re:{2} Getting impossible things right (behaviour of keys)
by blakem (Monsignor) on Oct 24, 2001 at 21:35 UTC
    Ah, but you don't *need* to sort by length... the regex is anchored at the end, so the pattern that matches first from left to right will *already be* the longest match. For instance, look at the following code:
    #!/usr/bin/perl -wT
    use strict;
    my $text = 'fedcba';
    $text =~ (/(a|ba|cba|dcba)$/);
    print "Match: $1\n";
    =OUTPUT
    Match: dcba
    It matches the longest one, even though it's at the end of the alternation... that's because the regex engine tries starting positions from left to right, and the first position where a match succeeds wins. That was the whole point of my post; sorry I wasn't more explicit.
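
A small sketch of the same point, for anyone who wants to poke at it: with the $ anchor the order of the alternatives makes no difference, while without it the first alternative listed can win. The variable names and the unanchored example are only illustrative and not from the original script.

```perl
use strict;
use warnings;

my $text = 'fedcba';

# Anchored at the end: start positions are tried left to right, and the
# first position from which some alternative reaches the end wins.
my ($anchored)  = $text =~ /(a|ba|cba|dcba)$/;
my ($reordered) = $text =~ /(dcba|cba|ba|a)$/;

# No anchor, and every alternative can match at position 0: here the
# alternatives are tried in the order written, so 'd' is taken.
my ($unanchored) = 'dcba' =~ /(d|dc|dcb|dcba)/;

print "anchored:   $anchored\n";
print "reordered:  $reordered\n";
print "unanchored: $unanchored\n";
```

So sorting by length only matters in the unanchored case, where a short alternative listed first can shadow a longer one.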

    -Blake

      Yes, I see it now. Leftmost longest. I overlooked the implication of the $. I should have caught the hint.

      Good one. :-)

      Yves / DeMerphq
      --
      Have you registered your Name Space?

        Well, it was my last post of the night, and I skimped on the explanation so I could get some sleep ;-P

        This thread illustrates why I try to post complete, self-contained scripts (i.e. sample input via __DATA__, printed output samples, etc.). I think much of the confusion could have been avoided if the original script had come with a set of expected inputs and outputs.

        -Blake