Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

To use a module...or not.

by BrowserUk (Patriarch)
on Jul 24, 2004 at 02:16 UTC ( [id://377058]=perlmeditation: print w/replies, xml ) Need Help??

At Re^3: sorting a complex multidimensional hash, I took some gently stated critique seriously and tried to find a better way of expressing the ST and GRT versions of that sort. Then later, Ven'Tatsu's post made me look for a CPAN solution.

If you like the idea of encapsulating your ST sorts (or the GRTs & Orcish Manouvers), then there is an existing module that will do this for you Sort::Maker.

It's an impeccably pedigreed module; cleanly coded; well documented etc. The problem is, it took me much longer to work out how to write even a simple ST or GRT using it than writing them directly. The resultant code requires just as much typing and (IMO) is no clearer or easy to maintain that the usual map-sort-map version.

I then spent many hours trying to work out how to apply it to Re: sorting a complex multidimensional hash and got nowhere. I simply couldn't work out how to use it to this sort, either the ST or the GRT version. Both of my hand coded versions took under an hour each to write and test.

There is nothing wrong with the module, not it's interface design, nor it's documentation. The problem is that even for relatively simple versions of either style of sort, you have to pass all the information required to perform the pre- and post- processing of the data, plus the comparator, plus the direction of the required sort into the module as small, discrete chunks of code.

I find it's just much easier to see what is going on, and what is going wrong, and to test and spot where they are going wrong, when I write that code in the usual manner, than I do as lots of discrete little bits.

I have similar misgiving with callback interfaces typified by things like the various HTML parsing modules. You're constantly operating one level removed from the place of execution. If what you code works first time, they're great. It's when I get a problem, make a mistake, that I find myself having to bypass the documentation and going to the source to try and work out what and where the problem lies. At that point, the main benefits of using a canned solution tend to go out of the window.

Even if I succeed in making the combination work, if the code needs modification at some later stage, I (or the whomever has to maintain the code), has the same problem of trying to understand the combined effect of two separate pices of code, rather than one.

I am the only one that feels this way?

How would you simplify/clarify the coding of ST?


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

Replies are listed 'Best First'.
Re: To use a module...or not.
by gaal (Parson) on Jul 24, 2004 at 07:28 UTC
    Someone was recently asking about idioms. Perhaps the value of Sort::Maker to you was as a reference (perhaps even, since you were familiar with the techniques involved, a stylistic reference). Once something becomes so simple for you that it's easier to do it on your own than ask someone to do it for you, well, it's easier to do it on your own.

    Occasionally I memoize a function by myself. It's just too damn easy to code, and in some environments much easier to deploy than requiring an extra module. This bloats my code somewhat, but mostly it goes against the Doctrines of Code Reuse and of The Separation of Concerns. To hell with that! If my project needs to memoize one function, I may very well do it myself. If it starts happenning all over the place, then I'll pull in Memoize or Aspect::Library::Memoize. I'm also thinking about the future maintainer: someone not familiar with the technique (or with its name) may actually need to spend less overhead thinking about what this means:

    { my %cache; sub calculate_stuff { my ($data) = @_; return $cache{$data} if exists $cache{$data}; # cache hit # calc return $cache{$data} = $result; }
    ...than meeting a completely unfamiliar name, finding a reference, and explicitly learning something new. Maybe it's better for them to know the memoization technique, but my primary concern in writing maintainable code is not to educate my successors.

    So sometimes the help of a module is in its paradigmatic solution of a problem. Someone once said that the real trick is to copy good design, not just good code.

Re: To use a module...or not.
by elusion (Curate) on Jul 24, 2004 at 02:47 UTC
    I'd use Perl 6.
    my @results = sort { complex_operation($^a) } @list;
    If you pass Perl 6's sort a block that takes a single argument, it will use it as a key for sorting the data, as a ST would.

      The problem is how to flatten the nested hash keys and then reconstitute them? (+ availability:)


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
        Step backwards in time and rediscover fixed length records.
Re: To use a module...or not.
by graff (Chancellor) on Jul 26, 2004 at 05:13 UTC
    I am the only one that feels this way?

    Surely not. The modules that people really need (and the others that people just like to use) are the ones that really do make things easier -- especially things that require lots of broader knowledge about the world outside of Perl, like details about calendars, internet protocols, RDBMS API's, and that sort of stuff.

    Meanwhile, the ones that people tend to pass by, or forget, or deliberately avoid, are the ones that are just trying to "modularize" some idiom, objectify some data structure, or otherwise provide some syntactic sugar on stuff that is intrinsic to Perl and just seemed to the module author to be an itch worth scratching. But sugar for one programmer can be sand to another.

    How would you simplify/clarify the coding of ST?

    I don't think it's a matter of simplifying the coding (in the sense of making it simple to understand) -- unless you want to go to the trouble of breaking it apart into separate loops and using temporary storage (which would nullify some of the benefits of the approach).

    It's a matter of coming up with a simple/clear way of introducing the concept to the uninitiated. It's kinda like introducing calculus to high school kids after they've had enough algebra. (I think back to when the teacher spent the better part of an hour covering the blackboard with all the algebra needed to estimate the area under a segment of a power function, then explained how to reduce it all to a "one-liner".)

    People first need to see the problem solved in terms they are familiar with (apply a for loop in order to make a temp array that is easy to sort, apply basic sort to that, and another for loop to recreate the original data in sorted order. That makes sense. Now, introduce "map" to replace "for"; then just show how to remove the temp array, by having the first "map" feed directly to the sort, and having the sort output feed directly to the second "map".

    There must be a tutorial somewhere that already presents it in those terms, and if so, people who say "use a Schwartzian Transform" should, by default, provide a link to that tutorial.

Re: To use a module...or not.
by Aristotle (Chancellor) on Jul 27, 2004 at 16:43 UTC

    How would you simplify/clarify the coding of ST?

    By not using an ST in the first place. In practice I prefer something like
    my @array = @array[ do { my @rank = map &sortkey_calc @array; sort { $rank[$a] <=> $rank[$b] } 0 .. $#array; } ];
    It's also generally faster than an ST and degrades more gracefully, too.

    Makeshifts last the longest.

      By all means, go ahead and code that for the original problem :) It would be an interesting comparison.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

        It seems like that problem is needlessly being complicated by conflating key extraction and sorting. (This was an issue that was discussed on perl6-language about the current sort operator — how do we make it possible to isolate the extraction code from the actual sort order, which should then be possible to request declaratively?)

        sub order_of_sorted { my ( @num, @suff ); for( @_ ) { m/ \A (\d+) \. (.*) /x or die "malformed data"; push @num, $1; push @suff, $2; } sort { $num[$a] <=> $num[$b] or $suff[$a] cmp $suff[$b] } 0 .. $#_ +; } for my $outer ( ( values %hash )[ order_of_sorted keys %hash ] ) { for my $inner ( ( values %$outer )[ order_of_sorted keys %$outer ] + ) { # ... } }

        If you want to store the data rather than output it, you can transform the nested for-loops to an equivalent chain of maps (but note that that doesn't make it an ST).

        I know which version I'd prefer to maintain, even though this is probably slower than your GRT.

        Update: the following, which I originally wrote, is obviously wrong:

        for my $inner ( ( values %$outer )[ order_of_sorted keys %$inner ] +) { # ^^^^^^^ Correct. ^^^^^^^ Oo +ps.

        Makeshifts last the longest.

Re: To use a module...or not.
by uri (Acolyte) on Aug 04, 2004 at 05:24 UTC
    i don't even know where to start. i would first say you should read my article on sorting in perl at sysarch.com. then read the talk slides on sort::maker at stemsystems.com/sort/slides/slides.

    when you have done that and digested it all, then you all might be able to make some cogent comments on my module and the reasons (or not) for using it.

    the ST, GRT and orcish sorts are NOT replacements for simple sorts but for complex sorts with slow key extractions. the poster with the simple sort of a hash didn't get it at all. the essence of these sort styles are caching of the extracted key. if extracting the key is easy and cheap (like when sorting a hash) then you won't save much if anything by using one of these methods. i never claimed the module was meant to be faster for them (and if you know annything about algorithm theory and O() notation you will get it), they also only improve things when the input set grows to a certain size and the key extraction is slow (-M is a great example).

    finally, if the ST is easy for you to write and understand, that is great. try groking and using the GRT (which beats the ST in almost all speed tests) as easily. the beauty of sort::maker is that you can choose the sort key caching style and NOT change anything but a single keyword. sometimes one caching style is best and you can't always predict it without proper benchmarks. and sort::maker includes a very flexible table driven benchmark/test that you can use on your own data. getting back to the readability point, of course sort::maker doesn't lessen the amount of information needed to specify a sort but it does make it more readable for the vast majority. if you don't care then don't use it as it isn't for you (you get no soup either! :). i have already gotten plenty of good feedback (including a report of a 50% speed up and some bug fixes) so i am satisfied i did the job i set out to do (if only 5 years later).

    enough for now. read the paper and slide show before you respond to this. i will not read any posts from those who haven't as it is a waste of my time.

    uri (the G in GRT).

      i don't even know where to start. i would first say you should read my article on sorting in perl at sysarch.com. then read the talk slides on sort::maker at stemsystems.com/sort/slides/slides.

      I had already read and understood A Fresh Look at Efficient Perl Sorting long ago. I both use and advocate using the GRT for most sorts where the extraction function is expensive and the dataset larger than a few dozen.

      I have now read the slides. Whilst they were fine, I don't think that I saw anything that I hadn't already read in the POD.

      Perhaps you would extend me the same courtesy by (re)reading my meditation, including following a few of the links provided.

      when you have done that and digested it all, then you all might be able to make some cogent comments on my module and the reasons (or not) for using it.

      Sorry that you thought that my comments were not cogent.

      the ST, GRT and orcish sorts are NOT replacements for simple sorts but for complex sorts with slow key extractions. the poster with the simple sort of a hash didn't get it at all. the essence of these sort styles are caching of the extracted key. if extracting the key is easy and cheap (like when sorting a hash) then you won't save much if anything by using one of these methods. i never claimed the module was meant to be faster for them (and if you know annything about algorithm theory and O() notation you will get it), they also only improve things when the input set grows to a certain size and the key extraction is slow (-M is a great example).

      Assuming you have re-read my post, you will see

      1. The poster's sort was far from a "simple sort of a hash".
      2. You will also have seen that I produced both ST and GRT versions of the required sort of nested hashes with compound keys.
      3. That I never claimed that you had "claimed that the module was faster", nor did I say, suggest nor imply that it was slower. The issue of speed was simply not a factor in my meditation.
      4. That I do "get it".

      You may also notice that I had nothing bad to say about your module--I'll repeat that. I had nothing bad to say about your module.

      Hell, I'll even quote a couple of bits for you.

      • It's an impeccably pedigreed module; cleanly coded; well documented etc.
      • There is nothing wrong with the module, not it's interface design, nor it's documentation.
      finally, if the ST is easy for you to write and understand, that is great. try groking and using the GRT (which beats the ST in almost all speed tests) as easily.

      Answered I think.

      (I could give a list if references here where I have advocated and demonstrated the use of the GRT over the ST, and produced benchmarks to substantiate my conclusions...but you SNIP.)

      the beauty of sort::maker is that you can choose the sort key caching style and NOT change anything but a single keyword. sometimes one caching style is best and you can't always predict it without proper benchmarks. and sort::maker includes a very flexible table driven benchmark/test that you can use on your own data.

      And I acknowledge all of that.

      getting back to the readability point, of course sort::maker doesn't lessen the amount of information needed to specify a sort but it does make it more readable for the vast majority.

      This is the ONLY area where I had a problem.

      I (personally, IMO, which carries no more weight than any reader chooses to give it. etc. ) didn't find the descriptive qualities of the module's use to be preferable to reading the coded sort itself.

      Maybe that's only because I am--reluctant to use the term expert, so shall we say--sufficiently familiar with both perl code, and the standard methods of construction for STs and GRTs, that I prefer to read the code.

      And that was the purpose of my meditation!

      To ask the question, was I alone in finding it easier to read and write the code as a single coherent entity, than to break the code up into little bits and try visualise how they come together...?

      I mentioned that I also have a similar problem with the callback interfaces presented by various HTML::* modules.

      (For me) this extends to many of the XML::* modules, tye's Algorithm::Loops loops() function, File::Find(::*) and many others.

      if you don't care then don't use it as it isn't for you (you get no soup either! :). i have already gotten plenty of good feedback (including a report of a 50% speed up and some bug fixes) so i am satisfied i did the job i set out to do (if only 5 years later).

      My meditation was not billed as a review or a critic of your module. Far from it.

      I actually thought that my having 'discovered it' on CPAN, and mentioning it here (I don't recall seeing it mentioned here before), might raise it's profile for those who prefer to use modules for such things. I even said as much.

      • If you like the idea of encapsulating your ST sorts (or GRTs & Orcish Manouvers), then there is an existing module that will do this for you Sort::Maker
      enough for now. read the paper and slide show before you respond to this. i will not read any posts from those who haven't as it is a waste of my time.

      I also asked the question: How would you simplify/clarify the coding of ST?

      I could not work out how to use your module to code this nested hashes with compound keys sort. I was hoping someone would show me. None did.

      I hope you don't feel this response to be as much of a waste of your time, as I feel it was a waste of mine to defend my meditation.

      uri (the G in GRT).

      BrowserUk (the '' in "nobody special" :)


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://377058]
Approved by beable
Front-paged by beable
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2024-03-28 20:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found