Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Balancing Coding Time And Code Quality

by Limbic~Region (Chancellor)
on Dec 03, 2003 at 23:47 UTC ( [id://312051]=perlmeditation: print w/replies, xml ) Need Help??

Salutations:
Today at work, I received the following request:

Quick, I need to know every file that only exists on one of two servers and which server it is on as fast as possible

I started the following command on both servers, while I worked on the code:

# find / > /tmp/`hostname`
Now since I was pretty sure the requirements and formatting would change, I didn't spend any time with a 1 liner. I am not any good at them anyway. Here is what I came up with in an honest 5 minutes:
#!/usr/bin/perl use strict; use warnings; use Getopt::Std; my %opt; Get_Args(); open (MASTER , '<' , $opt{m}) or die "Unable to open $opt{m} as master + : $!"; open (SLAVE , '<' , $opt{s}) or die "Unable to open $opt{s} as slave +: $!"; open (OUTPUT , '>' , $opt{o}) or die "Unable to open $opt{o} for outpu +t : $!"; select OUTPUT; my (%master , %slave); %slave = map {chomp; $_ => undef} <SLAVE>; while ( <MASTER> ) { chomp; print "$_ exists on master but not slave\n" if ! exists $slave{$_} +; $master{$_} = undef; } delete @slave{ keys %master }; print "$_ exists on slave but not master\n" for keys %slave; sub Get_Args { my $Usage = qq{Usage: $0 -m <master file> -s <slave file> -o <outp +ut file> -h : This help message. -m : master file -s : slave file -o : output file } . "\n"; getopts( 'hm:s:o:' , \%opt ) or die $Usage; die $Usage if $opt{h} || ! $opt{m} || ! $opt{s} || ! $opt{o}; }
I keep a template file around with the Get_Args() sub in it already, so other than tweaking it, it wasn't factor.

Ok - so it did the job. Now I am wondering if I should modify it at all and, if so, how. It obviously isn't optimized to be memory efficient, but I am not sure that it needs to be. It is simple enough that maintainability isn't a concern. I have the time to work on it if I feel like it, but I also should use my time wisely.

So as I meditate on this, I would appreciate some of your insight and feedback.

  • In my haste, did I miss any gotchas?
  • What would you have come up with in the same circumstances?
  • Assuming my circumstances and code, what would you do now?

    Cheers - L~R

  • Replies are listed 'Best First'.
    Re: Balancing Coding Time And Code Quality
    by hossman (Prior) on Dec 04, 2003 at 00:08 UTC
      What would you have come up with in the same circumstances?
      % ssh master 'find /' | sort > /tmp/master % ssh slave 'find /' | sort > /tmp/slave % diff /tmp/master /tmp/slave
        or drop the sort and diff and use comm with -1, -2, -3 to get the master unique, slave unique and unions.


        -Waswas
          or drop the sort and diff and use comm

          Agreed; comm rocks. You can't drop the sort, however, as comm expects sorted input.

          Update: I expect you know this as I just noticed you explicitly mentioned pre-sorted lists elsewhere in this thread. I suppose it is worth repeating here anyway though.

          -sauoq
          "My two cents aren't worth a dime.";
          
        hossman,
        I probably should have mentioned that I thought of that immediately. The problem is that I wasn't sure diff was perfect, and I didn't have time to find out. I figured it was possible, even sorted, for the same filename to be several hundereds of lines away from one another depending on how divergent the two servers were.

        If the administrator that made this request hasn't already taken steps to modify the two systems, I will compare the results of my script against what diff says to see. I just thought diff assumed a certain amount of similarity and gave up if the lines were too far apart.

        Cheers - L~R

        Update: After googling a bit, I found enough to convince me that diff would not have worked

          man comm

          comm -2 -3 master slave >unique_to_master comm -1 -3 master slave >unique_to_slave comm -1 -2 master slave >union_master_slave


          -Waswas

          Recursively diffing all of the acctual files on each box probably wouldn't have been a good idea, but i've never had a problem diffing sorted lists -- no matter how much difference there is. (I'm curious as to exactly what you found with google convinced you diff would have been bad)

          As many people have allready pointed out, "comm" works just as well (if not better) then diff ... i just tend to dislike the whole "tabed" output of comm ... but "comm -1" followed by "comm -2" would be great if you didn't mind having two seperate lists.

    Re: Balancing Coding Time And Code Quality
    by dragonchild (Archbishop) on Dec 04, 2003 at 01:50 UTC
      Code quality should be commensurate to how many times you can reasonably expect to have to edit it. It it's a true one-off, then don't worry about it! If it's something you might want to re-use, then put the time in up-front (if you can) and make it worthwhile.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

    Re: Balancing Coding Time And Code Quality
    by svsingh (Priest) on Dec 04, 2003 at 16:15 UTC
      Assuming my circumstances and code, what would you do now?

      For what it's worth...

      My rule of thumb is to hang onto all one-offs in a separate folder from the rest of my "production" scripts. If the same problem comes up again, then I look into cleaning it up assuming that if I need it twice, then I'll probably need it a third time (and beyond).

      I also toss in some comments with keywords that I may want to search for in the future. That helps me get through the clutter of the one-off folder. In practice, I treat the one-off folder like you do your template file. That's my guess at least.

    Re: Balancing Coding Time And Code Quality
    by Roy Johnson (Monsignor) on Dec 04, 2003 at 22:01 UTC
      The obvious (to me) modest improvement is to change the main loop so that you're only keeping in %master those things that you need to delete from %slave later (since that's all you're doing with it, anyway). For clarity, you should rename %master to %common.
      my (%common, %slave); %slave = map {chomp; $_ => undef} <SLAVE>; while ( <MASTER> ) { chomp; if (exists $slave{$_}) { $common{$_} = undef } else { print "$_ exists on master but not slave\n" } } delete @slave{ keys %common }; print "$_ exists on slave but not master\n" for keys %slave;

      The PerlMonk tr/// Advocate
        Roy Johnson,
        Thanks - your suggestion lead to getting rid of the %common/master hash all together
        my %slave = map {chomp; $_ => undef} <SLAVE>; while ( <MASTER> ) { chomp; exists $slave{$_} ? delete $slave{$_} : print "$_ exists on master + but not slave\n"; } print "$_ exists on slave but not master\n" for keys %slave;
        Cheers - L~R
    Re: Balancing Coding Time And Code Quality
    by nevyn (Monk) on Dec 04, 2003 at 20:58 UTC
      I started the following command on both servers, while I worked on the code:
      # find / > /tmp/`hostname`

      To which I reply....

      perl -e 'open(I, "> foo\nbar")'

      More fun can be had by creating paths names like "foo\n/etc/shadow"

      --
      James Antill
    Re: Balancing Coding Time And Code Quality
    by diotalevi (Canon) on Dec 04, 2003 at 04:52 UTC

      You forgot that it is very poor style to alter the input data in a map. You wrote map { chomp; $_ => undef} but you should have written map { local $_ = $_; chomp; $_ => undef }.

      Or... there is some case where the `local $_ = $_` triggers some bug so in that case you'd have to say map { my $key = $_; $key => undef } to be really safe.

        map { chomp; $_ => undef} but you should have written map { local $_ = $_; chomp; $_ => undef }.
        Really? Did the program ran incorrectly? Do you get kicked out of the Perl Programmers Pond if you alter input data in a map?

        Abigail

        Style is somewhat arbitrary. See the discussion starting at Think for yourself. and possibly you will rethink your opinion. (I did.)
          Oh yes, I've read that discussion before in other contexts. My objection was that Limbic~Region was modifying the input as an unintentional side effect. I liken it to avoiding globals and other action-at-a-distance operations. My thinking is that there is a target for the modifications to show up at and that having them show up at both the input and the ouput is unusual at best.

        Did you ever watch Ice skating or some similar sport where the scores are determined by a set of judges award marks on the basis of "style"?

        Did you ever watch an individual or couple perform a breath-taking, flamboyant, mesmerising & inspirational performance be beaten out by another that gave a technically perfect, totally synchronised, metronomic but ultimately uninspired and frankly boring performance.

        If you have, then perhaps you also wished that it was possibly award the judge(s), that ignored or even marked down the first performance, a quick 'nool pwai' and prevent them inflicting further damage :)


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        Hooray!
        Wanted!

          Huh? The original code was a jot more complex than it needed to be and you think I'm nitpicking? The operation is in effect targetting both the input data and the output. I don't think that's unreasonable anyway, to expect the programmer to either modify the input or return the modification but not both simultaneously.

        it is very poor style to alter the input data in a map

        Can you explain the reason behind this to me? I have never seen $_ localized inside of a map. I don't see why it's so bad to modify $_ inside the map. Unless you're maping through a list of references in which case you'd be modifying the original structure. I must admit to never having seen localized $_ in a map and that I've seen a million scripts that mangle $_ to death in such cases.

          The Blue Camel has this to say about map on page 741:

          Because $_ is an alias (implicit reference) into the list's values, this variable can be used to modify the elements of the array.

          So you'll be modifying the orginal value, weather it was a list of references or not. Mearly chomping the value, as the code posted does, is unlikely to do real harm, so I wouldn't consider it a problem in this case.

          ----
          I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
          -- Schemer

          : () { :|:& };:

          Note: All code is untested, unless otherwise stated

    Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Domain Nodelet?
    Node Status?
    node history
    Node Type: perlmeditation [id://312051]
    Approved by chromatic
    Front-paged by chromatic
    help
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this?Last hourOther CB clients
    Other Users?
    Others learning in the Monastery: (5)
    As of 2024-03-28 14:17 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      No recent polls found