http://qs321.pair.com?node_id=122757

CharlesClarkson has asked for the wisdom of the Perl Monks concerning the following question:

I have, for some time been helping a beginner with a project. After offering a couple of solutions and a pointer or 2, I came up with this rather convoluted solution (below) and another straightforward one (using Sort::Fields). As luck would have it, my friend liked the approach below better. (No, really, I do have friends.)

#!/usr/bin/perl use strict; use warnings; use diagnostics; BEGIN { my (@records, @report); my $data_file = 'test.dat'; { open my $fh, $data_file or die "Can't open $data_file: $!"; while (<$fh>) { # skip blank and commented lines next if /^\s*#/; next if /^\s*$/; chomp; # We'll look for lines that describe a report: if (/report:\s+sort\s+(\S*)/ ) { @report = split /,/, $1; } else { push @records, [split /,/]; } } } my %field = ( source => 0, time => 1, sip => 2, sport => 3, dip => 4, dport => 5, hits => 6, acl => 7, lnum => 8 ); my (@sort_sub, @sort_description); # Here is where we do the sub building foreach (@report) { my ($name, $order) = split /-/; my $cmp = $name eq 'source' ? 'cmp' : '<=>'; if ($order eq 'd') { push @sort_sub, qq|\@\$::b[$field{$name}] $cmp \@\$::a[$field{$name}]| +; push @sort_description, |$name:\t\tdescending|; } else { push @sort_sub, qq|\@\$::a[$field{$name}] $cmp \@\$::b[$field{$name}]| +; push @sort_description, |$name:\t\tascending|; } } my $sort_sub = join "\n\t\t\t || \n\t", @sort_sub; # This is the part I wish to find a better solution for # And here is where we create sort.pl { my $file_name = 'sort.pl'; open my $fh2, '>', $file_name or die "Cannot open $file_name for write: $!"; print $fh2 qq|sub column_sort {\n|, qq|\t$sort_sub;|, qq|\n}\n1;\n|; } sub records { return @records; } sub sort_description { return join "\n", @sort_description; } } require 'sort.pl'; # created with BEGIN my $date = localtime; $date =~ s/ /-/g; ### store report data in file # changed from "$date.txt" to conform to winblows my $report_file_name = 'out.txt'; open my $fh, '>', $report_file_name or die "Can't Open $report_file_name: $!"; print $fh qq|\t\tREQUEST FOR SORTING\n\n|, sort_description, "\n\n", qq|FILE WAS GENERATED ON: $date \n\n|; print $fh (join ',', @$_), "\n" for sort column_sort records; __END__

Here is the report file: test.dat

report: sort lnum-a,source-a,hits-d,sip-d,time-a Monmouth,2000-05-2000:00:09-04,192.35.75.69,138,192.100.255,66,2,105,1 +234 Jackson,2000-04-2100:00:10-05,192.35.12.03,144,192.67.29,134,8,101,148 +7 Meade,2001-01-0500:00:11-04,213.132.32,175,184.57.62.35,151,12,107,153 +2 Yuma,200-03-1100:00:12-05,210.0.0.0,156,192.78.54.21,156,10,105,1578 Monmouth,200-04-1000:00:10-08,63.0.0.0,125,192.45.67.2,159,16,101,1879 Lewis,2000-03-1200:21:45-32,167.54.80.65,144,192.67.29,134,32,107,1487 Hero,2001-01-0500:00:11-04,211.34.78.93,132,184.57.62.35,151,12,101,15 +32 Jackson,2000-01-0500:00:11-04,193.0.0.195,175,184.57.62.35,151,14,105, +153

Finally, here are the sort.pl and out.txt files.

sub column_sort { @$::a[8] <=> @$::b[8] || @$::a[0] cmp @$::b[0] || @$::b[6] <=> @$::a[6] || @$::b[2] <=> @$::a[2] || @$::a[1] <=> @$::b[1]; } 1;
REQUEST FOR SORTING lnum: ascending source: ascending hits: descending sip: descending time: ascending FILE WAS GENERATED ON: Thu-Nov--1-23:00:53-2001 Jackson,2000-01-0500:00:11-04,193.0.0.195,175,184.57.62.35,151,14,105, +153 Monmouth,2000-05-2000:00:09-04,192.35.75.69,138,192.100.255,66,2,105,1 +234 Jackson,2000-04-2100:00:10-05,192.35.12.03,144,192.67.29,134,8,101,148 +7 Lewis,2000-03-1200:21:45-32,167.54.80.65,144,192.67.29,134,32,107,1487 Hero,2001-01-0500:00:11-04,211.34.78.93,132,184.57.62.35,151,12,101,15 +32 Meade,2001-01-0500:00:11-04,213.132.32,175,184.57.62.35,151,12,107,153 +2 Yuma,200-03-1100:00:12-05,210.0.0.0,156,192.78.54.21,156,10,105,1578 Monmouth,200-04-1000:00:10-08,63.0.0.0,125,192.45.67.2,159,16,101,1879

I would really like to get rid of the extra file. I don't mind keeping the BEGIN block, but I don't want to lose or go around strict. The files are small and a large sort routine is not a big deal.




Thanks,
Charles K. Clarkson

Edit kudra, 2001-11-04 Added readmore

Replies are listed 'Best First'.
Re: How do I create a sort sub on-the-fly?
by blakem (Monsignor) on Nov 02, 2001 at 13:47 UTC
    I'm a bit hesitant to post this because I don't really like the eval at the end.... but I think it will point you in a better (or at least more interesting) direction, and hopefully others will comment on my use of eval.

    Anyway, the code below creates some simple sorting routines at runtime....

    #!/usr/bin/perl -wT use strict; my @arr = (6..10,1..5); my $asc = sortmaker(1); my $desc = sortmaker(0); my @asc = sort $asc @arr; my @desc = sort $desc @arr; print "Original: @arr\n"; print "Ascending: @asc\n"; print "Descending: @desc\n"; sub sortmaker { my $order = shift; my $clause = $order ? '$a <=> $b' : '$b <=> $a'; eval "sub { $clause }"; } =OUTPUT Original: 6 7 8 9 10 1 2 3 4 5 Ascending: 1 2 3 4 5 6 7 8 9 10 Descending: 10 9 8 7 6 5 4 3 2 1

    -Blake

      No need to be ashamed of the eval, it's a standard technique in happy programs (who said "programs that generate other programs are the happiest programs of all"?).

      For a simple case like ascending/descending you can just generate the coderef at once:

      sub sortmaker { my $order = shift; my $sub = $order ? sub { $a <=> $b } : sub {$b <=> $a}; return $sub; }

      but if you need to build the subroutine then eval is the way to go. I would write it a little more verbosely though:

      sub sortmaker { my $order = shift; my $clause = $order ? '$a <=> $b' : '$b <=> $a'; my $sub= eval "sub { $clause}"; # create the coderef return $sub; # explicitely return it }
        I recommend against using a string eval without checking $@ and putting in an error check showing both the error and the generated code that produced that error. Else you could easily have a small typo and never know it, leading to much puzzling about why your sort worked strangely.

        But luckily, even for complex sorts, you don't need eval. Instead for each specific clause you can build a function that takes that clause and produces a sort function from it. If you have multiple clauses, then you can combine them easily. Just walk through your clauses, building up an array of comparisons you want to do, and then call something like this:

        # Takes a list of comparison subroutines for a sort, and # returns a combined comparison subroutine. sub combine_comp_subs { my @subs = @_; if (1 == @subs) { return shift @subs; } else { return sub { foreach my $sub (@subs) { my $c = $sub->(); return $c if $c; } return 0; }; } }
        And now the main problem you have is figuring out how to set up your data structure to compare, and how to turn the individual conditions into individual sort subs. But you have the same problem with an eval solution...

        Thanks to blakem and mirod my code looks like this.

        #!/usr/bin/perl use strict; use diagnostics; use Time::Local; my (@records, @report); { my $data_file = 'test.dat'; open my $fh, $data_file or die "Can't open $data_file: $!"; while (<$fh>) { # skip blank and commented lines next if /^\s*#/; next if /^\s*$/; chomp; # We'll look for lines that describe a report: if (/report:\s+sort\s+(\S*)/ ) { @report = split /,/, $1; } else { push @records, [split /,/]; } } } my %field = ( source => 0, time => 1, sip => 2, sport => 3, dip => 4, dport => 5, hits => 6, acl => 7, lnum => 8 ); my (@sort_sub, @sort_description); # Here is where we do the sub building foreach (@report) { my ($name, $order) = split /-/; my $cmp = $name =~ /source|time/ ? 'cmp' : '<=>'; if ($order eq 'd') { push @sort_sub, qq|\@\$::b[$field{$name}] $cmp \@\$::a[$field{$name}]|; push @sort_description, qq|$name:\t\tdescending|; } else { push @sort_sub, qq|\@\$::a[$field{$name}] $cmp \@\$::b[$field{$name}]|; push @sort_description, qq|$name:\t\tascending|; } } # require 'sort.pl'; # created with begin my $date = localtime; $date =~ s/ /-/g; ### store report data in file my $report_file_name = 'out.txt'; open my $fh, '>', $report_file_name or die "Can't Open $report_file_name: $!"; print $fh qq|\t\tREQUEST FOR SORTING\n\n|, sort_description(), "\n\n", qq|FILE WAS GENERATED ON: $date \n\n|; my $sort = column_sort(\@sort_sub); print $fh (join ',', @$_), "\n" for sort $sort @records; sub column_sort { my $ref = shift; my $sort_sub = join " || ", @$ref; return eval "sub { $sort_sub }"; } sub sort_description { return join "\n", @sort_description; } __END__



        Thanks for the help,
        Charles K. Clarkson
Re: How do I create a sort sub on-the-fly?
by jeroenes (Priest) on Nov 02, 2001 at 15:15 UTC
    I would avoid writing such elaborate sorting subs and use DBD::RAM. This is the first timer for me, BTW.

    This is yet untested tested now:

    use DBI; use SuperSplit; my $data; { local $/ = undef; $data = <DATA>; } $data=~tr/\-/,/; my $dbh = DBI->connect('DBI:RAM:','usr','pwd',{RaiseError=>1}); $dbh->func({ table_name => 'file', col_names => 'source,year,month,time,sip,sport,dip,dport,hits, +acl,mlnum', data_type => 'CSV', data_source => $data, }, 'import' ); my $ar = $dbh->selectall_arrayref(qq[SELECT * FROM file ORDER by spo +rt asc, time desc ]); print superjoin(',',"\n", $ar ); __DATA__ Monmouth,2000-05-2000:00:09-04,192.35.75.69,138,192.100.255,66,2,105,1 +234 Jackson,2000-04-2100:00:10-05,192.35.12.03,144,192.67.29,134,8,101,148 +7 Meade,2001-01-0500:00:11-04,213.132.32,175,184.57.62.35,151,12,107,153 +2 Yuma,200-03-1100:00:12-05,210.0.0.0,156,192.78.54.21,156,10,105,1578 Monmouth,200-04-1000:00:10-08,63.0.0.0,125,192.45.67.2,159,16,101,1879 Lewis,2000-03-1200:21:45-32,167.54.80.65,144,192.67.29,134,32,107,1487 Hero,2001-01-0500:00:11-04,211.34.78.93,132,184.57.62.35,151,12,101,15 +32 Jackson,2000-01-0500:00:11-04,193.0.0.195,175,184.57.62.35,151,14,105, +153
    Of course you will have to write your own sorts here, but that is fairly easy now, just adjust the SQL ORDER string.

    HTH,

    Jeroen
    "We are not alone"(FZ)

      Perhaps you mean DBD::AnyData? DBD::RAM is obsoleted by this one.

      Anyway, you've showed an interesting way to treat the problem as DBI query! ++ for that! (DBD::CSV should do it, too)

        I had no idea that DBD:RAM was obsoleted. I remembered some merlyn-WT with it, so I just installed it to try. Will check out DBD::Anydata of course... thanks.

        Update I got back at my root 'konsole' and found that anaydata failed the tests.. is that a known bug? ... i tried and tried... AnayData actually hangs on the.... I also tried cpan://Anydata... to no avail.

        I think DBD::RAM uses DBD::CSV internally.

      I originally used Sort::Fields to create the report, but as I mention below most beginners don't like to use modules outside the standard ones.

      #!/usr/bin/perl use strict; use diagnostics; use Sort::Fields; my (@records, @report); { my $data_file = 'test.dat'; open my $fh, $data_file or die "Can't open $data_file: $!"; while (<$fh>) { # skip blank and commented lines next if /^\s*#/; next if /^\s*$/; # We'll look for lines that describe a report: if (/report:\s+sort\s+(\S*)/ ) { @report = split /,/, $1; next; } push @records, $_; } } my (@columns, @sort_description); { my %field = ( source => 1, time => 2, sip => 3, sport => 4, dip => 5, dport => 6, hits => 7, acl => 8, lnum => 9 ); foreach (@report) { my ($name, $order) = split /-/; my $suffix = $name =~ /source|time/ ? '' : 'n'; if ($order eq 'd') { push @sort_description, "$name:\t\tdescending\n"; push @columns, "-$field{$name}$suffix"; } else { push @sort_description, "$name:\t\tascending\n"; push @columns, "$field{$name}$suffix"; } } } { ### store report data in file my $report_file_name = "out.txt"; open my $fh, '>', $report_file_name or die "Can't Open $report_file_name: $!"; my $date = localtime; $date =~ s/ /-/g; print $fh qq|\t\tREQUEST FOR SORTING\n\n|, @sort_description, qq|\n\nFILE WAS GENERATED ON: $date\n\n|, fieldsort ',', \@columns, @records; } __END__



      Thanks for your reply,
      Charles K. Clarkson
Re: How do I create a sort sub on-the-fly?
by japhy (Canon) on Nov 02, 2001 at 20:02 UTC
    Just so you know, require() doesn't happen until run-time, so I don't see the need for the BEGIN block. Anyway, here's how I would construct a sort function... maybe I love regexes too much?
    # assume my(%field) exists, holding name->index pairs # assume my(@records) exists, holding your recods my $data = "lnum-a,source-a,hits-d,sip-d,time-a"; my %is_string = ( source => 1 ); # sort with cmp my $sorter = make_field_sort($data, \%field, \%is_string); for (sort $sorter @records) { ... } sub make_field_sort { my ($data, $FIELD, $STRING) = @_; my @cmp; for (split /,/, $data) { my ($f, $o) = split /-/; push @cmp, ('b <=> a', 'a <=> b')[$o eq 'a']; $cmp[-1] =~ s/<=>/cmp/ if $STRING->{$f}; $cmp[-1] =~ s/([ab])/\$${1}->[$FIELD->{$f}]/g; } local $" = " or "; eval "sub { @cmp }"; }

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: How do I create a sort sub on-the-fly?
by toma (Vicar) on Nov 02, 2001 at 21:14 UTC
    You don't need to write your own sort for this one. Data::Table does this for you.
    #!/usr/bin/perl use strict; use Data::Table; my $t = Data::Table::fromCSV("data.csv", 0); # No header $t->sort(8,0,0, 0,1,0, 4,0,1, 2,0,1, 1,0,0); print $t->csv;
    The parameters on the sort specify the column, sort direction, and numeric versus ascii sorting.

    This produces output that agrees with yours.

    It should work perfectly the first time! - toma

      Yes, I wrote one using Sort::Fields that was really short and quick. It seems that most new perl programmers don't like to use modules. Given a choice (and my student here had a choice), most neophytes will opt for the approach that doesn't involve installing a new module. Even after I explain that most modules don't require permission from a sys admin to install.




      Thanks for your reply,
      Charles K. Clarkson