in reply to Re: Create sort function from a text file in thread Create sort function from a text file
Thank you very much for the info. I have tried the solution with eval and it works as I expect so I really appreciate it.
The reason for the requirement is as follows:
I have a web interface where users can manage their own population of STB hardware. Each STB is given a unique name (usually relating to it's position in a series of racks). On the interface, the STBs are listed along with details that relate to each unit. In Perl, the list is sorted by the most basic "sort" function.
Different users employ different naming schemes. For example one users list may look something like:
Rack1-Unit1
Rack1-Unit2
Rack3-Unit1
etc...
Wheras another user may abbreviate the names and not bother whith hyphens e,g.
R1U1
R2U3
R10U1
etc...
Users can call their STBs anything they want as long as each name is unique. The basic sort function is not ideal as when there are 10+ racks and 10+ units, Numbers in the teens are listed after number 1, instead of being listed in numeric order e,g. R1,R10,R11,R2,R3 etc...
What I want to do is allow users to build their own custom sort statements which they can store in a text file. The script that creates the list to be shown on the interface can then read that text file and sort the list according to the user.
So if a user creates the below sort statement:
{ ($a =~ /R(\d+)/)[0] <=> ($b =~ /R(\d+)/)[0] or ($a =~ /U(\d+)/)[0] <=> ($b =~ /U(\d+)/)[0] }
They will have their list ordered first by Rack, and then by unit e,g.
R1-U1
R1-U10
R2-U2
R3-U13
R10-U1
R10-U5
R10-U11
Being able to customise the sort order improves the usability of the interface for the users. They use it everyday for a majority of their work so anything that helps save time is of great value
The interface is deployed in different areas of a secure testing facility and is only used on a private VPN, so the access is tightly controlled.
Re^3: Create sort function from a text file
by haukex (Archbishop) on Aug 16, 2021 at 13:04 UTC
|
I have a web interface where users can manage their own population of STB hardware. ... What I want to do is allow users to build their own custom sort statements which they can store in a text file.
This means you'd be giving your users the power to run arbitrary Perl code under whatever user the webserver executes scripts as. So for example, if your web interface has access to a database, you're giving your users the power to access anything in that database that the web interface can, likely including other customer's records. Here you said:
I maintain all instances of the interface
One safer alternative is to give your users a predefined selection of sort orderings. Regexes such as the ones below cover all the cases you showed here. If some other user has yet another custom naming scheme that these orderings don't match, or they want some other arbitrary sort order, then it'd be fairly straightforward for you to add a new set of regexes to the %orderings hash. You would remain in control of the code that gets executed.
use warnings;
use strict;
my @examples = (
[ 'Rack1-Unit2', 'Rack3-Unit1', 'Rack1-Unit1' ],
[ 'R2U3', 'R1U4', 'R10U1', 'R1U1' ],
[ 'R2-U3', 'R1-U4', 'R10-U1', 'R1-U1' ],
);
my $rackre = qr/R(?:ack)?(\d+)/i;
my $unitre = qr/U(?:nit)?(\d+)/i;
my %orderings = (
rackfirst => sub { ($a =~ $rackre)[0] <=> ($b =~ $rackre)[0]
or ($a =~ $unitre)[0] <=> ($b =~ $unitre)[0] },
unitfirst => sub { ($a =~ $unitre)[0] <=> ($b =~ $unitre)[0]
or ($a =~ $rackre)[0] <=> ($b =~ $rackre)[0] },
);
for my $ex (@examples) {
print "Input: @$ex\n";
for my $o (sort keys %orderings) {
my @sorted = sort {&{$orderings{$o}}} @$ex;
print "$o: @sorted\n";
}
}
__END__
Input: Rack1-Unit2 Rack3-Unit1 Rack1-Unit1
rackfirst: Rack1-Unit1 Rack1-Unit2 Rack3-Unit1
unitfirst: Rack1-Unit1 Rack3-Unit1 Rack1-Unit2
Input: R2U3 R1U4 R10U1 R1U1
rackfirst: R1U1 R1U4 R2U3 R10U1
unitfirst: R1U1 R10U1 R2U3 R1U4
Input: R2-U3 R1-U4 R10-U1 R1-U1
rackfirst: R1-U1 R1-U4 R2-U3 R10-U1
unitfirst: R1-U1 R10-U1 R2-U3 R1-U4
(Note a Schwartzian transform could also be used to improve performance.) | [reply] [d/l] [select] |
|
| [reply] |
|
Ah yes, sorting with the good old Schwartzian Transform and GRT. Brings back some happy memories!
Some Schwartzian Transform references:
Wow, the Schwartzian Transform features in one of PM's earliest historic nodes on Christmas Day 1999 by chromatic,
one of the pioneer developers of the Everything Engine!
Posting on Christmas Day shows chromatic must have enjoyed tinkering with the Schwartzian Transform as much as I did in the early years. :)
Nowhere near as many references for GRT:
Still fewer for the Orcish Manoeuvre:
Most of the eleven gold stars in Illuminations (category: sorting)
were awarded to historical figures, including two gold stars to PM founder vroom himself:
See also:
External Sorting
On CPAN:
By creamygoodness (author of Sort::External):
Super-fast sorting modules by salva:
CPAN List::AllUtils sorting functions:
- sort_by
- nsort_by
- rev_sort_by
- rev_nsort_by
Other:
References Added Later
- A Fresh Look at Efficient Perl Sorting - paper by Uri Guttman (sysarch) and Larry Rossler
- YA Perl Advent Calendar by William 'N1VUX' Ricker of Boston.PM (2005)
Updated: Extra references added long after the original reply was made.
| [reply] [d/l] [select] |
|
|
A reply falls below the community's threshold of quality. You may see it by logging in. |
Re^3: Create sort function from a text file
by Corion (Patriarch) on Aug 16, 2021 at 13:04 UTC
|
Maybe (maybe), you can get away by splitting all search terms into non-digits and digits, and sorting on that. This means that all items (in a STB population) have the same number of items, and that nobody mixes letters and numbers within the item:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 6;
sub natural_string_cmp {
my ($left, $right) = @_;
my @left = split /\b|(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])/, $l
+eft;
my @right = split /\b|(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])/, $r
+ight;
#use Data::Dumper;
#warn Dumper \@left;
#warn Dumper \@right;
# Now, reconstruct a string for each item that can simply be compa
+red directly.
# For this, we zero-left-pad all numbers and extend all strings wi
+th \0
# For simplicity, I assume no number longer than 18 digits and no
+string longer than 10 characters
my $l = join "", map { /\d/ ? sprintf '%018d', $_ : substr($_."\0\
+0\0\0\0\0\0\0\0\0",0,10) } @left;
my $r = join "", map { /\d/ ? sprintf '%018d', $_ : substr($_."\0\
+0\0\0\0\0\0\0\0\0",0,10) } @right;
#warn $l;
#warn $r;
return $l cmp $r
}
is natural_string_cmp('Rack1-Unit1', 'Rack1-Unit1'), 0, "Identity";
is natural_string_cmp('Rack1-Unit1', 'Rack1-Unit2'), -1, "Smaller";
is natural_string_cmp('Rack1-Unit2', 'Rack3-Unit1'), -1, "Smaller";
is natural_string_cmp('Rack3-Unit1', 'Rack1-Unit1'), 1, "Larger";
is natural_string_cmp('R1U1', 'R1U1'), 0, "Identity";
is natural_string_cmp('R1U2', 'R3U1'), -1, "Smaller";
is natural_string_cmp('R3U1', 'R10U1'), -1, "Smaller";
is natural_string_cmp('R10U1', 'R3U1'), 1, "Larger";
#R1-U1
#R1-U10
#R2-U2
#R3-U13
#R10-U1
#R10-U5
#R10-U11
#
| [reply] [d/l] |
Re^3: Create sort function from a text file (string eval vs block eval)
by eyepopslikeamosquito (Archbishop) on Aug 17, 2021 at 08:04 UTC
|
> I have tried the solution with eval and it works as I expect so I really appreciate it
Hmmm, sounds like you might be about to rush off and push it into production. :)
I feel you should heed haukex's security advice and think harder about your problem
before rushing off to implement a string eval solution.
Some sound advice
from famous Perl guru merlyn aka Randal L. Schwartz:
No. Do not resort to eval-string if other means are available.
You're firing up the compiler (slower than almost any other solution), and exposing yourself to hard to debug and hard to secure practices.
and Mark Jason Dominus (MJD) from
this quiz of the week:
A good rule of thumb is that unless what you're trying to do is most clearly described
as "compile and run arbitrary Perl code", it's probably a mistake to use 'eval' to do it.
MJD also strongly advises against
using a variable as a variable name aka symbolic references ... as does Tom Christiansen
in avoid symbolic references (use a hash or a real reference instead).
Update: Confusingly, while string eval should be avoided, block eval is fine
(in fact, should be used more often than it is, based on my experiences of pointing out uncaught exceptions during code reviews ...
only to be dismayed by the perpetrator saying "Perl has exceptions? Really? I don't see a try keyword" :) ...
which illustrates the importance of choosing good names: Perl's block eval should have been spelled try.
Larry made a most unfortunate boo-boo in Perl's early days, choosing the same name (eval) for two different things,
violating the different things should look different UI principle.
Good to see he's fixed this in Raku.
References Added Later
See Also
| [reply] [d/l] [select] |
Re^3: Create sort function from a text file
by Marshall (Canon) on Aug 18, 2021 at 19:09 UTC
|
Here is another idea for you...
Rather than trying to modify each line to be compatible with a "cmp", alpha comparison, I split each line up into separate alpha and numeric tokens. cmp is used for alpha tokens and spaceship (<=>) is used for numeric values. That way, numeric 10 will sort higher than numeric 3. If for some reason, one value is "shorter" than the other and both are equal to that point, there is a "tie-breaker" so that the shortest one "wins".
The sort routine does a fair amount of 'work' to make the comparison. But with a 100 things or so, this "extra work" should make no significant performance difference. I did not make any assumptions as to the number of "parts" in each line. One example below shows a truncated line.
use strict;
use warnings;
my @list = qw( R1-U10 R1-U1 R10-U11
R10-U1
R2-U2
R3-U13
R10-U5
R2
);
@list = sort special_compare @list;
print "$_\n" for @list;
sub special_compare
{
my (@myA) = $a =~ /([a-zA-z_]+|\d+)/g;
my (@myB) = $b =~ /([a-zA-z_]+|\d+)/g;
my $result=0;
my $Atoken;
my $Btoken;
while ( defined ($Atoken = shift @myA) and defined ($Btoken = shif
+t @myB)
and $result == 0)
{
my $numeric = 0;
$numeric = 1 if ($Atoken =~ /\d/ and $Btoken =~ /\d/);
if ($numeric)
{
$result = ($Atoken <=> $Btoken);
}
else
{
$result = ($Atoken cmp $Btoken);
}
}
if ($result ==0) #if one array "runs out", longest is "greater"
{
return -1 if (@myA < @myB);
return 1 if (@myA > @myB);
}
return $result;
}
__END__
R1-U1
R1-U10
R2
R2-U2
R3-U13
R10-U1
R10-U5
R10-U11
Upon further reflection and testing, this hasty idea below didn't work (shown in readmore tags instead of being deleted). The above code is better.
Update again: As another thought, if the above generalized sort is not enough, rather than having the user's writing actual code, you could perhaps created some sort of simple grammar for the user to modify in a config file.
Perhaps:
Prototype: R10-U1
Fields: A1 N2 A3 N4
Sort Order: A3 N4 A1 N2
A for Alpha field. N for numeric field.
Those letters are really a "distinction without a difference" - mainly to keep your user's brain working correctly. What
would matter are the 1,2,3,4 numbers (essentially indices into the @tokens array). Ignore A or N and do the sort automagically according to what the
field actually is (just alpha or just numeric).
The above Order would put the units first. Rack order first would be same as field definition: A1 N2 A3 N4.
If you have some simple syntax like that in a config file, that is something that you could validate and have your special_compare()
routine use. This avoids the problem of the "user writing code" - possibly with obscure syntax errors that they may not
understand how to fix.. Not everybody understands Perl.
In some of my config files, I actually document some common scenarios as comments as a guide so the user doesn't have to actually
"RTFM".
Anyway, just a thought. Many variations on this theme are possible. My advice is to not make it more complicated than it needs to be. I'd start with
the general rack first sort above and then see how much demand there actually is for different sort orders.
| [reply] [d/l] [select] |
|
|