g_speran has asked for the wisdom of the Perl Monks concerning the following question:
Apologies in advance for this senior moment.
I am looking to compare a variable, which contains a number to a list of numbers, such as 1,2,5,6,9,10,41-56
If the value is any of those numbers then don’t perform my “if” routine
next If ($x = [number above] );
Re: Comparing a value to a list of numbers
by davido (Cardinal) on Jan 29, 2021 at 17:29 UTC
|
The hashtable lookup is by far the fastest approach if you can spare the time to build it in the first place. You wouldn't use the hash approach if you are only doing a search one time on the data set. But if you are doing it many times, the hash approach quickly shines.
Other viable approaches include regex alternation, any, and grep. Observe:
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(any);
use Benchmark qw(cmpthese);
my @numlist = map {int(rand(100_000))} 1..10_000;
my %numlist_table;
@numlist_table{@numlist} = ();
my $numlist_re = do {
my $numlist_str = join('|', @numlist);
qr/^(?:$numlist_str)$/;
};
my $target = 42;
sub sgrep {
return grep {$target == $_} @numlist;
}
sub table {
return exists $numlist_table{$target};
}
sub alternation {
return $target =~ m/$numlist_re/;
}
sub sany {
return any {$target == $_} @numlist;
}
cmpthese(-3, {
grep => \&sgrep,
table => \&table,
alternation => \&alternation,
any => \&sany,
});
The results:
Rate grep any alternation table
grep 4055/s -- -28% -100% -100%
any 5641/s 39% -- -100% -100%
alternation 3440637/s 84745% 60895% -- -93%
table 48177809/s 1187946% 853994% 1300% --
However, if you consider the time it takes to build the alternation regex, and to build the hash, they are less efficient options for small data sets, or for small numbers of lookups. For large datasets with large numbers of lookups, the hash is a good answer (unless memory is tight).
| [reply] [d/l] [select] |
|
Well, there is another question lurking behind, which is IMHO not too tangential
What if large ranges are involved? The OP showed at least a little one with 41-56.
I'm sure a binary search would outperform a hash then.°
(Of course not if using random keys like you did)
update
°) Well I need to be more precise: Large ranges are sometimes not practicable with hashes because of the memory complexity. This could lead to heavy swapping operations.
Having a little slower code with binary search O(ln) is often better than hash lookup with O(1)
| [reply] |
|
Hi Dave,
>
my $target = 42;
I think using a fixed number if some routines use a linear search might not lead to accurate results.
But I have to admit, I don't know yet how to best combine a random $target with Benchmark
| [reply] [d/l] |
Re: Comparing a value to a list of numbers
by Discipulus (Canon) on Jan 29, 2021 at 16:19 UTC
|
Hello g_speran,
you can use grep like in if grep{ $given == $_ } @nums or any from List::Util
Anyway what you posted is not perl: the equal sign = is an assignement operator and you need == to compare nums and eq for strings. if must be lowercase and square brackets are for anonymous arrays.
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
| [reply] [d/l] [select] |
Re: Comparing a value to a list of numbers
by haukex (Archbishop) on Jan 29, 2021 at 18:47 UTC
|
a variable, which contains a number to a list of numbers, such as 1,2,5,6,9,10,41-56
I'm a little unclear on what you mean. Is this variable an array, like the other monks have assumed? Because if it's a string, you'll have to parse it first. Luckily, there is a module that can handle a string such as the one you've shown, Set::IntSpan. The advantage of this module is that if the spans are large, like say 1,2,3-100000, it'll save a lot of memory as compared to an array or a hash. Otherwise, this is just a TIMTOWTDI solution.
use warnings;
use strict;
use Set::IntSpan;
my $set = Set::IntSpan->new("1,2,5,6,9,10,41-56");
for my $x (1,2,3,42,100) {
print "$x is ", $set->member($x)?"":"NOT ", "in the set\n";
}
__END__
1 is in the set
2 is in the set
3 is NOT in the set
42 is in the set
100 is NOT in the set
| [reply] [d/l] |
|
| [reply] |
Re: Comparing a value to a list of numbers
by hippo (Bishop) on Jan 29, 2021 at 16:59 UTC
|
| [reply] |
Re: Comparing a value to a list of numbers
by Fletch (Bishop) on Jan 29, 2021 at 16:58 UTC
|
use Quantum::Superpositions;
if ($x == any($a, $b, $c)) { ... }
The cake is a lie.
The cake is a lie.
The cake is a lie.
| [reply] [d/l] |
|
IIRC this is implemented by any returning an object overloading the == operator. And there is maybe also tie'ing involved.
Hence many movable and slow parts. Essentially a very sweet syntax coming with a cost.
Question: is there also intelligent support for ranges? I'd buy into this if yes.
like if ($x == any(1,2,5,6,9,10,[41,56]) { ... } ?
I suppose it's clear where the advantages are ...
| [reply] [d/l] [select] |
|
| [reply] |
Re: Comparing a value to a list of numbers (in via hash)
by LanX (Saint) on Jan 29, 2021 at 16:52 UTC
|
TIMTOWTDI
Depending on your requirements on time and memory complexity, you could also use a hash
DB<28> %in = map { $_ => 1 } 1,2,5,6,9,10,41..56
DB<29> p $in{5}
1
DB<30> p $in{0}
DB<31>
update
corrected C&P error omitting the 1
| [reply] [d/l] [select] |
Re: Comparing a value to a list of numbers
by jcb (Parson) on Jan 31, 2021 at 04:50 UTC
|
Other monks have suggested a binary search solution, but did not provide an example. I took writing an optimized binary search for this as a challenge and wrote this:
#!/usr/bin/perl
# -*- CPerl -*-
use strict;
use warnings;
# sparse range structure:
# array of arrayref/number
# single integers represented as number
# contigous ranges as arrayref [LOW,HIGH] inclusive
# given: string "A,B,C,X-Y,Z"
# return: sorted array [A, B, C, [X, Y], Z]
sub parse ($) {
my @elements = split /,/, shift;
foreach (@elements) {
s/\s+//g; # prune whitespace
next unless m/\d+-\d+/; # skip conversion if single integer
$_ = [split /-/, $_]; # convert ranges to arrayrefs
}
# sort range set
@elements = sort {(ref($a)?$a->[0]:$a) <=> (ref($b)?$b->[0]:$b)} @el
+ements;
# merge overlapping loose elements into preceding ranges
for (my $i = 0; $i < $#elements; $i++) {
next unless ref $elements[$i]; # skip single integers
while ($i+1 <= $#elements and $elements[$i+1] <= $elements[$i][1])
{ splice @elements, $i+1, 1 } # remove elements included in r
+ange
}
# coalesce contiguous integers into ranges
for (my $i = 0; $i < $#elements; $i++) {
next if ref $elements[$i]; # skip ranges
if ($elements[$i]+1 == $elements[$i+1]) {
my $j = 1+$i;
$j++ while !ref($elements[$j]) && $elements[$j]+1 == $elements[$
+j+1];
splice @elements, $i, 1+$j-$i, [$elements[$i], $elements[$j]];
}
}
# merge adjacent loose elements into succeeding ranges
for (my $i = 0; $i < $#elements; $i++) {
next if ref $elements[$i]; # skip ranges
next unless ref $elements[$i+1]; # but next element is a range
# There can be at most one such element, since contiguous integers
+ were
# coalesced into ranges above.
if ($elements[$i]+1 == $elements[$i+1][0])
{ splice @elements, $i, 2, [$elements[$i], $elements[$i+1][1]] }
}
# merge adjacent loose elements into preceding ranges
for (my $i = 0; $i < $#elements; $i++) {
next unless ref $elements[$i]; # skip single integers
next if ref $elements[$i+1]; # but next element is a single int
+eger
# There can be at most one such element, since contiguous integers
+ were
# coalesced into ranges above.
if ($elements[$i][1]+1 == $elements[$i+1])
{ splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1]] }
}
# merge overlapping ranges
for (my $i = 0; $i < $#elements; $i++) {
next unless ref $elements[$i] and ref $elements[$i+1];
splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]]
if $elements[$i][1] >= $elements[$i+1][0];
}
# merge adjacent ranges
for (my $i = 0; $i < $#elements; $i++) {
next unless ref $elements[$i] and ref $elements[$i+1];
splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]]
if $elements[$i][1]+1 == $elements[$i+1][0];
}
return \@elements;
}
# given: sorted array from sub parse and integer
# return true if the integer is in the sorted array
sub search ($$) {
my $set = shift;
my $num = shift;
my $left = 0; my $right = $#$set;
my $i = $#$set >> 1; # bitshift for integer /2
while ($left < $right) {
if (ref($set->[$i])) { # evaluate a range
return 1 # number within this range
if $num >= $set->[$i][0] && $num <= $set->[$i][1];
if ($num > $set->[$i][0]) { $left = $i+1 }
else { $right = $i-1 }
} else { # evaluate a single integer
return 1 # number matched
if $num == $set->[$i];
if ($num > $set->[$i]) { $left = $i+1 }
else { $right = $i-1 }
}
$i = ($left + $right) >> 1; # bitshift for integer /2
}
# last check
if (ref($set->[$i])) {
return $num >= $set->[$i][0] && $num <= $set->[$i][1]
} else {
return $num == $set->[$i]
}
}
my $Set = parse shift;
use Data::Dumper;
print Data::Dumper->new([$Set],[qw(Set)])->Indent(0)->Dump,"\n";
foreach (@ARGV) {
print $_, search($Set, $_) ? ' is' : ' is not', " in the set\n";
}
The script expects a set of numbers as its first command line argument, and then various numbers to test against the set as additional arguments. Examples:
$ ./bsearch.pl 1,2,5,6,9,10,41-56 1 4 42 17
$Set = [[1,2],[5,6],[9,10],[41,'56']];
1 is in the set
4 is not in the set
42 is in the set
17 is not in the set
$ ./bsearch.pl 1,2,11-16,6,7,19,9,5-8,13,14,15,4 1 2 3 4 5 8 9 10 11 1
+2 16 17 18 19 20
$Set = [[1,2],[4,9],[11,16],19];
1 is in the set
2 is in the set
3 is not in the set
4 is in the set
5 is in the set
8 is in the set
9 is in the set
10 is not in the set
11 is in the set
12 is in the set
16 is in the set
17 is not in the set
18 is not in the set
19 is in the set
20 is not in the set
This latter example was used for developing the set-optimization logic. | [reply] [d/l] [select] |
|
| [reply] |
|
| [reply] |
|
|
|
|
DB<47> sub tst { if ($_>=9) { if ($_>=41 and $_<=56) {1} elsif ($_<=
+10) {1} } elsif ($_<=6) { if ($_>=5){1} elsif \
($_<=2 and $_>=1) {1} }}
DB<48> %in = map { $_ => 1 } 1,2,5,6,9,10,41..56
DB<49> tst != $in{$_} and print "error $_" for 1..190
DB<50>
I was also pondering about how to optimize a solution (time and memory) for all cases.
Like potentially
- many points
- many ranges
- floats instead of integers
- multiple dimensions (like rectangular areas in a plane)
An "optimal" solution would involve combining different techniques ...
And some techniques which were prominent in similar older threads haven't been mentioned yet.
Anyway all of this is far beyond what the OP wanted.
But yes: again "I'm suggesting but not providing an example" ;-)
| [reply] [d/l] |
|
I assumed that the set would vary (although using eval to convert the arrayref returned from parse into a closure like your tst could be interesting even if compile could end up longer than parse) and tried to optimize for the "many points" and "many ranges" cases; thus all those extra passes in the parse sub to simplify the result. I excluded floating point because exact comparisons are not reliable and it is easier to restrict the domain to integers. Multiple dimensions start to get complex enough that I suspect a general optimal solution is impossible — optimizing that case will depend on the actual data used.
The "suggesting but not providing an example" remark was an excuse for posting my solution — usually these questions have been "done to death" before I get to them. ;-)
| [reply] [d/l] [select] |
|
Re: Comparing a value to a list of numbers (in via smartmatch)
by LanX (Saint) on Jan 29, 2021 at 16:47 UTC
|
DB<24> $x=5
DB<25> p $x ~~ [1,2,5,6,9,10,41..56]
1
DB<26> $x=0
DB<27> p $x ~~ [1,2,5,6,9,10,41..56]
DB<28>
but
WARNING: Smartmatch is experimental
so you'd need to deactivate the warning and hope it'll never vanish (at your risk)
| [reply] [d/l] [select] |
|
I personally would not use the smartmatch operator for anything.
I did some experimenting with this experimental operator and found out that it is "too smart for me!". I could not accurately predict what this thing would actually do!
| [reply] |
|
| [reply] |
Re: Comparing a value to a list of numbers
by tybalt89 (Monsignor) on Feb 01, 2021 at 02:32 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11127703
use warnings;
no warnings qw( qw );
use Search::Dict;
@ARGV = qw( 1,2,5,6,9,10,41-56 1 4 42 17 );
my $file = join '', map "$_\n", split /,/, shift;
open my $fh, '<', \$file or die;
for my $n ( @ARGV )
{
look $fh, $n, {
comp => sub { $_[0] <=> $_[1] },
xfrm => sub { $_[0] =~ s/\d+-//r },
};
my $have = <$fh>;
my $not = (defined $have and
$have =~ /(\d+)-(\d+)/ ? $1 <= $n <= $2 : $have == $n) ? '' : ' no
+t';
print "$n is$not in set\n";
}
Outputs:
1 is in set
4 is not in set
42 is in set
17 is not in set
| [reply] [d/l] [select] |
Re: Comparing a value to a list of numbers
by Polyglot (Chaplain) on Feb 02, 2021 at 03:34 UTC
|
The variety of answers you've already gotten just shows how broad the TMTOWTDI principle actually is in Perl. I would do something entirely different than has yet been suggested--might not be the best, but it has worked for me in similar situations a number of times.
my $check = sprintf("%0.2d", $x);
next if ('01,02,05,06,09,10,41,42,43,44,45,46,47,48,49,50,51,52,53,54,
+55,56' =~ m/$check/);
Obviously, if the list gets too long, this might not be practical. As written, this accommodates one- or two-digit numbers (though this could be tailored for larger ones, with the addition of more leading zeros). But for a list such as you suggested, it might work well enough--better than taking time to hash it all in terms of its simplicity, if not its efficiency.
| [reply] [d/l] |
Re: Comparing a value to a list of numbers
by Anonymous Monk on Jan 29, 2021 at 18:51 UTC
|
Seriously ... List::Util is your friend. (Even if you just "steal" a particular algorithm from it.) There's no reason to do something over if someone else has already done it. | [reply] |
|
Please give an example (in perl code) of hose List::Util can be used to solve the OP's problem.
Otherwise your "help" is just a waste of space and a waste of everybody's time.
| [reply] |
|
Please give an example (in perl code) of hose List::Util can be used to solve the OP's problem
use warnings;
use strict;
use List::Util 'uniq';
my @given = (1,2,5,6,9,10,41..56);
my @check = (3,4,9,11,48,102);
my $c = uniq(@given);
for(@check) {
push(@given, $_);
print "$_ is in the list\n"
unless $c + 1 == uniq(@given);
pop(@given);
}
__END__
Outputs:
9 is in the list
48 is in the list
List::Util::uniq() utilizes the hash lookup referred to elsewhere in this thread.
Some efficiency is lost in this script because uniq() has to generate the hash every time it is called.
It would be more efficient if the perl code itself created the hash (just once), and then re-used that hash for each number that needs to be checked.
Perhaps anonymous had some other List::Util routine in mind.
Cheers, Rob
| [reply] [d/l] |
|
|
| A reply falls below the community's threshold of quality. You may see it by logging in.
|
|
|