Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Comparing a value to a list of numbers

by g_speran (Scribe)
on Jan 29, 2021 at 15:58 UTC ( [id://11127640]=perlquestion: print w/replies, xml ) Need Help??

g_speran has asked for the wisdom of the Perl Monks concerning the following question:

Apologies in advance for this senior moment. I am looking to compare a variable, which contains a number to a list of numbers, such as 1,2,5,6,9,10,41-56 If the value is any of those numbers then don’t perform my “if” routine
next If ($x = [number above] );

Replies are listed 'Best First'.
Re: Comparing a value to a list of numbers
by davido (Cardinal) on Jan 29, 2021 at 17:29 UTC

    The hashtable lookup is by far the fastest approach if you can spare the time to build it in the first place. You wouldn't use the hash approach if you are only doing a search one time on the data set. But if you are doing it many times, the hash approach quickly shines.

    Other viable approaches include regex alternation, any, and grep. Observe:

    #!/usr/bin/env perl use strict; use warnings; use List::Util qw(any); use Benchmark qw(cmpthese); my @numlist = map {int(rand(100_000))} 1..10_000; my %numlist_table; @numlist_table{@numlist} = (); my $numlist_re = do { my $numlist_str = join('|', @numlist); qr/^(?:$numlist_str)$/; }; my $target = 42; sub sgrep { return grep {$target == $_} @numlist; } sub table { return exists $numlist_table{$target}; } sub alternation { return $target =~ m/$numlist_re/; } sub sany { return any {$target == $_} @numlist; } cmpthese(-3, { grep => \&sgrep, table => \&table, alternation => \&alternation, any => \&sany, });

    The results:

    Rate grep any alternation table grep 4055/s -- -28% -100% -100% any 5641/s 39% -- -100% -100% alternation 3440637/s 84745% 60895% -- -93% table 48177809/s 1187946% 853994% 1300% --

    However, if you consider the time it takes to build the alternation regex, and to build the hash, they are less efficient options for small data sets, or for small numbers of lookups. For large datasets with large numbers of lookups, the hash is a good answer (unless memory is tight).


    Dave

      Well, there is another question lurking behind, which is IMHO not too tangential

      What if large ranges are involved? The OP showed at least a little one with 41-56.

      I'm sure a binary search would outperform a hash then.°

      (Of course not if using random keys like you did)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

      update

      °) Well I need to be more precise: Large ranges are sometimes not practicable with hashes because of the memory complexity. This could lead to heavy swapping operations.

      Having a little slower code with binary search O(ln) is often better than hash lookup with O(1)

      Hi Dave,

      > my $target = 42;

      I think using a fixed number if some routines use a linear search might not lead to accurate results.

      But I have to admit, I don't know yet how to best combine a random $target with Benchmark

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Comparing a value to a list of numbers
by Discipulus (Canon) on Jan 29, 2021 at 16:19 UTC
    Hello g_speran,

    you can use grep like in if grep{ $given == $_ } @nums or any from List::Util

    Anyway what you posted is not perl: the equal sign = is an assignement operator and you need == to compare nums and eq for strings. if must be lowercase and square brackets are for anonymous arrays.

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Comparing a value to a list of numbers
by haukex (Archbishop) on Jan 29, 2021 at 18:47 UTC
    a variable, which contains a number to a list of numbers, such as 1,2,5,6,9,10,41-56

    I'm a little unclear on what you mean. Is this variable an array, like the other monks have assumed? Because if it's a string, you'll have to parse it first. Luckily, there is a module that can handle a string such as the one you've shown, Set::IntSpan. The advantage of this module is that if the spans are large, like say 1,2,3-100000, it'll save a lot of memory as compared to an array or a hash. Otherwise, this is just a TIMTOWTDI solution.

    use warnings; use strict; use Set::IntSpan; my $set = Set::IntSpan->new("1,2,5,6,9,10,41-56"); for my $x (1,2,3,42,100) { print "$x is ", $set->member($x)?"":"NOT ", "in the set\n"; } __END__ 1 is in the set 2 is in the set 3 is NOT in the set 42 is in the set 100 is NOT in the set
      Luckily, there is a module that can handle a string such as the one you've shown, Set::IntSpan

      I'm not sure why I continue to be surprised...but...
      It still amazes me just how many really useful modules are shared in the Monastery every day!

Re: Comparing a value to a list of numbers
by hippo (Bishop) on Jan 29, 2021 at 16:59 UTC
Re: Comparing a value to a list of numbers
by Fletch (Bishop) on Jan 29, 2021 at 16:58 UTC

    Straight from the examples in Quantum::Superpositions

    use Quantum::Superpositions; if ($x == any($a, $b, $c)) { ... }

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      IIRC this is implemented by any returning an object overloading the == operator. And there is maybe also tie'ing involved.

      Hence many movable and slow parts. Essentially a very sweet syntax coming with a cost.

      Question: is there also intelligent support for ranges? I'd buy into this if yes.

      like if ($x == any(1,2,5,6,9,10,[41,56]) { ...  } ?

      I suppose it's clear where the advantages are ...

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        No clue, although I completely glossed on the range requirement in the OP. Having seen that I'd change my answer to Set::IntSpan probably.

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

Re: Comparing a value to a list of numbers (in via hash)
by LanX (Saint) on Jan 29, 2021 at 16:52 UTC
    TIMTOWTDI

    Depending on your requirements on time and memory complexity, you could also use a hash

    DB<28> %in = map { $_ => 1 } 1,2,5,6,9,10,41..56 DB<29> p $in{5} 1 DB<30> p $in{0} DB<31>

    update

    corrected C&P error omitting the 1

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re: Comparing a value to a list of numbers
by jcb (Parson) on Jan 31, 2021 at 04:50 UTC

    Other monks have suggested a binary search solution, but did not provide an example. I took writing an optimized binary search for this as a challenge and wrote this:

    #!/usr/bin/perl # -*- CPerl -*- use strict; use warnings; # sparse range structure: # array of arrayref/number # single integers represented as number # contigous ranges as arrayref [LOW,HIGH] inclusive # given: string "A,B,C,X-Y,Z" # return: sorted array [A, B, C, [X, Y], Z] sub parse ($) { my @elements = split /,/, shift; foreach (@elements) { s/\s+//g; # prune whitespace next unless m/\d+-\d+/; # skip conversion if single integer $_ = [split /-/, $_]; # convert ranges to arrayrefs } # sort range set @elements = sort {(ref($a)?$a->[0]:$a) <=> (ref($b)?$b->[0]:$b)} @el +ements; # merge overlapping loose elements into preceding ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i]; # skip single integers while ($i+1 <= $#elements and $elements[$i+1] <= $elements[$i][1]) { splice @elements, $i+1, 1 } # remove elements included in r +ange } # coalesce contiguous integers into ranges for (my $i = 0; $i < $#elements; $i++) { next if ref $elements[$i]; # skip ranges if ($elements[$i]+1 == $elements[$i+1]) { my $j = 1+$i; $j++ while !ref($elements[$j]) && $elements[$j]+1 == $elements[$ +j+1]; splice @elements, $i, 1+$j-$i, [$elements[$i], $elements[$j]]; } } # merge adjacent loose elements into succeeding ranges for (my $i = 0; $i < $#elements; $i++) { next if ref $elements[$i]; # skip ranges next unless ref $elements[$i+1]; # but next element is a range # There can be at most one such element, since contiguous integers + were # coalesced into ranges above. if ($elements[$i]+1 == $elements[$i+1][0]) { splice @elements, $i, 2, [$elements[$i], $elements[$i+1][1]] } } # merge adjacent loose elements into preceding ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i]; # skip single integers next if ref $elements[$i+1]; # but next element is a single int +eger # There can be at most one such element, since contiguous integers + were # coalesced into ranges above. if ($elements[$i][1]+1 == $elements[$i+1]) { splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1]] } } # merge overlapping ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i] and ref $elements[$i+1]; splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]] if $elements[$i][1] >= $elements[$i+1][0]; } # merge adjacent ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i] and ref $elements[$i+1]; splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]] if $elements[$i][1]+1 == $elements[$i+1][0]; } return \@elements; } # given: sorted array from sub parse and integer # return true if the integer is in the sorted array sub search ($$) { my $set = shift; my $num = shift; my $left = 0; my $right = $#$set; my $i = $#$set >> 1; # bitshift for integer /2 while ($left < $right) { if (ref($set->[$i])) { # evaluate a range return 1 # number within this range if $num >= $set->[$i][0] && $num <= $set->[$i][1]; if ($num > $set->[$i][0]) { $left = $i+1 } else { $right = $i-1 } } else { # evaluate a single integer return 1 # number matched if $num == $set->[$i]; if ($num > $set->[$i]) { $left = $i+1 } else { $right = $i-1 } } $i = ($left + $right) >> 1; # bitshift for integer /2 } # last check if (ref($set->[$i])) { return $num >= $set->[$i][0] && $num <= $set->[$i][1] } else { return $num == $set->[$i] } } my $Set = parse shift; use Data::Dumper; print Data::Dumper->new([$Set],[qw(Set)])->Indent(0)->Dump,"\n"; foreach (@ARGV) { print $_, search($Set, $_) ? ' is' : ' is not', " in the set\n"; }

    The script expects a set of numbers as its first command line argument, and then various numbers to test against the set as additional arguments. Examples:

    $ ./bsearch.pl 1,2,5,6,9,10,41-56 1 4 42 17 $Set = [[1,2],[5,6],[9,10],[41,'56']]; 1 is in the set 4 is not in the set 42 is in the set 17 is not in the set
    $ ./bsearch.pl 1,2,11-16,6,7,19,9,5-8,13,14,15,4 1 2 3 4 5 8 9 10 11 1 +2 16 17 18 19 20 $Set = [[1,2],[4,9],[11,16],19]; 1 is in the set 2 is in the set 3 is not in the set 4 is in the set 5 is in the set 8 is in the set 9 is in the set 10 is not in the set 11 is in the set 12 is in the set 16 is in the set 17 is not in the set 18 is not in the set 19 is in the set 20 is not in the set

    This latter example was used for developing the set-optimization logic.

      Other monks have suggested a binary search solution, but did not provide an example.

      Set::IntSpan uses a binary search.

      > Other monks have suggested a binary search solution, but did not provide an example.

      FWIW I did, but didn't publish, since it didn't fit into davido's benchmark (which has also it flaws by only testing 42)

      DB<47> sub tst { if ($_>=9) { if ($_>=41 and $_<=56) {1} elsif ($_<= +10) {1} } elsif ($_<=6) { if ($_>=5){1} elsif \ ($_<=2 and $_>=1) {1} }} DB<48> %in = map { $_ => 1 } 1,2,5,6,9,10,41..56 DB<49> tst != $in{$_} and print "error $_" for 1..190 DB<50>

      I was also pondering about how to optimize a solution (time and memory) for all cases.

      Like potentially

      • many points
      • many ranges
      • floats instead of integers
      • multiple dimensions (like rectangular areas in a plane)
      An "optimal" solution would involve combining different techniques ...

      And some techniques which were prominent in similar older threads haven't been mentioned yet.

      Anyway all of this is far beyond what the OP wanted.

      But yes: again "I'm suggesting but not providing an example" ;-)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        I assumed that the set would vary (although using eval to convert the arrayref returned from parse into a closure like your tst could be interesting even if compile could end up longer than parse) and tried to optimize for the "many points" and "many ranges" cases; thus all those extra passes in the parse sub to simplify the result. I excluded floating point because exact comparisons are not reliable and it is easier to restrict the domain to integers. Multiple dimensions start to get complex enough that I suspect a general optimal solution is impossible — optimizing that case will depend on the actual data used.

        The "suggesting but not providing an example" remark was an excuse for posting my solution — usually these questions have been "done to death" before I get to them. ;-)

Re: Comparing a value to a list of numbers (in via smartmatch)
by LanX (Saint) on Jan 29, 2021 at 16:47 UTC
    well, for completeness:

    There is smartmatch ~~ (sometimes) acting like an in operator

    DB<24> $x=5 DB<25> p $x ~~ [1,2,5,6,9,10,41..56] 1 DB<26> $x=0 DB<27> p $x ~~ [1,2,5,6,9,10,41..56] DB<28>

    but

    WARNING: Smartmatch is experimental

    so you'd need to deactivate the warning and hope it'll never vanish (at your risk)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I personally would not use the smartmatch operator for anything.
      I did some experimenting with this experimental operator and found out that it is "too smart for me!". I could not accurately predict what this thing would actually do!
        I agree, but there is a real need for an "in"-operator like in other languages.

        I hope this aspect will survive somehow and I listed it for completeness with full warning.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: Comparing a value to a list of numbers
by tybalt89 (Monsignor) on Feb 01, 2021 at 02:32 UTC

    And now for something completely different...

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11127703 use warnings; no warnings qw( qw ); use Search::Dict; @ARGV = qw( 1,2,5,6,9,10,41-56 1 4 42 17 ); my $file = join '', map "$_\n", split /,/, shift; open my $fh, '<', \$file or die; for my $n ( @ARGV ) { look $fh, $n, { comp => sub { $_[0] <=> $_[1] }, xfrm => sub { $_[0] =~ s/\d+-//r }, }; my $have = <$fh>; my $not = (defined $have and $have =~ /(\d+)-(\d+)/ ? $1 <= $n <= $2 : $have == $n) ? '' : ' no +t'; print "$n is$not in set\n"; }

    Outputs:

    1 is in set 4 is not in set 42 is in set 17 is not in set
Re: Comparing a value to a list of numbers
by Polyglot (Chaplain) on Feb 02, 2021 at 03:34 UTC

    The variety of answers you've already gotten just shows how broad the TMTOWTDI principle actually is in Perl. I would do something entirely different than has yet been suggested--might not be the best, but it has worked for me in similar situations a number of times.

    my $check = sprintf("%0.2d", $x); next if ('01,02,05,06,09,10,41,42,43,44,45,46,47,48,49,50,51,52,53,54, +55,56' =~ m/$check/);

    Obviously, if the list gets too long, this might not be practical. As written, this accommodates one- or two-digit numbers (though this could be tailored for larger ones, with the addition of more leading zeros). But for a list such as you suggested, it might work well enough--better than taking time to hash it all in terms of its simplicity, if not its efficiency.

    Blessings,

    ~Polyglot~

Re: Comparing a value to a list of numbers
by Anonymous Monk on Jan 29, 2021 at 18:51 UTC
    Seriously ... List::Util is your friend. (Even if you just "steal" a particular algorithm from it.) There's no reason to do something over if someone else has already done it.
      Please give an example (in perl code) of hose List::Util can be used to solve the OP's problem. Otherwise your "help" is just a waste of space and a waste of everybody's time.
        Please give an example (in perl code) of hose List::Util can be used to solve the OP's problem
        use warnings; use strict; use List::Util 'uniq'; my @given = (1,2,5,6,9,10,41..56); my @check = (3,4,9,11,48,102); my $c = uniq(@given); for(@check) { push(@given, $_); print "$_ is in the list\n" unless $c + 1 == uniq(@given); pop(@given); } __END__ Outputs: 9 is in the list 48 is in the list
        List::Util::uniq() utilizes the hash lookup referred to elsewhere in this thread.
        Some efficiency is lost in this script because uniq() has to generate the hash every time it is called.
        It would be more efficient if the perl code itself created the hash (just once), and then re-used that hash for each number that needs to be checked.

        Perhaps anonymous had some other List::Util routine in mind.

        Cheers,
        Rob

        A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11127640]
Approved by davies
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (1)
As of 2024-04-25 04:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found