Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

fastest way to compare two arrays

by krusty (Hermit)
on Apr 13, 2011 at 23:59 UTC ( [id://899314]=perlquestion: print w/replies, xml ) Need Help??

krusty has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have two arrays of numbers like so:
my @range = (50000 .. 65535); my @ports = (50000..51545,51549,51555..51999,@more);

where the @ports array was populated from a spreadsheet and is subject to change (and the range may not be permanently fixed either). The @ports array will always be a subset of @range, though and @range will never have gaps in numbering. I'm trying to find the first open port in @ports for a given range and figure there has to be a more efficient way. What I have is a glorified increment by 1 and compare operation. I'm using 5.8 on a work server and can't upgrade to get the smart match operator.
Update: By open port, I mean port that isn't being used and therefore isn't in @ports.
my @hash{@ports} = (1) x @ports; #set indices to true for ports in use my $open_port; foreach my $port (@range){ $open_port = $port and last unless $hash{$port}; }

I've read some of the posts and a FAQ How can I tell whether a certain element is contained in a list or array?, but none seem to quite do what I want, i.e. look for what isn't there in the most efficient manner as they all seem to do an increment by one or loop one by one, too. I thought of maybe removing indicies of a hash keyed from @range by another set of indicies from @ports instead of looping but my brain hurts now.

Thanks,
Kristina

Replies are listed 'Best First'.
Re: fastest way to compare two arrays
by BrowserUk (Patriarch) on Apr 14, 2011 at 00:47 UTC

    This shoudl be a little quicker. It uses a bit vector rather than a hash which is a bit quicker to start. It then scans the bit vector 64-bits at a time until it finds a possibility and then by bit until it isolates the next value:

    #! perl -slw use strict; my @ports = ( 50000..51545,51549,51555..51999, 65535 ); my $vec = ''; vec( $vec, $_, 1 ) = 1 for 0 .. 50_000, @ports; sub nextFreePort { no warnings 'portable'; my $open_port; my $t = int( length( $vec ) / 8 ); my $n = 782; ++$n while $n <= $t and vec( $vec, $n, 64 ) == 1844674407370955161 +5; $n *= 64; vec( $vec, $_, 1 ) or $open_port = $_ and last for $n .. $n + 64; vec( $vec, $open_port, 1 ) = 1; return $open_port; } my $next_port = nextFreePort(); print $next_port; __END__ c:\test>899314.pl 51546

    Worst case of only 65535 being available, it does 242+63 tests+incs, rather than 15500.

    If I were writing this in assembler, I'd test lo/hi 32-bit; lo/hi 16-bit; lo/hi 8-bit before hitting the bits individually thereby reducing the 63 tests/incs to 10.

    It require 8k of string which is quite possibly smaller than your hash. That coudl be reduced further by not storing bits for 0 .. 50_000.

    It's a POC. I haven't tested the edge cases.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: fastest way to compare two arrays
by GrandFather (Saint) on Apr 14, 2011 at 00:13 UTC

    With the data structures you have and if this is a one time operation then a linear search is probably the best you can do. If you have the option of changing the way the port information is stored there are many ways to improve performance even for a one time search.

    If you need to search multiple times for the same @range and for only small changes in @ports then taking the hit to transform the port map into a better structure, or at least caching the next untested port will help speed things up. In any case wrapping the information up in an object is likely to make it much easier to manage correctly.

    If you want to provide typical use cases you may get a more focused answer.

    True laziness is hard work
Re: fastest way to compare two arrays
by krusty (Hermit) on Apr 14, 2011 at 00:21 UTC
    So my brain hurt, but 2 seconds after I posted, I had an idea...
    @hash{@range} = (1) x @range; #guess I don't need to replicate now. delete @hash{@ports}; @open = sort {$a <=> $b} keys %hash; #seems to work
    I'd guess so, but until I Benchmark I don't know if its more efficient, but at least I can read it easier. If there are better ways out there, please share.

    Regards,
    Kristina
Re: fastest way to compare two arrays
by roboticus (Chancellor) on Apr 14, 2011 at 15:23 UTC

    krusty:

    Two questions:

    1. Do you really need to use an array?
    2. Do you really need the lowest-numbered available port?

    If the answer is no for both, you might consider using a hash to hold your free ports. So when you want to use a free port, just grab the first available one without sorting, and remove it from the "available" hash. When you're done with it, just return it to the hash.

    If you must use an array, but don't need the lowest-numbered port, then just pop a port from your list, and when you're done with it, push it back on.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: fastest way to compare two arrays
by pajout (Curate) on Apr 14, 2011 at 08:09 UTC
    divide et impera
    1. @range is not necessary - there is just information of minimum and maximum.
    2. If your @ports are sorted, imagine for a while that 100th item is wanted result. It means that 99th item has value 50098 and 100th item is greater than 50099. Check the half index of your range. If it is greater than expected value, seek left side, else seek right side. Seeking left xor right side, it is same problem, but amount of items is half.
Re: fastest way to compare two arrays
by Cristoforo (Curate) on Apr 15, 2011 at 23:50 UTC
    Set::IntSpan can provide relevent routines when working with (connected) ranges of numbers. Although it can work with array references it can be initialized with just a string (and so you save a lot of memory).

    #!/usr/bin/perl use strict; use warnings; use Set::IntSpan; my $range = Set::IntSpan->new( '50000-65535' ); my $used = Set::IntSpan->new( '50000-51545,51549,51555-51999' ); my $unused = $range->diff( $used ); my @newports; while (my $port = $unused->next) { print $port, "\n"; push @newports, $port; last if @newports == 10; } print "Unused before newports: ", $unused->run_list, "\n"; for (@newports) { $unused->remove($_); $used->insert($_); } print "Unused after newports: ", $unused->run_list, "\n"; print "Used after newports: ", $used->run_list, "\n"; __END__ C:\Old_Data\perlp>perl t5.pl 51546 51547 51548 51550 51551 51552 51553 51554 52000 52001 Unused before newports: 51546-51548,51550-51554,52000-65535 Unused after newports: 52002-65535 Used after newports: 50000-52001
    Update: Well, as it happens, I shouldn't make statements about that which I don't know (well). The documentation says:

    Sets are stored internally in a run-length coded form. This provides for both compact storage and efficient computation. In particular, set operations can be performed directly on the encoded representation...
Re: fastest way to compare two arrays
by repellent (Priest) on Apr 16, 2011 at 07:16 UTC
    useSet::IntSpan::Fast;
    my $range = Set::IntSpan::Fast->new('50000-65535'); my $ports = Set::IntSpan::Fast->new('50000-51545,51549,51555-51999'); my $unused = $ports->copy; $unused->invert; my $unused_in_range = $range->intersection($unused); my $first_unused = ($unused_in_range->iterate_runs->())[0];

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://899314]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (None)
    As of 2024-04-25 00:47 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      No recent polls found