PerlMonks  

Re: Hash Search is VERY slow

by rtjensen (Novice)
on Sep 29, 2021 at 14:38 UTC


in reply to Hash Search is VERY slow

It's working! I took choroba's advice and removed the whole if...else construct. The resulting loop looks like this:
    while (my $row = $csv->getline_hr($fh)) {
        $linecounter++;
        my $ip  = $row->{'Source address'};
        my $url = $row->{'URL/Filename'};
        push @{ $ipURL{$ip} }, $url;
        if (!($linecounter % 50000)) {
            print "Lines: $linecounter\n";
        }
    }
    formatOutput(\%ipURL);
    # print Dumper \%ipURL;
The entire thing runs in about 45 seconds now:
    perl test-urlListbyIP.pl
    Lines: 50000
    Lines: 100000
    Lines: 150000
    Lines: 200000
    Lines: 250000
    Lines: 300000
    Lines: 350000
    Lines: 400000
    Lines: 450000
    Lines: 500000
    Lines: 550000
    Lines: 600000
    Lines: 650000
    Lines: 700000
    Lines: 750000
    Lines: 800000
    Lines: 850000
    Formatting Output...
    List End:1316
    Execution Time: 44.18 s

Re^2: Hash Search is VERY slow
by AnomalousMonk (Archbishop) on Sep 29, 2021 at 19:13 UTC

    I think you'll find that it was choroba's advice to process the file on a line-by-line (CSV-record-by-record in this case) basis that did the trick. :)


    Give a man a fish:  <%-{-{-{-<

      I’m going to bet it was bliako’s observation that the array was getting cloned every time an element was added. That’s where the N^2 behavior came from.
        Oh I missed that.

        => Re: Hash Search is VERY slow

        Brilliant!

        > that the array was getting cloned

        I think it's not so much the copying alone but the allocation of a new array each time.

        Because the arrays keep growing, previously released memory often can't be reused, which leads to fragmentation and a growing waste of space.

        Fun! :)

        Though I might be wrong: IIRC the reserved space for an array grows by doubling, which should make it easier to reuse ...

        Cheers Rolf
        (addicted to the Perl Programming Language :)
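
To make the cloning point from the thread concrete, here is a minimal sketch contrasting the two ways of appending to a per-key list. The original slow code is not shown in this note, so the "cloned" variant below is only a guess at what such a pattern might look like; the sample rows and the hash names %cloned and %pushed are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: (ip, url) pairs standing in for CSV rows.
my @rows = (
    [ '10.0.0.1', '/index.html' ],
    [ '10.0.0.2', '/login' ],
    [ '10.0.0.1', '/about' ],
);

# Assumed slow pattern: build a brand-new array on every insert,
# copying all elements already stored for that key.
my %cloned;
for my $row (@rows) {
    my ($ip, $url) = @$row;
    my @old = exists $cloned{$ip} ? @{ $cloned{$ip} } : ();
    $cloned{$ip} = [ @old, $url ];    # copies every element so far
}

# Pattern from the fixed loop: push onto the autovivified array in place.
my %pushed;
for my $row (@rows) {
    my ($ip, $url) = @$row;
    push @{ $pushed{$ip} }, $url;     # no copy of the existing elements
}

print join(',', @{ $pushed{'10.0.0.1'} }), "\n";
```

Both hashes end up with the same contents. The difference is only in how much work each insert does: the first variant copies the whole list for a key every time, so a key with N URLs costs on the order of N^2 element copies, while push appends in place.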
