PerlMonks |
Re: Hash Search is VERY slow
by LanX (Saint)
on Sep 29, 2021 at 16:36 UTC
My guess is also that your memory consumption leads to excessive swapping. The general idea is presorting: for instance, iterate multiple times over the file and process only one IP range per pass. This costs extra IO but keeps the in-memory data structures small.
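A rough sketch of that multi-pass idea in Perl. The sample lines, the field layout, and the per-range counting are all invented for illustration; in real code each pass would re-open and re-read the input file instead of holding an array:

```perl
use strict;
use warnings;

# Sample lines standing in for the real 800k-line input file.
my @lines = (
    "192.168.101.208 logmeinrescue.com",
    "192.168.101.208 example.com",
    "10.0.0.1 example.com",
);

my %per_range;    # collected here only so the results are inspectable
for my $range (0 .. 255) {
    my %count;    # only this one range's counts live in memory
    for my $line (@lines) {    # real code: re-open and re-read the file
        my ($ip, $first) = $line =~ /^((\d+)\.\d+\.\d+\.\d+)/ or next;
        next unless $first == $range;    # skip IPs outside this pass's range
        $count{$ip}++;
    }
    next unless %count;
    $per_range{$range} = \%count;
    # ... process %count here, then it goes out of scope before the next pass ...
}
```

Each pass pays for one full read of the file, but `%count` never holds more than one /8 range at a time.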
Only if ... you really need all the data present in memory at once, consider breaking the IPs up into a tree of nested data structures and processing them in linear order, like $hash->{'192'}{'168'}{'101'}{'208'} or $hash->{'192.168'}{'101.208'} instead of $hash->{'192.168.101.208'}. °

If you then process all IPs in order, Perl (well, the OS) will be able to swap out the memory pages holding unrelated sub-hashes. This is cheap because the sorting minimizes the number of swaps. (See also Re: Small Hash a Gateway to Large Hash?)

An additional approach is using more compact data structures: hashes are efficient for sparse data, but if your keys only range over 0-255, an array is certainly more efficient. Furthermore, there is no point in repeating URLs like "logmeinrescue.com" in your array; counting them in a hash is more memory efficient.

Anyway, 800k input lines don't sound heavy, so I'm not sure we have the full picture. (???)
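The nested layout and the count-instead-of-repeat idea could look roughly like this; the addresses and the URL are invented sample data, not the original poster's:

```perl
use strict;
use warnings;

my %flat;         # one key per full dotted quad
my %nested;       # two-level tree: prefix => { suffix => count }
my %url_count;    # count repeated URLs instead of storing copies

my @records = (
    [ '192.168.101.208', 'logmeinrescue.com' ],
    [ '192.168.101.209', 'logmeinrescue.com' ],
    [ '10.0.0.1',        'example.com'       ],
);

for my $rec (@records) {
    my ($ip, $url) = @$rec;
    $flat{$ip}++;
    my ($prefix, $suffix) = $ip =~ /^(\d+\.\d+)\.(\d+\.\d+)$/;
    $nested{$prefix}{$suffix}++;
    $url_count{$url}++;
}

# Walking the outer keys in sorted order touches one sub-hash at a
# time, so the OS can cheaply page the others out.
my @in_order;
for my $prefix (sort keys %nested) {
    for my $suffix (sort keys %{ $nested{$prefix} }) {
        push @in_order, "$prefix.$suffix";
    }
}
```

Note the traversal visits all of `$nested{'10.0'}` before any of `$nested{'192.168'}`, which is exactly the locality the swapping argument relies on.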
Edit: Like choroba already said, preloading the input completely into memory sounds like a waste of resources; you should check how much that costs. OTOH, if you decide to implement my initial idea of processing one IP range after the other, preloading will reduce IO, if (and only if) everything fits into memory.
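For comparison, here is the slurp-vs-stream difference in miniature. A temp file stands in for the real input; none of this is the original poster's code:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in for the real 800k-line input file.
my ($fh, $path) = tempfile();
print {$fh} "line $_\n" for 1 .. 5;
close $fh;

# Slurping: every line sits in memory at once.
my @all = do {
    open my $in, '<', $path or die "Can't open $path: $!";
    <$in>;
};

# Streaming: one line at a time, constant memory.
my $count = 0;
open my $in, '<', $path or die "Can't open $path: $!";
while (my $line = <$in>) {
    $count++;    # ... process $line, then forget it ...
}
close $in;
```

The streaming loop is the default choice; slurping only pays off when you need multiple passes over data that comfortably fits in RAM.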
Cheers Rolf

°) I'm aware that 192.168.*.* is very common
In Section: Seekers of Perl Wisdom