Caching process sets

billyak has asked for the wisdom of the Perl Monks concerning the following question:

Re: Caching process sets by Thelonius (Priest) on Feb 19, 2003 at 20:18 UTC
I rather doubt it's worth the effort. I've written Perl programs that parse files of 500,000+ lines, and they run in 20 seconds or so. Are you actually experiencing long run times? If you can easily sort the file on the non-date fields, identical items will be adjacent, so you won't need a large amount of memory for a hash cache. But I still question whether it's necessary.
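Here is a minimal Perl sketch of the sort-first approach. It assumes a hypothetical line format in which the first whitespace-separated field is the date and the rest identifies the event. It also assumes the input was already sorted on the non-date fields (for example with Unix `sort -k2`). Only the previous key and its result are kept in memory. `expensive_actions` is a hypothetical stand-in for the real per-line work.

```perl
use strict;
use warnings;

# Hypothetical input: first field is a date, the rest identifies the event.
# Assumed already sorted on the non-date fields, so repeats are adjacent.
my @sorted_lines = (
    "2003-02-19 aaa moo didley",
    "2003-02-20 aaa moo didley",
    "2003-02-19 bbb foo bar",
);

my $work_done = 0;    # counts how often the expensive step really runs

# Hypothetical stand-in for the real per-line work.
sub expensive_actions {
    my ($event) = @_;
    $work_done++;
    return length $event;    # placeholder result
}

my ($prev_key, $prev_result);
my @results;
for my $line (@sorted_lines) {
    my (undef, $key) = split ' ', $line, 2;    # drop the date field
    if (!defined $prev_key || $key ne $prev_key) {
        # New run of identical events: do the work once, keep only this result.
        $prev_result = expensive_actions($key);
        $prev_key    = $key;
    }
    push @results, $prev_result;
}
```

With these three lines the expensive step runs twice, and the second `aaa moo didley` reuses the previous result. The trade-off is that lines are processed in sorted order rather than in their original file order.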
Re: Caching process sets by dragonchild (Archbishop) on Feb 19, 2003 at 20:33 UTC
It all depends on what you're doing. An obvious solution would be to use a hash (or HoHoHo..oH) to keep track of what you've already worked on: parse the line, check the cache, and do the actions only if the line isn't already in the cache. As the other poster said, this is only useful if your actions per line are very expensive, and "some simple math" doesn't sound expensive enough. However, 55% does sound like a potentially significant savings. Without benchmarking it's impossible to know for certain, but parsing is often the most expensive part of working with logfiles, not the actions one takes at each point.
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
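Here is a sketch of the hash-cache idea in Perl. It assumes a hypothetical format where the date is the first field and the remaining text is the cache key. `uc $key` is only a placeholder for whatever the real actions are. Lines whose key is already in `%cache` are skipped.

```perl
use strict;
use warnings;

# Hypothetical log lines: date first, then the event text.
my @lines = (
    "2003-02-19 aaa moo didley",
    "2003-02-19 bbb foo bar",
    "2003-02-20 aaa moo didley",
);

my %cache;           # event key => result of the (placeholder) actions
my $computed = 0;    # how many times the actions actually ran

for my $line (@lines) {
    my (undef, $key) = split ' ', $line, 2;    # ignore the date field
    next if exists $cache{$key};               # already worked on: skip it
    $computed++;
    $cache{$key} = uc $key;                    # placeholder for the real actions
}
```

Unlike the sorted approach, memory here grows with the number of distinct events.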
Re: Re: Caching process sets by billyak (Friar) on Feb 19, 2003 at 20:56 UTC
Sorry, I guess I was not clear. I want to cache the set of actions. If the line "aaa moo didley" appears thirty times, I want to establish a set of actions based on the first parse and follow that set of actions for each subsequent occurrence of the line. The idea is to avoid the extra parsing by first looking up the event in a hash to see whether a set of actions has already been determined for it. -billyak
Re3: Caching process sets by dragonchild (Archbishop) on Feb 19, 2003 at 22:28 UTC
You want closures. How you use them will depend on what you're doing. If you want more help, you'll need to give a few examples of the data and the actions you'd want to take on them. Then one of us might be able to point you in the right direction.
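The following is a minimal sketch of what the closure approach might look like. It assumes a hypothetical format where the first word of an event decides what to tally and the remaining words are its arguments. `build_actions` (a hypothetical name) parses an event once and returns an anonymous sub that captures the parsed pieces. Later occurrences of the same event call the stored sub instead of parsing again.

```perl
use strict;
use warnings;

my %count;         # hypothetical tally that the cached actions update
my %actions;       # event text => closure holding the pre-parsed actions
my $parses = 0;    # how many times the "expensive" parse really ran

# Hypothetical parser: does the parse once and returns a closure that
# replays the resulting actions using the captured $verb and $n_args.
sub build_actions {
    my ($event) = @_;
    $parses++;
    my ($verb, @args) = split ' ', $event;    # the parse we want to avoid repeating
    my $n_args = @args;
    return sub {
        my ($date) = @_;
        $count{$verb} += $n_args;             # values captured at parse time
        return "$date: $verb x$n_args";
    };
}

my @log;
for my $line ("d1 aaa moo didley", "d2 bbb foo", "d3 aaa moo didley") {
    my ($date, $event) = split ' ', $line, 2;
    # Look up the closure for this event, building it only on first sight.
    my $code = $actions{$event} ||= build_actions($event);
    push @log, $code->($date);
}
```

The third line reuses the closure built for the first one, so the parse runs only twice.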
Re: Caching process sets by demerphq (Chancellor) on Feb 19, 2003 at 20:39 UTC
Memoize --- demerphq
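Here is a minimal Memoize sketch. `analyse_event` is a hypothetical function. Memoize, a core module, caches a function's return value for each distinct argument list, so it fits work that depends only on the line's text. On a cache hit the body does not run at all. Any side effects inside it (here, `$calls++`) therefore happen only once per distinct argument.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts real executions of the function body

# Hypothetical expensive function whose result depends only on its argument.
sub analyse_event {
    my ($event) = @_;
    $calls++;
    return join ',', reverse split ' ', $event;
}
memoize('analyse_event');    # repeat calls with the same argument hit the cache

my @out = map { analyse_event($_) } ("aaa moo didley", "bbb foo", "aaa moo didley");
```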
Re: Re: Caching process sets by billyak (Friar) on Feb 19, 2003 at 20:59 UTC
Handling a specific event line involves more than computing a `return` value, so I would need a separate subroutine for each of my cached lines. Hence the `eval` I mentioned in the original post.
Re: Re: Re: Caching process sets by xmath (Hermit) on Feb 19, 2003 at 21:08 UTC
It's still not entirely clear to me what you're doing, but it sounds like what you want are (anonymous) sub references. This code checks whether an action corresponding to `$key` is known, determines the action if it isn't, and in either case executes it:

    ($actions{$key} ||= determine_action($key))->($key, $data);

The sub `determine_action` would have to find out what the correct action is and return a reference to a sub that performs it. If you don't want to write explicit named subs, use anonymous ones:

    sub determine_action {
        my ($key) = @_;
        if (it's first action) {          # pseudocode condition
            return sub {
                my ($key, $data) = @_;
                # do stuff
            };
        }
        elsif (it's second action) {      # pseudocode condition
            return sub {
                my ($key, $data) = @_;
                # do stuff
            };
        }
        # ... etc ...
    }
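Here is a runnable sketch of xmath's idiom, with hypothetical rules filled in for `determine_action`. Keys beginning with `error` or `login` get their own action, and everything else gets a default. `$decisions` counts how often the decision is really made, which shows that repeated keys reuse the stored sub reference.

```perl
use strict;
use warnings;

my %actions;          # key => sub reference chosen on first sight
my $decisions = 0;    # how often determine_action really ran
my @seen;             # records what the actions did

# Hypothetical rules: the first word of the key picks the kind of action.
sub determine_action {
    my ($key) = @_;
    $decisions++;
    if ($key =~ /^error\b/) {
        return sub { my ($key, $data) = @_; push @seen, "ERR $data" };
    }
    elsif ($key =~ /^login\b/) {
        return sub { my ($key, $data) = @_; push @seen, "LOGIN $data" };
    }
    return sub { my ($key, $data) = @_; push @seen, "other $data" };
}

for my $pair (["error disk", 1], ["login bob", 2], ["error disk", 3]) {
    my ($key, $data) = @$pair;
    # Look up the action, decide only if it is missing, then run it.
    ($actions{$key} ||= determine_action($key))->($key, $data);
}
```

Because a code reference is always true, `||=` only calls `determine_action` when the key has no entry yet.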
Re: Re: Re: Caching process sets by demerphq (Chancellor) on Feb 19, 2003 at 22:46 UTC
Instead of just evaling the code, wrap it in an anonymous sub, thus capturing it so you can reuse it. Say we have a routine called parse_to_actions that builds a list of Perl statements that need to be executed. Then we do this:

    my %code_cache;
    while (<>) {
        my $code = $code_cache{$_};
        unless ($code) {
            my @actions = parse_to_actions($_);
            $code = eval "sub { @actions }"
                or die "$@ while evaling actions @actions";
            $code_cache{$_} = $code;
        }
        $code->();
    }

This is similar to what xmath posted, but builds the subs dynamically. However, if we define parse_to_actions to return the sub itself, then we could write:

    use Memoize;

    sub parse_to_actions {
        my ($line) = @_;
        my @lines_of_code;    # ... build the Perl statements from $line here ...
        # Assign first: "return eval ... or die $@" returns before "or die" runs.
        my $code = eval "sub { @lines_of_code }" or die $@;
        return $code;
    }
    memoize('parse_to_actions');

    while (<>) {
        # Parse and generate; Memoize caches the generated sub per line.
        parse_to_actions($_)->($_);    # pass the line to the generated sub,
                                       # just in case it gets smart
    }
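Here is a self-contained sketch of the memoized, eval-generated version. It assumes a hypothetical input language of lines like `add 3` and `mul 2` that update a global `$total`. The generated source is built only from whitelisted verbs and digits, because string-`eval`ing text copied straight from a logfile would execute whatever that text contains. The sub is assigned before being returned, because `return eval ... or die $@` parses as `(return eval ...) or die $@`, so the `die` could never run.

```perl
use strict;
use warnings;
use Memoize;

our $total = 1;        # hypothetical state the generated code updates
my $generated = 0;     # how many subs were actually compiled

# Hypothetical parser: turns "add 3" or "mul 2" into Perl source and compiles
# it once. Only the whitelisted verbs and a run of digits reach the eval.
sub parse_to_actions {
    my ($line) = @_;
    my ($verb, $n) = $line =~ /^(add|mul) (\d+)$/
        or die "unrecognised line: $line";
    my @lines_of_code = $verb eq 'add' ? ("\$main::total += $n;")
                                       : ("\$main::total *= $n;");
    $generated++;
    my $code = eval "sub { @lines_of_code }" or die $@;
    return $code;
}
memoize('parse_to_actions');    # same line text => same compiled sub

for my $line ("add 3", "mul 2", "add 3") {
    my $code = parse_to_actions($line);    # compiled once per distinct line
    $code->($line);
}
```

Here `$total` goes 1, 4, 8, 11, while only two subs are compiled.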

