http://qs321.pair.com?node_id=722996


in reply to how to speed lookups?

You are doing 64,425,800,000 assignments where you should be doing 322,129.

$linecontainsbuf{$searchfield} = $line for @validbufs;
should be
$linecontainsbuf{$searchfield} = $line;
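
The two figures above differ by a factor of 200,000 (64,425,800,000 / 322,129), which is presumably the size of @validbufs: the `for @validbufs` statement modifier reruns the identical assignment once per element. A minimal sketch of the effect, using made-up stand-in data (the contents of @validbufs and $line here are hypothetical, and the counter is added only for illustration):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the OP's data.
my @validbufs   = map { "buf$_" } 1 .. 1000;
my $line        = "'buf42' some other text";
my $searchfield = 'buf42';

my (%with_loop, %without_loop);
my $stores = 0;

# Original form: the statement modifier repeats the same store
# once for every element of @validbufs.
($with_loop{$searchfield} = $line, $stores++) for @validbufs;

# Fixed form: a single store gives the same final hash contents.
$without_loop{$searchfield} = $line;

print "stores with modifier: $stores\n";
print "same result: ",
    ($with_loop{$searchfield} eq $without_loop{$searchfield} ? "yes" : "no"), "\n";
```
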

Also, using split does more work than you need. Try replacing

@fields=split /\'/,$line; $searchfield=$fields[1];
with
($searchfield) = $line =~ /^'([^']*)/;
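
As a quick check that the two forms agree, here is a small sketch on a made-up input line (the line format is an assumption based on the OP's use of $fields[1]):

```perl
use strict;
use warnings;

my $line = "'abc123' rest of the line";    # hypothetical input

# split keeps the empty leading field, so the quoted text is element 1:
# ('', 'abc123', ' rest of the line')
my @fields    = split /'/, $line;
my $via_split = $fields[1];

# The match captures only the text between the leading quote and the next one.
my ($via_match) = $line =~ /^'([^']*)/;

print "$via_split\n";
print "$via_match\n";
```

One difference to keep in mind: the match is anchored, so if a line does not start with a single quote, $via_match ends up undef rather than picking up some later field.
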

Re^2: how to speed lookups?
by lukka (Novice) on Nov 11, 2008 at 21:36 UTC
    Thanks a million for pointing out the offending line. As you correctly said, I was doing those assignments unnecessarily. Just implementing your first suggestion (removing the for @validbufs part) brought the runtime down to about 2 minutes, from the more than 2 hours that my erroneous code was taking before. Thanks again.
Re^2: how to speed lookups?
by blazar (Canon) on Nov 12, 2008 at 19:04 UTC
    Also, using split does more work than you need. Try replacing

    I personally believe that people do indeed tend to overuse split. But at the same time, IIRC, split is optimized if an explicit LIMIT parameter is given, and maybe (I only have a vague memory of this, and I admit I may just have "invented" it...) even if simply a specific number of items is assigned, i.e.:

    my ($searchfield) = split /'/, $line;

    Update: as far as the last point is concerned, I definitely stand corrected, as per ikegami's remark, which I thoroughly trust.

    But at this point one should certainly do a benchmark to be sure, and I don't have the slightest intention of doing so, especially given that the question asked by the OP has bigger issues to the point of not really being understandable at all, as far as I'm concerned...

    --
    If you can't understand the incipit, then please check the IPB Campaign.

      While the OP mentions fetching the first field, you'll notice the OP used $fields[1], not $fields[0]. I believe that means he's using split to parse a single-quoted string. That's not what I'd call an appropriate use of split.

      So in this case, the optimal split would be

      my $searchfield = ( split /'/, $line, 3 )[1];

      Using split requires matching twice, creating three variables and copying the entire string.

      my ($searchfield) = $line =~ /^'([^']*)/;

      Using the match operator requires matching once, creating one variable and copying only the field. Actual performance aside, it's definitely a cleaner process.
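
For anyone who does want numbers, a minimal sketch using the core Benchmark module follows. The sample line and the labels are made up for illustration, and the results will depend on the perl version and on what the real data looks like:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical line: a quoted field followed by some trailing text.
my $line = "'abc123' " . ('x' x 200);

my %extractors = (
    # Unlimited split, as in the OP's code.
    split_all   => sub { my @f = split /'/, $line; return $f[1]; },
    # split with LIMIT 3, so the rest of the line is not split further.
    split_limit => sub { return ( split /'/, $line, 3 )[1]; },
    # Anchored match that captures only the field.
    match       => sub { my ($s) = $line =~ /^'([^']*)/; return $s; },
);

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%extractors);
```
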

      Finally, split will be very bad at handling escaping when the OP discovers the need for it.
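
To illustrate the escaping point, here is a sketch that assumes backslash escapes inside the quoted field (the OP never said what escaping, if any, the data uses, so this format is purely an assumption):

```perl
use strict;
use warnings;

# Hypothetical line whose field contains a backslash-escaped quote.
my $line = q{'it\'s here' rest};

# split cuts at the escaped quote, so element 1 is truncated.
my $via_split = ( split /'/, $line, 3 )[1];

# The match can be extended to step over backslash-escaped characters.
my ($raw) = $line =~ /^'((?:[^'\\]|\\.)*)'/
    or die "line does not start with a quoted field\n";

# Remove the escaping backslashes from the captured text.
(my $field = $raw) =~ s/\\(.)/$1/g;

print "split: $via_split\n";
print "match: $field\n";
```
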

      maybe ([...]) even simply if a specific number of items are assigned

      An operator doesn't know what the caller will do with the returned list, so your "maybe" doesn't apply. If it behaved as you think, the following snippet would result in $c being assigned 2 when it should be assigned 3.

      $c = ($f1,$f2) = (4,5,6);
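
A runnable version of that snippet, with the variable declarations added so it passes under strict: a list assignment evaluated in scalar context yields the number of elements on its right-hand side, not the number of variables on the left.

```perl
use strict;
use warnings;

my ($f1, $f2);

# The inner list assignment is evaluated in scalar context, so it returns
# the count of right-hand-side elements; the extra 6 is simply discarded.
my $c = ($f1, $f2) = (4, 5, 6);

print "f1=$f1 f2=$f2 c=$c\n";    # prints "f1=4 f2=5 c=3"
```
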