PerlMonks  

Re: Long list is long

by kcott (Archbishop)
on Oct 30, 2022 at 04:12 UTC


in reply to Long list is long

G'day Chuma,

A number of problems with the code you posted:

  • Missing strict.
  • Missing warnings.
  • Missing I/O checking and exception handling. I suggest autodie.
  • Package variables used throughout. Use lexical (my) variables instead; follow links for details. This applies to your filehandles as well.
  • Use the 3-argument form of open.
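
As a minimal sketch of how those points fit together (the scratch file created with File::Temp is hypothetical, there only to make the example self-contained and runnable):

```perl
#!/usr/bin/env perl
use strict;      # undeclared variables become compile-time errors
use warnings;    # flags things like use of uninitialized values
use autodie;     # failed open/close calls die with a useful message

use File::Temp qw{tempfile};

# Hypothetical scratch file, only so this sketch can run on its own.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "foo\t73\n";
close $tmp_fh;

# Lexical filehandle and 3-argument open; no "or die ..." needed under autodie.
open my $in_fh, '<', $path;
my $line = <$in_fh>;
close $in_fh;

chomp $line;
my ($word, $count) = split /\t/, $line;
print "$word => $count\n";
```

Because the filehandle is a lexical, it goes out of scope (and is closed) with its enclosing block, and the explicit mode argument keeps an odd filename from being interpreted as a mode.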

From your description, I'd say the bottleneck lies with the population of the three arrays: @p, @q and @i. This is all unnecessary work and those arrays are not even needed. See "perlperf - Perl Performance and Optimization Techniques", for benchmarking and profiling techniques, to get a clearer picture of where problems lie.
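
For a rough idea of what such a measurement can look like, here is a small sketch using the core Benchmark module. The data is synthetic, and `via_arrays` is only a hypothetical stand-in for an array-heavy approach; it is not a reconstruction of the posted code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw{cmpthese};

# Synthetic "word<TAB>count" lines (made-up data, not the real files).
my @lines = map { 'w' . ($_ % 5000) . "\t" . ($_ % 97) . "\n" } 1 .. 50_000;

# Hypothetical array-heavy approach: copy everything into parallel arrays,
# then walk the arrays a second time to total the counts.
sub via_arrays {
    my (@words, @counts);
    for (@lines) {
        my ($word, $count) = split;
        push @words,  $word;
        push @counts, $count;
    }
    my %total;
    $total{ $words[$_] } += $counts[$_] for 0 .. $#words;
    return \%total;
}

# Single pass, accumulating straight into a hash.
sub via_hash {
    my %total;
    for (@lines) {
        my ($word, $count) = split;
        $total{$word} += $count;
    }
    return \%total;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, { arrays => \&via_arrays, hash => \&via_hash });
```

The numbers will vary by machine and data, so treat the table as a guide to relative cost rather than a definitive result; a profiler run on the real script gives a more reliable picture.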

I created these three dummy test files. In case you're unfamiliar with cat -vet, ^I represents a tab and $ represents a newline.

    $ for i in A B C; do echo -e "\n*** $i"; cat $i; echo '----'; cat -vet $i; done

    *** A
    foo	73
    bar	35
    word	27
    blah	23
    ----
    foo^I73$
    bar^I35$
    word^I27$
    blah^I23$

    *** B
    bar	35
    yada	3
    word	27
    blah	23
    ----
    bar^I35$
    yada^I3$
    word^I27$
    blah^I23$

    *** C
    foo	73
    word	27
    blah	23
    life	42
    ----
    foo^I73$
    word^I27$
    blah^I23$
    life^I42$

Then this test code:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my @in_files = qw{A B C};
    my $outfile = 'merge_count.out';

    my %data;
    my $out_fmt = "%s\t%d\n";

    for my $infile (@in_files) {
        open my $fh, '<', $infile;

        while (<$fh>) {
            my ($word, $count) = split;
            $data{$word} += $count;
        }
    }

    open my $fh, '>', $outfile;

    for my $key (sort { $data{$a} <=> $data{$b} } keys %data) {
        printf $fh $out_fmt, $key, $data{$key};
    }

Output (raw and showing special characters):

    $ cat merge_count.out
    yada	3
    life	42
    blah	69
    bar	70
    word	81
    foo	146
    $ cat -vet merge_count.out
    yada^I3$
    life^I42$
    blah^I69$
    bar^I70$
    word^I81$
    foo^I146$

Try that with your real files. I suspect it will be faster and free of those bottlenecks. Let us know if you still have problems: show your new code and profiling output (in <readmore> or <spoiler> tags).

— Ken

Replies are listed 'Best First'.
Re^2: Long list is long
by eyepopslikeamosquito (Archbishop) on Oct 30, 2022 at 10:11 UTC

    A number of problems with the code you posted: Missing strict ... Missing warnings ...

    Kudos to kcott for patiently showing by example yet again how to write excellent, clean and succinct Perl code. Unfortunately, it seems unlikely that the monk in question will follow your sage advice, despite a gentle nudge from haukex a year ago.

    Given Bod's recent posting history on this topic, I'm trusting that a once similarly recalcitrant Bod has now seen the light and might be persuaded to share his experiences ... along with why (or why not) he uses strict and warnings nowadays.

    Oh, and just in case it helps, a non-Perl Monk reference on this topic: Always use strict and warnings in your Perl code (perlmaven)

    Update: I also keep a list of references on this topic: use strict and warnings References

      G'day eyepopslikeamosquito,

      Thanks for the compliment.

      I usually look at an OP's previous posts. This gives me an idea of the OP's level of Perl knowledge and how best to frame my response. I did check on this occasion; wondered if I was flogging a dead horse; but chose to proceed anyway.

      To Chuma: I've been writing Perl code for almost 30 years. A substantial part of my paid employment involves writing Perl code. I use these pragmata for personal, $work and PM code. I don't do this because it's trendy, expected, or for any other frivolous reasons; I do it because they're extremely useful and save me a lot of time.

      Even with decades of experience, I still make typos and other silly mistakes (just like everyone else does) — I'd like Perl to tell me about these problems as soon as possible, instead of getting weird or unexpected output and spending a lot of time tracking down the source of the bug. I encourage you to use these pragmata for the same reasons.
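
A small sketch of the kind of typo strict catches. The variable names are invented for illustration, and the string eval is used only so the compile-time error can be captured and printed instead of aborting the script:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The evaluated code inherits "use strict" from this file, so the
# misspelled variable is rejected when the string is compiled.
my $ok = eval q{
    my $total = 0;
    $totl = $total + 1;    # typo: should be $total
    1;
};

print $ok ? "compiled cleanly\n" : "strict caught it: $@";
```

Without strict, `$totl` would silently become a new package variable and `$total` would stay at 0, which is exactly the sort of weird output that takes time to track down.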

      — Ken
