
Well, you've all made some good points. :) Particularly that IO is the constraint.

A few thoughts...

Sometimes there is a better algorithm (as opposed to a better optimization), and finding it probably requires a better understanding of the core problem. For example, did the OP really need an exact tally, or would it have been adequate to sample and extrapolate? If so, what size sample would have been sufficient, and would it have been ok to take the first 1..s ints, or would it have been necessary to take s ints from random positions in the data set? We can't really answer that because we don't know enough about the data set.
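
To make the sample-and-extrapolate idea concrete, here is a minimal sketch in Python. It is not from the original thread: the function name `estimate_tally` and its parameters are made up for illustration, and it assumes the integers are already in memory as a list (for a multi-GB file you would more likely seek to random offsets instead).

```python
import random
from collections import Counter

def estimate_tally(data, sample_size, rng=None):
    """Estimate per-value counts from a random sample, scaled to the full size.

    Hypothetical helper for illustration; assumes `data` supports len()
    and random access by index.
    """
    rng = rng or random.Random()
    n = len(data)
    if sample_size >= n:
        # Sample would cover everything, so just count exactly.
        return dict(Counter(data))
    # Pick s distinct random positions rather than the first s items,
    # so any ordering in the data does not bias the estimate.
    positions = rng.sample(range(n), sample_size)
    sample_counts = Counter(data[i] for i in positions)
    scale = n / sample_size
    return {value: count * scale for value, count in sample_counts.items()}
```

Whether this is good enough depends on exactly the things we don't know: if the data is sorted or clustered, taking the first s ints would give a badly skewed picture, while random positions would not. The estimates are only approximations, and rare values may be missed entirely.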

Another consideration could be maintaining a count as the data is generated. If over 3GB of integers were generated over the course of hours, days, weeks, or months, the extra cost of maintaining a running tally might go unnoticed, whereas a sudden 60-second lag may be a problem.
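
A running tally could look something like the following Python sketch. The class name `RunningTally` and its `record` method are hypothetical; the idea is only that whatever produces the integers calls `record` once per value, so the count is always current and no after-the-fact pass over the data is needed.

```python
from collections import Counter

class RunningTally:
    """Keeps per-value counts up to date as values are produced.

    Hypothetical helper for illustration, not code from the thread.
    """

    def __init__(self):
        self.counts = Counter()

    def record(self, value):
        # One dictionary increment per generated value; the cost is
        # spread across the whole time the data is being produced.
        self.counts[value] += 1
        return value  # returned so the caller can still write/store it
```

For instance, a producer could do `out.write(f"{tally.record(n)}\n")` for each integer `n` it emits, where `out` and `tally` are the producer's own file handle and `RunningTally` instance.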

And of course we don't even know whether a sudden lag IS a problem. Perhaps the user runs this once, walks away to grab a coffee, and comes back to it later, never to run it again. Or perhaps it is something automated as a cron job for 4:27am. Maybe it's like my Windows computer, where the world has already accepted that it takes 4 minutes to boot, and where shutting down may take even longer if some MS-pushed upgrade is waiting to finalize. Quite probably none of this matters to the person with the original question, in which case not only is my meditation premature micro-optimization, but any attempt to find a better alternative is unwarranted as well.

Nevertheless, I found it a fun exercise. If I ventured down the wrong tangent, I'll live and learn. In the end, the hour or so spent investigating was educational enough that I can justify it. And I'm thankful for the input, reminders, and gentle slap-in-the-back-of-the-head from my friends here.


In reply to Re: "Just use a hash": An overworked mantra? by davido
in thread "Just use a hash": An overworked mantra? by davido
