PerlMonks  

Handling large amounts of data in a perl script

by sjwnih111 (Novice)
on Jan 08, 2014 at 19:04 UTC ( [id://1069847]=perlquestion )

sjwnih111 has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that extracts large amounts of data on which it often needs to perform calculations. For example, it might extract the ages of 100 people and then output an average age. I don't know much about object-oriented programming, but it looked like it might be a good way to store and manipulate data within the script; so, I could extract the ages, making each person an object, and then do the calculation, and then output the average. Is this feasible/reasonable? The other option is creating a database and doing the calculations within, but why add 100 values to a database when all I really need is the average? Also, it might give me an opportunity to learn about object-oriented Perl. If anyone has any advice I would be much indebted.

Replies are listed 'Best First'.
Re: Handling large amounts of data in a perl script
by ww (Archbishop) on Jan 08, 2014 at 19:36 UTC
    Aside from the satisfaction of over-engineering, as noted by educated_foo above, or from the sheer joy of learning something new for which you've indicated no clear need, if you already "...have a script that extracts..." data, presumably including the 100 ages you want, then why do more than select and sum (cf +=) those ages and obtain your average by dividing by 100 (cf /)?
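The "select and sum, then divide" approach might look roughly like this. It is only a sketch: the `@ages` list is hypothetical sample data standing in for whatever the existing extraction step produces, and it divides by the actual count rather than a fixed 100.

```perl
use strict;
use warnings;

# Hypothetical sample data; in the real script these values would come
# from the existing extraction step.
my @ages = (34, 27, 45, 61, 19);

my $total = 0;
$total += $_ for @ages;    # running sum with +=

# Divide by the actual count, and guard against an empty list.
my $average = @ages ? $total / @ages : undef;

print "average = $average\n" if defined $average;
```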

    Alternatively, some questions: from what source, and in what format, is the data you're extracting? How are you processing it now? How would you do it with pencil and paper?

    "I don't know much about object-oriented programming" (and that applies to /me too) is a valid statement, but that lack of knowledge by itself, absent one of the motivations in para 1 above, seems (IMO) to suggest that before trying to practice it, you should learn enough about its strengths and weaknesses to determine whether the problem you've chosen falls within its generally-agreed-upon 'appropriate uses.'

    Come, let us reason together: Spirit of the Monastery
Re: Handling large amounts of data in a perl script
by educated_foo (Vicar) on Jan 08, 2014 at 19:18 UTC
    That sounds like some serious over-engineering for just averaging 100 ages. Do you actually have something much larger and more complex in mind?
      I'll be working with about 100,000 directories, each containing several subdirectories that each contain a few thousand files or fewer. The calculations themselves will be simple.
        How is the large number of dirs and files relevant to "making each person an object" or to educated_foo's observation and question?
        Okay, so you're dealing with about 10^5 * 10^3 * 10 items, i.e. about a giga-item. On a sufficiently powerful machine, you can fit them all in memory at once if they're just integers (~4GB or ~8GB). If they're not, say if you make them "objects," they won't fit comfortably.

        Now you have to ask yourself whether you need to process them all at once or sequentially. If you have to process them all at once (e.g. sorting), you'll have to do something clever. Otherwise (e.g. finding the mean), you can just run through them one at a time, updating some state in your program, e.g.

        my ($ages, $n) = (0, 0);
        while (defined(my $age = next_age())) { $ages += $age; $n++; }   # defined(): an age of 0 must not end the loop
        print "average = ", $ages / $n, "\n" if $n;
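For the directory tree described earlier in the thread, the same one-at-a-time idea could be sketched with the core File::Find module. The thread never says how the files are laid out, so this assumes each data file is plain text with one numeric age per line; `mean_age_under` is a made-up name for illustration. Only the running sum and count are kept in memory, never the individual values.

```perl
use strict;
use warnings;
use File::Find;

# Assumption: each data file is plain text with one numeric age per line.
# The real file format is not given in the thread.
sub mean_age_under {
    my ($root) = @_;
    my ($sum, $n) = (0, 0);

    find(sub {
        return unless -f $_;    # skip directories and other non-files
        open my $fh, '<', $_ or do {
            warn "cannot open $File::Find::name: $!";
            return;
        };
        while (my $line = <$fh>) {
            # Ignore anything that is not a bare non-negative number.
            next unless $line =~ /^\s*(\d+(?:\.\d+)?)\s*$/;
            $sum += $1;
            $n++;
        }
        close $fh;
    }, $root);

    return $n ? $sum / $n : undef;    # undef when no ages were found
}

# Usage: perl mean_age.pl /path/to/data
my $root = shift @ARGV;
if (defined $root) {
    my $mean = mean_age_under($root);
    print defined $mean ? "average = $mean\n" : "no ages found\n";
}
```

Calculations that genuinely need every value at once, such as the sorting mentioned above, would not fit this pattern and need a different approach.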
Re: Handling large amounts of data in a perl script
by Laurent_R (Canon) on Jan 08, 2014 at 22:28 UTC

    There are some programming languages where you have to use objects for more or less everything. Perl is not among them: in Perl you can use objects when they are useful, and other programming paradigms when they are more efficient. The calculation you want to do can be done in two or three lines of imperative code. That is fewer lines of code than you need just to set up an object. I would really not recommend objects for such simple calculations.
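To make the comparison concrete, here is a rough side-by-side sketch. The sample ages and the `Person` class are invented for illustration, and `sum` comes from the core List::Util module. The `package NAME { ... }` block form needs Perl 5.14 or later.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my @ages = (34, 27, 45, 61, 19);    # hypothetical sample data

# Imperative version: this one line is the whole calculation.
my $average = @ages ? sum(@ages) / @ages : undef;

# Object version: this much setup is needed before any calculation happens.
package Person {
    sub new {
        my ($class, %args) = @_;
        return bless { age => $args{age} }, $class;
    }
    sub age { $_[0]{age} }
}

my @people     = map { Person->new(age => $_) } @ages;
my $oo_average = @people ? sum(map { $_->age } @people) / @people : undef;

print "imperative: $average\n";
print "objects:    $oo_average\n";
```

Both paths compute the same number; the object version simply adds a class definition and one allocation per person along the way.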

    If you wish to learn OOP, be it with Perl or another language, fine, it is a great idea, but try to do it with a more ambitious application. Or, to put it another way, the calculations and data munging you are contemplating are not the type of area where OOP is likely to give you any advantage over more conventional methods.
