comment on

Hello Monks,

I am in charge of re-factoring a script which basically iterates over an array from a SELECT query from a database which returns ~1 Trillion records, the array restructures the items in a certain pattern and produces csv files. The script works fine, but the issue I am facing is that it takes ~16 days to finish. I have to decrease the run-time of the script; So these were the thoughts I had:

Grid Engine
Hadoop
Optimize the script

I appreciate any advice regarding any of the three topics, what modules to use, any past experience with such huge number of records, or any other solutions to optimize the performance of the script. Thanks in advance.

In reply to Processing ~1 Trillion records by aossama

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Perl-Sensitive Sunglasses
	PerlMonks