Hello Monks,
I am in charge of refactoring a script that iterates over an array built from a SELECT query returning ~1 trillion records; the script restructures the items into a certain pattern and writes them out as CSV files.
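A minimal sketch of the described pipeline (SELECT, restructure, write CSV), streamed in batches instead of materialising the whole result set as one array. It assumes the array currently holds the full result; at this scale, pulling rows incrementally keeps memory flat. This is Python with `sqlite3` standing in for the real database, not the original Perl. The table `records`, its columns `id`, `name`, `value`, and the `restructure()` function are all hypothetical placeholders. In Perl DBI the analogous change is a `fetchrow_arrayref` loop rather than `fetchall_arrayref`. Note that some drivers buffer the entire result client-side by default; DBD::mysql, for example, does so unless `mysql_use_result` is set.

```python
import csv
import sqlite3
from pathlib import Path


def restructure(row):
    """Hypothetical stand-in for the script's per-record transformation."""
    rec_id, name, value = row
    return (rec_id, name.upper(), value * 2)


def export_streaming(conn, out_dir, batch_size=10_000, rows_per_file=1_000_000):
    """Stream a SELECT into CSV files without holding the result in memory.

    Rows are pulled with fetchmany(), so memory is bounded by batch_size,
    and a new CSV file is started every rows_per_file rows.
    Returns (total_rows_written, list_of_file_paths).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    # Hypothetical query; replace with the real SELECT.
    cur.execute("SELECT id, name, value FROM records ORDER BY id")

    files_written = []
    fh = writer = None
    rows_in_file = 0
    total = 0
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            # Rotate to a new output file when the current one is full.
            if writer is None or rows_in_file >= rows_per_file:
                if fh is not None:
                    fh.close()
                path = out_dir / f"part-{len(files_written):05d}.csv"
                fh = open(path, "w", newline="")
                writer = csv.writer(fh)
                files_written.append(path)
                rows_in_file = 0
            writer.writerow(restructure(row))
            rows_in_file += 1
            total += 1
    if fh is not None:
        fh.close()
    return total, files_written
```

With a 25-row table, `batch_size=4` and `rows_per_file=10` should produce three files holding 10, 10 and 5 rows. Batch size and file size are tuning knobs to measure against the real database.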
The script works fine, but the issue I am facing is that it takes ~16 days to finish.
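For scale, here is a back-of-the-envelope throughput figure. It assumes exactly $10^{12}$ records processed over 16 full days:

$$
\frac{10^{12}\ \text{records}}{16 \times 86\,400\ \text{s}} = \frac{10^{12}\ \text{records}}{1\,382\,400\ \text{s}} \approx 7.2 \times 10^{5}\ \text{records/s}
$$

Finishing in one day would need $10^{12} / 86\,400 \approx 1.16 \times 10^{7}$ records/s, roughly 16 times the current rate.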
I need to reduce its run time, and these are the options I have considered:
- Grid Engine
- Hadoop
- Optimize the script
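One way the Grid Engine option could look is an array job in which each task selects only its own slice of the key range. This assumes the table has an integer primary key (a hypothetical `id` column). The tasks then run concurrently, and each writes its own CSV files. Grid Engine sets `SGE_TASK_ID` for array jobs submitted with `qsub -t 1-N`; for non-array jobs it is set to `undefined`, which the sketch falls back from. Python again, illustration only; the query, key bounds and chunk count are placeholders.

```python
import os

# Hypothetical query; assumes an integer primary key column "id".
CHUNK_SQL = ("SELECT id, name, value FROM records "
             "WHERE id >= ? AND id < ? ORDER BY id")


def id_ranges(min_id, max_id, n_chunks):
    """Split the inclusive key range [min_id, max_id] into n_chunks
    contiguous, non-overlapping half-open ranges (lo, hi)."""
    total = max_id - min_id + 1
    base, extra = divmod(total, n_chunks)
    ranges = []
    lo = min_id
    for i in range(n_chunks):
        # The first `extra` chunks get one extra key, so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        ranges.append((lo, lo + size))
        lo += size
    return ranges


def chunk_query(task_id, min_id, max_id, n_chunks):
    """Return (sql, params) for a 1-based Grid Engine task id."""
    lo, hi = id_ranges(min_id, max_id, n_chunks)[task_id - 1]
    return CHUNK_SQL, (lo, hi)


if __name__ == "__main__":
    # SGE_TASK_ID is numeric for array tasks; fall back to 1 otherwise
    # (unset, or "undefined" for non-array jobs) so it can run by hand.
    raw = os.environ.get("SGE_TASK_ID", "1")
    task_id = int(raw) if raw.isdigit() else 1
    # Placeholder bounds; in practice take them from SELECT MIN(id), MAX(id).
    sql, params = chunk_query(task_id, 1, 1_000_000, 100)
    print(task_id, params)
```

Equal-width key ranges only balance the work if the ids are reasonably dense. With large gaps some tasks finish early, and more, smaller chunks than nodes helps even that out.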
I would appreciate any advice on these three options: which modules to use, past experience with this many records, or any other ways to improve the script's performance.
Thanks in advance.