
I have some text files (several hundred megabytes each) that I am processing. To simplify, I am going through the sections (think of them as paragraphs) and removing lines that are "ref n" (where n is an integer). There will be just a few hundred of these per file.

So I am just reading the whole file into memory and substituting out the offending lines. (I am actually removing the "ref n" lines, not the lines that begin with n; the latter are what I match in the first line of code below.)

while ($allfile =~ /^(\d+) /mg)
    {
    my $objectref = 'ref' . $1;
    push (@objects, $objectref);
    }

for (@objects) { $allfile =~ s/^ +$_\n//m; }

print $allfile;

This worked fine (although it is likely far from the best way to do it), taking about ten minutes on average. Then I found out that, rarely (maybe once every dozen files or so), some lines that I need to remove will actually be "foo ref n". No problem, I thought. I just changed the code to:

for (@objects) { $allfile =~ s/^ *(foo)? +$_\n//mn; }

Something is not working as I expected. :) I am ninety minutes into processing the first file after the code change, and there is no sign of any progress. Why is it taking so long, and how can I improve my algorithm / code? Thank you in advance.
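
For comparison, here is a rough sketch of a different shape for the same job, not a tested fix for the files described above. Each s///m call in the loop starts scanning $allfile from the beginning, so a few hundred objects mean a few hundred scans of a string that is hundreds of megabytes long. The sketch collects the numbers once and then walks the lines a single time. The helper name remove_ref_lines is hypothetical, and the sketch assumes that the targets look exactly like what the regexes above match: optional spaces, an optional "foo", at least one space, then "refN" at the end of the line.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post.
sub remove_ref_lines {
    my ($text) = @_;

    # Pass 1: count the numbers that start a line, keyed as "refN".
    # This mirrors the list built by the original while loop.
    my %pending;
    $pending{"ref$1"}++ while $text =~ /^(\d+) /mg;

    # Pass 2: read the text line by line through an in-memory filehandle.
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";
    my $out = '';
    while (my $line = <$in>) {
        # Candidate line: optional spaces, optional "foo", one or more
        # spaces, then "refN" followed by the newline.
        if ($line =~ /^ *(?:foo)? +(ref\d+)\n\z/ && $pending{$1}) {
            $pending{$1}--;    # one removal used, like s///m without /g
            next;
        }
        $out .= $line;
    }
    close $in;
    return $out;
}
```

Called as print remove_ref_lines($allfile); it should drop the same lines as the loop above, since the counts in %pending stand in for repeated entries in @objects. For files this large, printing each kept line instead of appending to $out would avoid holding a second copy in memory.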

In reply to some efficiency, please by Anonymous Monk
