Re^3: some efficiency, please
by QM (Parson) on Apr 12, 2019 at 16:19 UTC
If the files are very large, you'll spend more time disk swapping than actually reading/writing.
Make 2 passes: Record all of the "ref" numbers you want to delete in the first pass (use a hash), then reread the file, printing it out according to whether a ref value is in the hash.
But to do this well, with multiline data, you'll have to tell us what a "paragraph" is, because it's not clear to me from your description.
It might look something like this:
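A minimal sketch of the two-pass idea, under loudly stated assumptions: paragraphs are separated by blank lines, each carries a line like "ref: 123", and a line like "status: delete" marks a ref for removal. That format, the helper ref_of, the wrapper filter_file, and the body of some_condition are all hypothetical, since the real data layout hasn't been described yet.

```perl
use strict;
use warnings;

# Hypothetical test: does this paragraph mark its ref for deletion?
# Replace the body with whatever your data actually requires.
sub some_condition {
    my ($para) = @_;
    return $para =~ /^status:\s*delete\b/m;
}

# Hypothetical helper: pull the ref number out of a paragraph.
sub ref_of {
    my ($para) = @_;
    return $para =~ /^ref:\s*(\d+)/m ? $1 : undef;
}

sub filter_file {
    my ($in_path, $out_path) = @_;
    local $/ = "";    # paragraph mode: records are separated by blank lines

    # Pass 1: remember every ref we want to drop.
    my %delete;
    open my $in, '<', $in_path or die "Cannot read $in_path: $!";
    while (my $para = <$in>) {
        my $ref = ref_of($para);
        $delete{$ref} = 1 if defined $ref && some_condition($para);
    }
    close $in;

    # Pass 2: reread, printing only paragraphs whose ref is not in the hash.
    open $in, '<', $in_path or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (my $para = <$in>) {
        my $ref = ref_of($para);
        next if defined $ref && $delete{$ref};
        print {$out} $para;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return scalar keys %delete;    # number of distinct refs dropped
}
```

Only the hash of refs stays in memory, never the file itself. Because the hash is complete before the second pass starts, a paragraph is dropped even when the deletion marker for its ref shows up later in the file.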
The trick, of course, is some_condition.
If it's hard to put a single paragraph into a regex, just note the signposts with flags. Something like this for the first pass:
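A rough sketch of a flag-driven first pass, reading line by line. The signposts are assumptions made up for illustration: a "ref: N" line opens a paragraph, a blank line closes it, and a "status: delete" line inside marks that ref. The name collect_deletes is hypothetical too; substitute your own markers.

```perl
use strict;
use warnings;

# First pass: collect refs to delete, reading line by line.
# Takes an open filehandle and returns a hash reference of refs to drop.
sub collect_deletes {
    my ($fh) = @_;
    my %delete;
    my $current_ref;    # flag: ref of the paragraph we are inside, if any

    while (my $line = <$fh>) {
        if ($line =~ /^ref:\s*(\d+)/) {       # signpost: paragraph starts
            $current_ref = $1;
        }
        elsif ($line =~ /^\s*$/) {            # signpost: paragraph ends
            undef $current_ref;
        }
        elsif (defined $current_ref && $line =~ /^status:\s*delete\b/) {
            $delete{$current_ref} = 1;        # hypothetical deletion marker
        }
    }
    return \%delete;
}
```

The flag $current_ref does the job a multiline regex would otherwise do: it remembers which paragraph you are in, so a marker seen outside any paragraph is simply ignored.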
And then something very similar to that in the 2nd pass, except printing or not printing based on your logic. (If you were very clever, you could reuse that code, with a tweak, passing a parameter for the pass number. But don't get clever until it works.)
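For when it does work, here is one way the reuse might look: a single scanner that takes the pass number as a parameter. It relies on the same made-up signposts ("ref: N" opens a paragraph, a blank line closes it, "status: delete" marks it), and the name scan and its argument order are hypothetical.

```perl
use strict;
use warnings;

# One scanner for both passes. $pass is 1 (collect) or 2 (print survivors).
# $delete is a hash ref shared between passes; $out is only used in pass 2.
sub scan {
    my ($in, $pass, $delete, $out) = @_;
    my $current_ref;     # ref of the paragraph we are inside, if any
    my $skipping = 0;    # pass 2 only: true while inside a deleted paragraph

    while (my $line = <$in>) {
        if ($line =~ /^ref:\s*(\d+)/) {            # paragraph starts
            $current_ref = $1;
            $skipping = $pass == 2 && $delete->{$current_ref} ? 1 : 0;
        }
        elsif ($pass == 1 && defined $current_ref
               && $line =~ /^status:\s*delete\b/) {
            $delete->{$current_ref} = 1;           # hypothetical marker
        }

        print {$out} $line if $pass == 2 && !$skipping;

        if ($line =~ /^\s*$/) {                    # paragraph ends
            undef $current_ref;
            $skipping = 0;
        }
    }
    return;
}
```

Call it once with pass 1 and an empty hash, reopen the input, then call it again with pass 2 and an output handle. The blank line that closes a deleted paragraph is dropped along with it, so the output doesn't accumulate empty gaps.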