PerlMonks  

Re: Out of memory

by Ratazong (Monsignor)
on Jun 09, 2015 at 12:30 UTC ( [id://1129656]=note )


in reply to Out of memory

You don't need to load the whole file into memory, only as many lines as your search string can span. So you can use the following approach:

  1. read n lines
  2. do pattern match
  3. remove the first line
  4. read one more line
  5. goto 2
HTH, Rata
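
A minimal sketch of the five steps above, under stated assumptions: the helper name `find_in_window` is made up for illustration, and the window's lines are joined without a separator on the assumption that the search string ignores line breaks. It returns the line number at which each matching window starts, so a match that fits in fewer than n lines may be reported by more than one window.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): slide a window of $n
# lines over a filehandle and return the 1-based line number at which each
# window matching $re starts.
sub find_in_window {
    my ($fh, $re, $n) = @_;
    my (@window, @hits);
    my $line_no = 0;                      # number of the newest line read
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $line_no++;
        push @window, $line;              # steps 1 and 4: read a line
        shift @window if @window > $n;    # step 3: remove the first line
        next if @window < $n;             # window not full yet
        # step 2: pattern match on the joined window (assumption: the
        # pattern ignores line breaks, so no separator is inserted)
        push @hits, $line_no - $n + 1 if join('', @window) =~ $re;
    }
    # A file shorter than $n lines never fills the window; check it once.
    push @hits, 1
        if $line_no && $line_no < $n && join('', @window) =~ $re;
    return @hits;
}

# Usage with an in-memory file: "ijkl" straddles lines 2 and 3.
my $text = join "\n", 'abcde', 'fghij', 'klmno', 'pqrst';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @hits = find_in_window($fh, qr/ijkl/, 2);
print "@hits\n";                          # prints "2"
```

Only n lines are held in memory at any time, whatever the size of the file.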

Replies are listed 'Best First'.
Re^2: Out of memory
by Athanasius (Archbishop) on Jun 09, 2015 at 15:49 UTC

    ++Ratazong (when the Vote Fairy next visits) for this sliding window solution. But I have two quibbles:

    1. Say the search string is 230 characters long. Then since each input line is 80 characters, the search string is 3 lines long (because 3 x 80 = 240 is the smallest multiple of 80 to be >= 230). So n is 3. But the pattern may begin near the end of an input line and stretch over 4 lines. So the minimum size of the sliding window is n + 1 (320 characters for the example search string).
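
    The arithmetic in this point can be checked with a short sketch. The helper name `min_window_lines` is made up for illustration, and it assumes fixed-length lines with line terminators not counted as part of the search string. The result is a safe window size; it is one line more than strictly necessary only when the search string is exactly one character longer than a multiple of the line length.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: a window size (in lines) large enough to contain any
# match of a $pattern_len-character string in lines of $line_len characters.
sub min_window_lines {
    my ($pattern_len, $line_len) = @_;
    my $n = ceil($pattern_len / $line_len);   # lines needed if the match
                                              # starts at the head of a line
    return $n + 1;                            # one more for a mid-line start
}

my $lines = min_window_lines(230, 80);
printf "%d lines (%d characters)\n", $lines, $lines * 80;
# prints "4 lines (320 characters)"
```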

    2. Setting the window size to n + 1 lines will produce the smallest memory footprint. But it will also entail a large amount of processing, much of it duplicated, as the regex engine searches over and over within the same overlapping text. If the window size is, say, ten times the minimum (i.e., 3200 characters, or 40 lines, for the 230 character search string), only 3 of the 40 lines need be duplicated in each subsequent window, which is already a significant saving in processing time. Determining an optimum window size, one which successfully balances memory usage against processing time, will depend on the OP’s requirements and available memory, and will likely require some trial-and-error. But I expect the savings in processing time will more than compensate for the time spent in optimising the window size.
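
    One way to sketch the larger window is to read it in blocks and carry only the overlap forward. The helper name `matching_windows` and its parameters are made up for illustration. It assumes the search string ignores line breaks, and that the overlap is one line fewer than the most lines a match can span (3 lines for the example above).

```perl
use strict;
use warnings;

# Hypothetical helper: search a filehandle in windows of $size lines,
# carrying the last $overlap lines into the next window so that a match
# straddling a window boundary is still seen whole.
# Returns the 1-based starting line of every window in which $re matched.
sub matching_windows {
    my ($fh, $re, $size, $overlap) = @_;
    die "window size must exceed overlap\n" unless $size > $overlap;
    my (@window, @hits);
    my $first = 1;                          # line number of $window[0]
    while (1) {
        my $new = 0;
        # Top the window up to $size lines (fewer at end of file).
        while (@window < $size && defined(my $line = <$fh>)) {
            chomp $line;
            push @window, $line;
            $new++;
        }
        last unless $new;                   # nothing new left to search
        push @hits, $first if join('', @window) =~ $re;
        last if @window < $size;            # short window: end of file
        my $drop = $size - $overlap;
        splice @window, 0, $drop;           # keep the last $overlap lines
        $first += $drop;
    }
    return @hits;
}

# Usage: "XY" straddles lines 3 and 4, i.e. the boundary of the first window.
my $text = join "\n", qw(aaaaa bbbbb ccccX Yzzzz eeeee fffff);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @hits = matching_windows($fh, qr/XY/, 3, 1);
print "@hits\n";                            # prints "3"
```

    With an overlap of 0 the same input yields no hits, because the match is split between two windows. Each window re-reads only the overlap, so a larger `$size` reduces the share of text the regex engine scans twice.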

    Hope that helps,

    Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

      Those are very valid points, thanks much @Athanasius.

Re^2: Out of memory
by sandy105 (Scribe) on Jun 15, 2015 at 11:52 UTC

    Sorry for the late reply. Although the search string appears at random positions, it can span no more than 3 lines, so yes, I can probably start doing what you suggested. Thanks!
