Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re: Regexes eating too much RAM

by diotalevi (Canon)
on Mar 04, 2006 at 06:08 UTC ( [id://534428]=note: print w/replies, xml ) Need Help??


in reply to Regexes eating too much RAM

In general, the recipe is to eliminate all capture groups that operate on your large string. Beyond that, you can try to cut things off as you process them. Perl keeps a marker about where a string begins so if you're contientious, you can convince perl to just advance that pointer.

This is wasteful. When it matches, it makes a copy of $_ to an internal buffer so $1 can refer back to it. Eliminate the capturing parentheses and use substr() with @- and @+ to refer back to what $1 would have contained. The documentation for @- is a good reference for you right now.

You'll notice how I used 4-arg substr to directly replace the first part of the string.

if (/\G<([^<>]*)>/gc) { flush_name(); $state = 'TEXT'; print OUT $1; next; }

Efficient.

if (/\G<[^<>]*>/gc) { flush_name(); $state = 'TEXT'; print OUT substr $_, $-[0] + 1, $+[0] - $-[0] - 1; substr $_, 0, $+[0], ''; next; }

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://534428]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (5)
As of 2024-04-16 06:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found