PerlMonks
Re: How do I extract all text between two keywords like start and end?
by stephen (Priest) on Apr 15, 2000 at 02:16 UTC ( [id://7675]=note )
Hmmm... I'm afraid that the recursive 'between' above might not work for complex cases. The non-greedy regexp makes it match the first start-end pair it finds, even when that 'end' actually closes a nested 'start', so if we had:
then we should wind up with the whole thing, minus start and end and yadda, but instead we get:
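As a small illustration with an assumed input (the string and the `first_pair` helper are made up here, not taken from the thread), a non-greedy match stops at the first 'end' it finds, even though that 'end' belongs to the inner 'start':

```perl
use strict;
use warnings;

# Hypothetical helper: capture the text between the first 'start'
# and the nearest following 'end' using a non-greedy match.
sub first_pair {
    my ($text) = @_;
    my ($inner) = $text =~ /start(.*?)end/s;
    return $inner;
}

# Assumed nested input, for illustration only.
my $nested = 'start a start b end c end';

print "captured: [", first_pair($nested), "]\n";
# prints: captured: [ a start b ]
```

The capture ends at the inner 'end', so ' c end' is left behind and the outer pair is never matched as a whole.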
The only way I can think of to get around this is by keeping external track of the levels. This also de-recurses it, which makes it less beautiful, but faster (in theory):
This returns:
So what we're doing here is going through the text looking for 'start's and 'end's. We keep a counter indicating how many levels deep we are in 'start's and 'end's. Every time we hit a 'start', we add one. Every time we hit an 'end', we subtract one, checking first to make sure that our level doesn't go negative. (Otherwise, somebody could mess us up by starting a file with "end end end".) Afterwards, we look at the patch of text between the current tag and the next start/end tag. If our level is greater than 0, we're between a 'start' and an 'end' tag, so we store that segment. Otherwise, we're not, so we keep looking for another 'start' or 'end' tag until the end of the file.
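Here is a minimal sketch of that level-counting idea. It is not necessarily the code originally posted: the name `extract_between` is hypothetical, it walks the text with `split` and a capturing group rather than an explicit scanning loop, and it assumes the tags are the whole words 'start' and 'end'.

```perl
use strict;
use warnings;

# Hypothetical sketch: return every patch of text that lies inside
# at least one start/end pair, tracking the nesting depth by hand.
sub extract_between {
    my ($text) = @_;
    my $level = 0;    # how many 'start's are currently open
    my @segments;     # text seen while $level > 0

    # A capturing group in the split pattern keeps the tags in the
    # returned list, so tags and text arrive in document order.
    for my $piece (split /\b(start|end)\b/, $text) {
        if ($piece eq 'start') {
            $level++;
        }
        elsif ($piece eq 'end') {
            $level-- if $level > 0;    # ignore stray 'end's
        }
        elsif ($level > 0 && $piece =~ /\S/) {
            push @segments, $piece;    # we're inside a pair: keep it
        }
    }
    return @segments;
}

# Assumed input: two stray 'end's, then a nested pair.
my @found = extract_between('end end x start a start b end c end y');
print join('|', @found), "\n";
# prints:  a | b | c
```

The stray leading 'end's leave the level at 0, so 'x' and 'y' are skipped, while 'a', 'b' and 'c' are all kept because the level is above 0 when they are seen.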