PerlMonks
Excluding groups of characters in regular expressions
by semirhage (Initiate)
on Dec 21, 2007 at 14:37 UTC ( [id://658453]=perlquestion )
semirhage has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to remove any nested paragraph tags from a huge file. I have come up with the following regex so far:

    s/<paragraph>(.*?)<paragraph>(.*?)<\/paragraph>(.*?)<\/paragraph>/<paragraph>$1$2$3<\/paragraph>/ig;

This appears to work in many cases; however, it also removes every other pair of properly formatted paragraph tags. For example, if the input was the following:

    <paragraph>some data</paragraph><paragraph>more data</paragraph><paragraph>even more data</paragraph>

the regex would change it to:

    <paragraph>some data</paragraph>more data<paragraph>even more data</paragraph>

After thinking about it, this makes sense, since I am trying to match four tag units within the text and the (.*?) doesn't exclude other paragraph tags from being included.

Is there any way to exclude <paragraph> or </paragraph> from the (.*?) matches?

Thanks...
Tom
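
One way to exclude the tags from each capture, offered as a minimal sketch rather than a reply from this thread: replace each (.*?) with a group that consumes one character at a time, and only when a negative lookahead confirms that no <paragraph> or </paragraph> starts at that position. The helper name flatten_paragraphs and the variable $no_tag are hypothetical, introduced only for this illustration. The /s flag and the "1 while" loop are also assumptions, added so that paragraphs spanning several lines and more than one level of nesting are covered.

```perl
use strict;
use warnings;

# Hypothetical helper name; it does not appear in the original post.
sub flatten_paragraphs {
    my ($text) = @_;

    # A run of characters containing no <paragraph> or </paragraph> tag:
    # the lookahead (?!</?paragraph>) is checked before every character.
    # /s lets "." match newlines (an assumption about the input file).
    my $no_tag = qr{(?:(?!</?paragraph>).)*}is;

    # Same shape as the original substitution, with each (.*?)
    # replaced by ($no_tag). Repeat until nothing changes, so that
    # deeper nesting is unwrapped one level per pass.
    1 while $text =~ s{<paragraph>($no_tag)<paragraph>($no_tag)</paragraph>($no_tag)</paragraph>}
                      {<paragraph>$1$2$3</paragraph>}gi;

    return $text;
}

# Sibling paragraphs from the question: left untouched.
print flatten_paragraphs(
    '<paragraph>some data</paragraph><paragraph>more data</paragraph><paragraph>even more data</paragraph>'
), "\n";

# A nested pair: the inner tags are removed.
print flatten_paragraphs(
    '<paragraph>outer <paragraph>inner</paragraph> tail</paragraph>'
), "\n";
# prints: <paragraph>outer inner tail</paragraph>
```

Because this keeps the original four-tag pattern, it only matches an outer paragraph that contains exactly one nested paragraph at a given level. An outer paragraph holding two separate nested paragraphs side by side would not match and would likely need a different approach, such as counting tag depth.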