http://qs321.pair.com?node_id=38624

d_brown3 has asked for the wisdom of the Perl Monks concerning the following question:

I have this code, and I need to return the first five words of the string captured in $2 by the first regular expression. The file is an XML file with a huge chunk of CDATA that contains a number of editorials separated by <p> tags.
while (<INFILE>) {
    if (/(<editorialtext><!\[CDATA\[|<p>)(.*)(\]\]><\/editorialtext|$)/) {
        my $editorial = $2;
        my $headline  = $2;
        $editorial =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    }
}
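
One possible way to pull out the first five words is sketched below. It is only an illustration, not a tested fix for the code above: the helper name first_words and the sample string are made up for the example, and "word" is assumed to mean any run of non-whitespace characters. It relies on the special case split ' ', which skips leading whitespace and splits on runs of whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): return the first
# $n whitespace-separated words of $text, joined by single spaces.
sub first_words {
    my ($text, $n) = @_;
    my @words = split ' ', $text;   # skips leading whitespace, splits on runs of whitespace
    $n = @words if $n > @words;     # fewer than $n words available: return them all
    return join ' ', @words[0 .. $n - 1];
}

# Sample data standing in for the text captured in $2.
my $editorial = "  The quick brown fox jumps over the lazy dog";
my $headline  = first_words($editorial, 5);
print "$headline\n";    # prints "The quick brown fox jumps"
```

Inside the loop, this would presumably be called as my $headline = first_words($editorial, 5); after the tag-stripping substitution, so that leftover markup is not counted as words.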

Originally posted as a Categorized Question.