taking white space out between closing and opening tags

chuleto1 has asked for the wisdom of the Perl Monks concerning the following question:

Problem:
I would like to replace every tab ("\t"), newline ("\n"), and run of more than one consecutive space (" ") between an opening and a closing tag with ONE space per instance.
$text = "<tag>
The purpose of the applicant rating session is for you,
the applicant, to provide a sample of your
effective teaching skills.</tag>"

The desired result would be:

$text = "<tag>The purpose of the applicant rating session is for you, the applicant, to provide a sample of your effective teaching skills.</tag>"


Re: taking white space out between closing and opening tags by Ovid (Cardinal) on Aug 21, 2002 at 20:58 UTC
Here's a quick, untested stab at it. Let's assume for this example that you are talking about `<p>` tags.

    use HTML::TokeParser::Simple;

    # assumes that $text is a scalar containing the actual HTML
    my $p = HTML::TokeParser::Simple->new( \$text );
    my $token;

    # skip ahead to the opening <p> tag
    do { $token = $p->get_token } until $token->is_start_tag('p');
    my $new_text = $token->return_text;

    # copy tokens through the closing </p>, tidying only the text tokens
    do {
        $token = $p->get_token;
        my $temp = $token->return_text;
        if ( $token->is_text ) {
            $temp =~ s/\s+/ /g; # collapse whitespace
            $temp =~ s/^\s//;   # remove initial whitespace
            $temp =~ s/\s$//;   # remove trailing whitespace
        }
        $new_text .= $temp;
    } until $token->is_end_tag('p');

This is a much cleaner (and more accurate) method of accomplishing this task than most regex solutions. I also happen to think that HTML::TokeParser::Simple is easier to use than many other HTML parsing modules. Of course, I may be biased as I wrote that module :)

Cheers,
Ovid
Re: taking white space out between closing and opening tags by dpuu (Chaplain) on Aug 21, 2002 at 20:50 UTC
Split the problem: first get the string, then condense it. Assuming you can't use any of the standard XML/HTML modules to get the text, you could try:

    # work on a copy ($2 is read-only) and return the condensed string
    sub condense { my $s = shift; $s =~ s/\s+/ /g; return $s }

    # /s lets .*? span the newlines inside the tag
    $in =~ s/(<tag>)(.*?)(<\/tag>)/ $1 . condense($2) . $3 /gse;

--Dave
Re: taking white space out between closing and opening tags by Mr. Muskrat (Canon) on Aug 21, 2002 at 20:51 UTC
I'll help with the regex requirements.

    #!/usr/bin/perl -w
    use strict;

    my $text = "<tag>\n\tThe purpose of the applicant rating session is for you,\nthe applicant, to provide a sample of your\t\neffective teaching skills.</tag>";
    $text =~ s/\s+/ /g; # convert white space to a single space
    print $text;

The print statement is just there to show what's taken place.

Edited to match the given example. Of course, this still leaves a space between the tag and the text. dpuu and Ovid both give better ways of doing it, but I wasn't paying attention.
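
The leftover space between the tag and the text can be handled by trimming the condensed string before it is put back between the tags. What follows is only a minimal sketch using core Perl, assuming the literal tag name `<tag>` from the question and no nested tags; the helper names `condense` and `condense_tags` are made up for illustration and are not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse whitespace in a fragment and trim both ends.
sub condense {
    my ($s) = @_;          # copy, so the caller's value is never modified
    $s =~ s/\s+/ /g;       # every run of whitespace becomes one space
    $s =~ s/^ //;          # drop the single space left at the start
    $s =~ s/ $//;          # drop the single space left at the end
    return $s;
}

# Hypothetical wrapper: condense the content of every <tag>...</tag> pair,
# leaving text outside the pair untouched.
sub condense_tags {
    my ($text) = @_;
    # /s lets . match newlines; /e evaluates the replacement as code
    $text =~ s{(<tag>)(.*?)(</tag>)}{ $1 . condense($2) . $3 }gse;
    return $text;
}

my $text = "<tag>\n\tThe purpose of the applicant rating session is for you,\n"
         . "the applicant, to provide a sample of your\t\n"
         . "effective teaching skills.</tag>";

print condense_tags($text), "\n";
```

For the sample from the question this prints the desired single-line form with no space after `<tag>`. Whitespace outside the tag pair is deliberately left alone; for real HTML with attributes or nesting, a parser-based approach is the safer choice.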
