Re: taking white space out between closing and opening tags

in reply to taking white space out between closing and opening tags

Here's a quick, untested, stab at it. Let's assume for this example that you are talking about <p> tags.

use HTML::TokeParser::Simple;

# assumes that $text is a scalar containing the actual HTML
my $p = HTML::TokeParser::Simple->new( \$text );

# skip ahead to the first <p> start tag
my $token;
while ( $token = $p->get_token ) {
    last if $token->is_start_tag('p');
}
die "no <p> tag found\n" unless $token;

my $new_text = $token->as_is;

# copy everything up to and including the matching </p>,
# tidying the whitespace in each text token along the way
while ( $token = $p->get_token ) {
    my $temp = $token->as_is;
    if ( $token->is_text ) {
        $temp =~ s/\s+/ /g;  # collapse whitespace
        $temp =~ s/^\s//;    # remove initial whitespace
        $temp =~ s/\s$//;    # remove trailing whitespace
        # note: this also drops spaces next to inline tags such as <b>
    }
    $new_text .= $temp;
    last if $token->is_end_tag('p');
}

This is a much cleaner (and more accurate) method of accomplishing this task than most regex solutions. I also happen to think that HTML::TokeParser::Simple is easier to use than many other HTML parsing modules. Of course, I may be biased, since I wrote that module :)
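To show what "accurate" means here, below is a small sketch of the kind of regex people usually reach for. The input string is made up for illustration. The regex does remove the whitespace between the two paragraphs, but it cannot tell markup from attribute values, so it also rewrites the `title` attribute. It would do the same to significant whitespace between tags inside a `<pre>` block. A tokenizer sees the attribute as part of a tag, not as text between tags, so it avoids this.

```perl
use strict;
use warnings;

# Hypothetical input: two <p> elements separated by whitespace, where the
# second one has a title attribute that happens to contain "> ... <".
my $html = qq{<p>one</p>\n  <p title="a >  < b">two</p>};

# The naive approach: delete any whitespace found between '>' and '<'.
( my $naive = $html ) =~ s/>\s+</></g;

# The gap between the paragraphs is gone, but so is the whitespace
# inside the attribute value.
print "$naive\n";
```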

Cheers,
Ovid