PerlMonks  

taking white space out between closing and opening tags

by chuleto1 (Beadle)
on Aug 21, 2002 at 20:41 UTC

chuleto1 has asked for the wisdom of the Perl Monks concerning the following question:

Problem:
I would like to replace every tab ("\t"), newline ("\n"), and run of more than one consecutive space (" ") between an opening tag and its closing tag with ONE space per instance.
$text = "<tag>
      The purpose of the applicant rating session is for you,
the applicant, to provide a sample of your       
effective teaching skills.</tag>"


The desired result would be:

$text = "<tag>The purpose of the applicant rating session is for you, the applicant, to provide a sample of your effective teaching skills.</tag>"

Replies are listed 'Best First'.
Re: taking white space out between closing and opening tags
by Ovid (Cardinal) on Aug 21, 2002 at 20:58 UTC

    Here's a quick, untested, stab at it. Let's assume for this example that you are talking about <p> tags.

    use HTML::TokeParser::Simple;

    # assumes that $text is a scalar containing the actual HTML
    my $p = HTML::TokeParser::Simple->new( \$text );

    # skip ahead to the opening <p> tag
    my $token;
    do { $token = $p->get_token } until $token->is_start_tag('p');
    my $new_text = $token->return_text;

    # copy every token up to and including the closing </p>
    do {
        $token = $p->get_token;
        my $temp = $token->return_text;
        if ( $token->is_text ) {
            $temp =~ s/\s+/ /g;  # collapse whitespace
            $temp =~ s/^\s//;    # remove initial whitespace
            $temp =~ s/\s$//;    # remove trailing whitespace
        }
        $new_text .= $temp;
    } until $token->is_end_tag('p');

    This is a much cleaner (and more accurate) method of accomplishing this task than most regex solutions. I also happen to think that HTML::TokeParser::Simple is easier to use than many other HTML parsing modules. Of course, I may be biased as I wrote that module :)

    Cheers,
    Ovid


Re: taking white space out between closing and opening tags
by dpuu (Chaplain) on Aug 21, 2002 at 20:50 UTC
    Split the problem: first get the string, then condense it. Assuming you can't use any of the standard XML/HTML modules to get the text, you could try:
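    # work on a copy: $2 is read-only, and the sub must return the string
    sub condense { my $s = shift; $s =~ s/\s+/ /g; return $s }
    # /s lets . match the newlines inside the tag
    $in =~ s/(<tag>)(.*?)(<\/tag>)/ $1 . condense($2) . $3 /gse;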
    --Dave
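The split-then-condense idea above can be sketched as a self-contained script run against the sample from the question. This is only a sketch under stated assumptions: the element is a literal, attribute-free, non-nested `<tag>`, and the helper name `condense_tags` is hypothetical (it does not appear in the thread). The trimming of the leading and trailing space is an addition made here so the result matches the desired output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collapse every run of whitespace to one space, then trim both ends.
# Works on a copy of its argument and returns the new string.
sub condense {
    my ($s) = @_;
    $s =~ s/\s+/ /g;
    $s =~ s/^ //;
    $s =~ s/ $//;
    return $s;
}

# Hypothetical helper: condense the body of every <tag>...</tag> pair.
# Assumption: <tag> has no attributes and is never nested.
sub condense_tags {
    my ($in) = @_;
    # /s lets . match newlines; /e evaluates the replacement as code.
    $in =~ s{(<tag>)(.*?)(</tag>)}{
        my ( $open, $body, $close ) = ( $1, $2, $3 );
        $open . condense($body) . $close;
    }gse;
    return $in;
}

my $text = "<tag>\n      The purpose of the applicant rating session is for you,\n"
         . "the applicant, to provide a sample of your       \n"
         . "effective teaching skills.</tag>";

print condense_tags($text), "\n";
```

Text outside the `<tag>` pair is left untouched, which is the main difference from a global `s/\s+/ /g` over the whole string.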
Re: taking white space out between closing and opening tags
by Mr. Muskrat (Canon) on Aug 21, 2002 at 20:51 UTC
    I'll help with the regex requirements.
    #!/usr/bin/perl -w
    use strict;

    my $text = "<tag>\n\tThe purpose of the applicant rating session is for you,\nthe applicant, to provide a sample of your\t\neffective teaching skills.</tag>";
    $text =~ s/\s+/ /g; # convert white space to a single space
    print $text;
    The print statement is just there to show what's taken place.
    edited to match the given example...
    Of course, this still leaves a space between the tag and the text... dpuu and Ovid both give better ways of doing it... but I wasn't paying attention.
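One way to deal with the leftover space while staying with plain regexes is to follow the global collapse with two more substitutions that drop a single space directly after an opening tag and directly before a closing tag. The sketch below is an illustration, not something posted in the thread: the sub name `squeeze_around_tags` is hypothetical, and it assumes simple tags with no `>` inside attribute values. Like the global approach, it also collapses whitespace outside the tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse whitespace everywhere, then remove the
# single space left next to the inside edge of each tag.
sub squeeze_around_tags {
    my ($s) = @_;
    $s =~ s/\s+/ /g;                  # collapse all whitespace runs
    $s =~ s/(<[^\/>][^>]*>) /$1/g;    # drop the space right after an opening tag
    $s =~ s/ (<\/[^>]+>)/$1/g;        # drop the space right before a closing tag
    return $s;
}

my $text = "<tag>\n\tThe purpose of the applicant rating session is for you,\n"
         . "the applicant, to provide a sample of your\t\n"
         . "effective teaching skills.</tag>";

print squeeze_around_tags($text), "\n";
```

For anything beyond simple, well-behaved markup, the parser-based approach in Ovid's reply is the safer choice.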
