Not the whitespace, but the newline character.
A newline is whitespace.
But both as_text and as_trimmed_text cut this newline. ... Is it possible to preserve the newline?
No, they don't. The whitespace is already gone before you call either of those methods; it is compacted while the document is parsed.
All you had to do was:
$ perldoc HTML::TreeBuilder |grep -i space
Do not represent the text content of elements. This saves space if
$root->ignore_ignorable_whitespace(value)
whitespace text nodes in the tree. Default is true. (In fact, I'd be
$root->no_space_compacting(value)
This determines whether TreeBuilder compacts all whitespace strings
contiguous whitespace in the document is turned into a single space.
But that's not done if no_space_compacting is set to 1.
Setting no_space_compacting to 1 might be useful if you want to read
Redirects to HTML::Element:: delete_ignorable_whitespace
$ perldoc HTML::Element |grep -i space
$h->delete_ignorable_whitespace()
whitespace. You should not use this if $h under a 'pre' element.
"\t", or some number of spaces, if you specify it).
whitespace is deleted, and any internal whitespace is collapsed.
This will not remove hard spaces, unicode spaces, or any other non ASCII
white space unless you supplye the extra characters as a string
Tabs are expanded to however many spaces it takes to get to the next 8th
#!/usr/bin/perl --
use strict;
use warnings;
use HTML::TreeBuilder;
use Test::More qw' no_plan ';
Main(@ARGV);
exit(0);
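sub Main {
    is( OneT('<html><body></body></html>'),
        undef, 'no tag means undef not empty string' );
    is( OneT('<html><title></title><body></body></html>'),
        '', 'no content' );
    is( OneT('<html><title> </title><body></body></html>'),
        ' ', 'space' );
    is( OneT(qq'<html><title>a\nb</title><body></body></html>'),
        "a\nb", 'a newline b' );
} ## end sub Main

# Return the text of the <title> element, or undef if there is none.
sub OneT {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new();
    # Must be set before parsing: whitespace is compacted during the parse.
    $tree->no_space_compacting(1);
    $tree->parse($html);
    $tree->eof;    # signal end of input to the parser
    # look_down returns undef when there is no title; the eval turns the
    # resulting method-call-on-undef error into an undef return value.
    return eval { $tree->look_down(qw' _tag title')->as_text };
} ## end sub OneT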
__END__
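To see the difference the setting makes, here is a minimal sketch that parses the same title twice, once with the default behaviour and once with no_space_compacting(1). The helper name title_text and the sample HTML are made up for illustration; only new, no_space_compacting, parse, eof, look_down, as_text and delete are HTML::TreeBuilder / HTML::Element calls. I'd expect the default run to show a single space where the newline was, and the second run to keep the newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# title_text() is a hypothetical helper, not part of HTML::TreeBuilder.
# It returns the text of the <title> element, or undef if there is none.
sub title_text {
    my ( $html, $keep_whitespace ) = @_;
    my $tree = HTML::TreeBuilder->new;

    # The option has to be set before parsing, because whitespace is
    # compacted while the tree is being built, not in as_text.
    $tree->no_space_compacting(1) if $keep_whitespace;
    $tree->parse($html);
    $tree->eof;

    my $title = $tree->look_down( _tag => 'title' );
    my $text  = defined $title ? $title->as_text : undef;
    $tree->delete;    # release the tree once we have the text
    return $text;
}

my $html = "<html><head><title>a\nb</title></head><body></body></html>";
for my $keep ( 0, 1 ) {
    my $text = title_text( $html, $keep );
    $text =~ s/\n/\\n/g;    # make a surviving newline visible
    print "no_space_compacting($keep): '$text'\n";
}
```

Setting the option after parse() has already run would be too late, which is why neither as_text nor as_trimmed_text can bring the newline back.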