use warnings;
use strict;

print html_abstract(<<'END_HTML', 200), "\n";

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed tristique purus urna, a lacinia nulla euismod et. Pellentesque tempus et justo faucibus. Fusce scelerisque, magna efficitur congue, leo nibh volutpat nibh, ac mattis dolor ipsum sit amet quam. Suspendisse eleifend id ligula quis placerat. Pellentesque fermentum eu magna sed mollis. Quisque placerat efficitur blandit. Vestibulum non.

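END_HTML

use Mojo::DOM;
use Mojo::Util 'xml_escape';

# Return the start of $html, keeping the markup that encloses the text,
# limited to $remain characters of text and ending in "..." when cut.
sub html_abstract {
    my ($html, $remain) = @_;
    my $walk;
    $walk = sub {
        my ($in, $out) = @_;
        for my $n ( @{ $in->child_nodes } ) {
            last unless $remain;
            if ( $n->type eq 'cdata' || $n->type eq 'text' ) {
                # content() returns decoded text, so escape it again before
                # appending, or "&lt;b&gt;" in the input would become a tag
                my $txt = $n->content;
                if ( length $txt < $remain ) {
                    $out->append_content( xml_escape $txt );
                    $remain -= length $txt;
                }
                else {
                    # keep at most $remain characters, backing off to the
                    # last word boundary; hard cut if there is none
                    my ($cut) = $txt =~ /^(.{0,$remain}\b)/s;
                    $cut = substr $txt, 0, $remain unless defined $cut;
                    $out->append_content( xml_escape($cut) . '...' );
                    $remain = 0;
                }
            }
            elsif ( $n->type eq 'tag' ) {
                my $t = $out->new_tag( $n->tag, %{ $n->attr } )
                    # new_tag gives us a "root", but we want the tag
                    ->child_nodes->first;
                $walk->($n, $t);
                $out->append_content($t);
            }
            # ignore other node types for now
        }
        return $out;
    };
    return $walk->(Mojo::DOM->new($html), Mojo::DOM->new)->to_string;
}

__END__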

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed tristique purus urna, a lacinia nulla euismod et. Pellentesque tempus et justo faucibus. Fusce scelerisque, magna efficitur congue, leo ...
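The least obvious step in html_abstract is the word-boundary cut in the text branch: `.{0,$remain}` first grabs as many characters as allowed, then the regex engine backtracks until the match ends on a `\b`, so a word is never split in half. The sketch below isolates just that step in core Perl so it can be tried without Mojo::DOM. The name `truncate_at_word` is hypothetical, and the hard-cut fallback for text with no word boundary at all is an assumption added for robustness.

```perl
use strict;
use warnings;

# Hypothetical helper: the text-node truncation from html_abstract on its own.
sub truncate_at_word {
    my ($txt, $max) = @_;
    return $txt if length $txt < $max;    # fits: no cut needed

    # Greedily take up to $max characters, then backtrack until the match
    # ends on a word boundary (\b), so the cut never lands mid-word.
    my ($cut) = $txt =~ /^(.{0,$max}\b)/s;

    # The match fails only when no \b exists in range (e.g. text made of
    # spaces or punctuation only); fall back to a hard cut at $max.
    $cut = substr $txt, 0, $max unless defined $cut;

    return "$cut...";
}

print truncate_at_word("Hello wonderful world", 12), "\n";
```

Here the first 12 characters are `Hello wonder`, which ends mid-word, so the regex backs off to `Hello ` and the call returns `Hello ...`.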