http://qs321.pair.com?node_id=578457


in reply to Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump')

I tend to do these things by hand, even though I know I really shouldn't.
    my $string = "..htmlstuff..";

    # strip out newlines
    $string =~ s/[\r\n]+/ /sg;

    # replace <p> with custom paragraph marker
    my $marker_paragraph = "**PARAGRAPHHERE**";
    $string =~ s/<p(\s[^>]*)?>/$marker_paragraph/isg;

    # remove all HTML tags
    $string =~ s/<[^>]*>//sg;

    # replace custom paragraph marker with blank line
    $string =~ s/\Q$marker_paragraph\E/\n\n/sg;

You can add other transforms as needed, such as wrapping the text at a particular column.
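For the column-wrapping transform, here is a rough sketch using the core Text::Wrap module. The helper name wrap_paragraphs and the 72-column width are my own assumptions for illustration, not part of the snippet above. It expects text whose paragraphs are already separated by blank lines.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper (an assumption, not from the original post): re-wrap
# text whose paragraphs are separated by blank lines so that no line is
# longer than $width characters.
sub wrap_paragraphs {
    my ($text, $width) = @_;

    # Text::Wrap keeps lines to at most $columns - 1 characters.
    local $Text::Wrap::columns = $width + 1;

    my @paragraphs;
    for my $para (split /\n{2,}/, $text) {
        $para =~ s/\s+/ /g;        # collapse runs of whitespace
        $para =~ s/^ | $//g;       # trim leading/trailing space
        next unless length $para;  # skip empty paragraphs
        push @paragraphs, wrap('', '', $para);
    }
    return join "\n\n", @paragraphs;
}

# Example input standing in for the tag-stripped $string.
my $string = "First paragraph of stripped text. " x 4 . "\n\nSecond paragraph.";
print wrap_paragraphs($string, 72), "\n";
```

Collapsing whitespace before wrapping matters here because stripping tags tends to leave stray runs of spaces behind.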