in reply to Re^2: Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump')
in thread Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump')

Gosh! You didn't even take a look at what lynx -dump produces, did you?

He didn't claim it would produce the same output, nor even a comparable one. He just pointed out that it has a method for outputting plain text, which it does. Indeed, I think it more or less amounts to calling as_text() on the whole parse tree of the wanted page. Lynx and its variants are full-fledged browsers, so it is natural that they go beyond the capabilities of a simple parser and aim at presentation-friendly output. But that's quite a lot of work. You may hack/roll your own by inserting horizontal and vertical whitespace suitably around individual elements before printing them with as_text. Needless to say, doing this properly is going to be quite a lot of work too, but maybe just inserting a newline after every single element would already make everything clearer. Oh, and at the very least take care of paragraphs and breaks. But if you also want line wrapping, that's a whole other story. (A call for Text::Wrap, most probably.)
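To give an idea of the "roll your own" approach, here is a minimal sketch. It assumes the parser under discussion is HTML::TreeBuilder (as_text() is an HTML::Element method); the helper name html_to_text, the list of block-level tags and the column width are my own picks for illustration, not anything from the thread. It is nowhere near what lynx -dump does (no links, tables or indentation), it only takes care of paragraphs, breaks and line wrap.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;        # HTML-Tree distribution on CPAN, assumed installed
use Text::Wrap qw(wrap);      # core module

# Hypothetical helper: parse $html, pad block-level elements with newlines,
# dump the tree with as_text() and wrap the result at $columns.
sub html_to_text {
    my ($html, $columns) = @_;
    $columns ||= 72;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Illustrative choice of tags that should stand on their own lines.
    my %block = map { $_ => 1 } qw(p div h1 h2 h3 h4 h5 h6 li pre blockquote);

    # look_down returns the full list first, so editing the tree is safe here.
    for my $el ($tree->look_down(sub { $block{ $_[0]->tag } || $_[0]->tag eq 'br' })) {
        $el->preinsert("\n") if $block{ $el->tag };   # blank line before a block
        $el->postinsert("\n");                        # newline after block or <br>
    }

    my $text = $tree->as_text;
    $tree->delete;                                    # free the parse tree

    $text =~ s/\n{3,}/\n\n/g;                         # at most one blank line in a row
    $text =~ s/\A\s+|\s+\z//g;                        # trim both ends

    # Wrap each line on its own so the newlines inserted above survive.
    local $Text::Wrap::columns = $columns;
    return join "\n", map { length($_) ? wrap('', '', $_) : '' } split /\n/, $text;
}

print html_to_text(
    '<html><body><h1>Title</h1><p>First para.</p><p>Second<br>line.</p></body></html>',
    40,
), "\n";
```

Inserting the newlines as sibling text nodes keeps the original elements untouched, and as_text() passes whitespace through unchanged, so the padding shows up in the dump.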

OTOH did you look at the outcome of your post (as is recommended)?!? It screwed up the whole view for this thread. Use <code> tags around the stuff you pasted, although it's not strictly code. At least that has smart line wrap...

Update: the post has been fixed, hence the above comment does not apply any more.

Ciao