http://qs321.pair.com?node_id=578396

bronto has asked for the wisdom of the Perl Monks concerning the following question:

Dearest Monks

I am writing a couple of web-page-scraping tools to help me in my job search. I already have something working, but what I am missing is a nice pure-Perl solution that would render a web page as nicely formatted plain text, so that if an announcement is removed for any reason, I still have a chance of getting at its contents.

And hence the question: is there anything like lynx -dump in Perl? I dug into CPAN for about half an hour and tried html2text, but it didn't really do a good job...

For the few of you who don't know what lynx is and what it does:

NAME
       lynx - a general purpose distributed information browser for the World Wide Web
...
DESCRIPTION
       Lynx is a fully-featured World Wide Web (WWW) client for users running
       cursor-addressable, character-cell display devices (e.g., vt100 terminals,
       vt100 emulators running on Windows 95/NT or Macintoshes, or any other
       "curses-oriented" display).
...
OPTIONS
...
       -dump  dumps the formatted output of the default document or one specified
              on the command line to standard output. This can be used in the
              following way:

              lynx -dump http://www.subir.com/lynx.html

Thanks a lot in advance for your help

Ciao!
--bronto


In theory, there is no difference between theory and practice. In practice, there is.