bronto has asked for the wisdom of the Perl Monks concerning the following question:
Dearest Monks
I am writing a couple of web-page-scraping tools to help me in my job search. I already have something working. What I am missing is a nice pure-Perl solution that formats a web page as readable plain text. That way, if a job announcement is removed for any reason, I still have a chance of getting to its contents.
And hence the question: is there anything like lynx -dump in Perl? I dug into CPAN for about half an hour and tried html2text, but it didn't really do a good job...
For the few of you who don't know what lynx is and what it does:
NAME
    lynx - a general purpose distributed information browser for the World Wide Web
...
DESCRIPTION
    Lynx is a fully-featured World Wide Web (WWW) client for users running cursor-addressable, character-cell display devices (e.g., vt100 terminals, vt100 emulators running on Windows 95/NT or Macintoshes, or any other "curses-oriented" display).
...
OPTIONS
...
    -dump  dumps the formatted output of the default document or one specified on the command line to standard output. This can be used in the following way:

               lynx -dump http://www.subir.com/lynx.html
Thanks a lot in advance for your help
Ciao!
--bronto
In theory, there is no difference between theory and practice. In practice, there is.