perlquestion
bronto
<p>Dearest Monks</p>
<p>I am writing a couple of web-scraping tools to help me in my job search. I already have something working, but what I am missing is a nice pure-Perl solution that formats a web page as plain text, so that if an announcement is removed for any reason, I still have a copy of its contents.</p>
<p>And hence the question: is there anything like <code>lynx -dump</code> in Perl? I dug into CPAN for about half an hour and tried html2text, but it didn't really do a good job...</p>
<readmore>
<p>For the few of you who don't know what lynx is and what it does:</p>
<blockquote><code>
NAME
       lynx - a general purpose distributed information browser for
       the World Wide Web
...
DESCRIPTION
       Lynx is a fully-featured World Wide Web (WWW) client for
       users running cursor-addressable, character-cell display
       devices (e.g., vt100 terminals, vt100 emulators running on
       Windows 95/NT or Macintoshes, or any other "curses-oriented"
       display).
...
OPTIONS
...
       -dump  dumps the formatted output of the default document or
              one specified on the command line to standard output.
              This can be used in the following way:
              lynx -dump http://www.subir.com/lynx.html
</code></blockquote>
</readmore>
<p>Thanks a lot in advance for your help.</p>
<div class="pmsig"><div class="pmsig-175325">
<p>Ciao!<br><tt>--bronto</tt></p>
<hr>
<blockquote><small><i>In theory, there is no difference between theory and practice. In practice, there is.</i></small></blockquote>
</div></div>