ok, it's a little brutal, and written especially to convert the letters that belong to the Italian language, but it's fast, recognises images and scripts, and after all it dumps plain ASCII!!!
enjoy!
SiG
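#!/usr/bin/perl
# convert HTML to plain ASCII in a moment!
# ok, lynx does it better, but this is less than 1k!!!
# enjoy!
# baginov@hotmail.com
use strict;
use warnings;

print "Input File:\n";
my $input_file = <STDIN>;
chomp($input_file);
open(INF, '<', $input_file) or die "Cannot read $input_file: $!";
# page.htm or page.html becomes page.txt; refuse anything else so the
# input file is never overwritten
(my $output_file = $input_file) =~ s/\.html?$/.txt/i;
die "Input file must end in .htm or .html\n" if $output_file eq $input_file;
open(OUF, '>', $output_file) or die "Cannot write $output_file: $!";
while (my $riga = <INF>) {
    # mark images and scripts before the generic tag stripping below
    $riga =~ s/<img\b[^>]*>/\n-----------\n| Image |\n-----------\n/gi;
    $riga =~ s/<script\b[^>]*>/----- Script -----/gi;
    $riga =~ s/<\/script>/----- Script -----\n/gi;
    $riga =~ s/<br\b[^>]*>/\n/gi;
    # strip every remaining tag; [^>]* stops at the first '>' on the line, not the last
    $riga =~ s/<[^>]*>//g;
    # entities: accented Italian letters become letter + apostrophe
    $riga =~ s/&nbsp;/ /g;
    $riga =~ s/&egrave;/e'/g;
    $riga =~ s/&agrave;/a'/g;
    $riga =~ s/&ugrave;/u'/g;
    $riga =~ s/&igrave;/i'/g;
    $riga =~ s/&eacute;/e'/g;
    $riga =~ s/&iacute;/i'/g;
    $riga =~ s/&ograve;/o'/g;
    $riga =~ s/&lt;/</g;
    $riga =~ s/&quot;/"/g;
    print OUF $riga;
}
close(INF);
close(OUF);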
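A possible variation, not part of the original script: the one-substitution-per-entity lines could be collapsed into a single lookup table, so supporting another letter means adding one hash entry. The names `%ascii_for` and `html_to_ascii` are hypothetical, the entity list is only the handful the script already covers, and this is a rough sketch in core Perl rather than a real HTML parser (tags split across lines will still slip through).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: entity name => plain ASCII replacement.
my %ascii_for = (
    nbsp   => ' ',
    egrave => "e'", eacute => "e'",
    agrave => "a'",
    ugrave => "u'",
    igrave => "i'", iacute => "i'",
    ograve => "o'",
    lt     => '<',
    quot   => '"',
);

# Hypothetical helper: strip tags from one line, then translate known entities.
sub html_to_ascii {
    my ($riga) = @_;
    $riga =~ s/<br\b[^>]*>/\n/gi;    # line breaks first
    $riga =~ s/<[^>]*>//g;           # then every other tag
    # Replace an entity only if it is in the table; leave unknown ones untouched.
    $riga =~ s/&(\w+);/exists $ascii_for{$1} ? $ascii_for{$1} : "&$1;"/ge;
    return $riga;
}

print html_to_ascii("<p>Perch&eacute; no? <b>Citt&agrave;</b>&nbsp;bella<br>\n");
```

Run as-is, it prints `Perche' no? Citta' bella` followed by the extra line break produced by the `<br>`. Tags are stripped before entities are translated, so an escaped `&lt;` in the page text ends up as a literal `<` in the output instead of being mistaken for markup.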
Re: Dump Text from HTML
by OeufMayo (Curate) on Jul 18, 2001 at 14:07 UTC
by Sigmund (Pilgrim) on Jul 26, 2001 at 14:05 UTC
Re: Dump Text from HTML
by davorg (Chancellor) on Jul 18, 2001 at 13:54 UTC
Re: Dump Text from HTML
by alfie (Pilgrim) on Jul 18, 2001 at 12:50 UTC
by Sigmund (Pilgrim) on Jul 28, 2001 at 20:06 UTC
by Sigmund (Pilgrim) on Jul 22, 2001 at 18:34 UTC
by dentargiano (Initiate) on Jul 10, 2002 at 10:10 UTC
by dentargiano (Initiate) on Jul 10, 2002 at 10:22 UTC
Re: Dump Text from HTML
by Anonymous Monk on Jan 13, 2009 at 17:27 UTC