in reply to Re: clean html tags
in thread clean html tags
...you might consider the standalone version of Tidy for html...

I agree. It's very nifty indeed, easy to use and highly configurable.
Something as simple as:

    my $in  = 'bad.html';
    my $out = 'tidied.html';
    my $err = 'tidy.err';
    my $cnf = 'tidy.cnf';

    system(
        'tidy.exe', '-asxml',
        -config => $cnf,
        -file   => $err,
        -output => $out,
        $in,
    );

can be easily adapted to process a list or even a local copy of a web site. The config file can be tweaked to be severe or lenient to taste.
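One way the call above might be adapted to a list of files is sketched here. The helper name tidy_args and the derived output names (.err and .tidied.html next to each input) are my own assumptions, not anything Tidy requires; the file list comes from the command line.

```perl
use strict;
use warnings;

# Build the tidy argument list for one input file.
# Assumption: error and output files are named after the input file.
sub tidy_args {
    my ($in, $cnf) = @_;
    (my $base = $in) =~ s/\.html?$//i;
    return (
        'tidy.exe', '-asxml',
        -config => $cnf,
        -file   => "$base.err",
        -output => "$base.tidied.html",
        $in,
    );
}

# Files to process, e.g. from a shell glob over a local copy of a site.
my @files = @ARGV;

for my $in (@files) {
    system( tidy_args($in, 'tidy.cnf') );
    if ($? == -1) {
        warn "Could not run tidy for $in: $!\n";
        next;
    }
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors)
    printf "%s: tidy exit status %d\n", $in, $? >> 8;
}
```

Passing the arguments as a list keeps the shell out of it, so file names with spaces should not need any quoting.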
You can easily interrogate all the error files to get a good picture of how bad the html is (and there is a lot of it about!).
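A rough sketch of interrogating the error files follows. It assumes they sit in the current directory as *.err and that tidy wrote its default report lines of the form "line N column M - Warning: ..."; if your config changes the report format, the pattern will need adjusting. The name count_problems is just illustrative.

```perl
use strict;
use warnings;

# Count warning and error lines in the text of one tidy error file.
# Assumption: default tidy report format, "line N column M - Warning: ...".
sub count_problems {
    my ($text) = @_;
    my %count = ( Warning => 0, Error => 0 );
    for my $line ( split /\n/, $text ) {
        $count{$1}++ if $line =~ /^line \d+ column \d+ - (Warning|Error):/;
    }
    return \%count;
}

# Summarise every error file in the current directory.
for my $file ( glob '*.err' ) {
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next };
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my $c = count_problems( $text // '' );
    printf "%-30s %4d warnings %4d errors\n",
        $file, $c->{Warning}, $c->{Error};
}
```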
Again, I agree. Why bother to go to a lot of trouble when there is a very clever bit of kit available?