PerlMonks  

package Apache::Tidy;

use strict;
use warnings;
use vars qw($VERSION);
$VERSION = "0.1";

use Apache::Constants qw(OK DECLINED NOT_FOUND);
use Apache::File;

sub handler {
    my $r = shift;

    # we only care about html
    return DECLINED unless $r->content_type eq 'text/html';

    my $fh = undef;
    if (lc $r->dir_config('Filter') eq 'on') {
        # register as a filter
        $r = $r->filter_register;
        # get input from any previous filters
        ($fh, my $status) = $r->filter_input;
        return $status unless $status == OK;
    } else {
        $fh = Apache::File->new($r->filename);
        return DECLINED unless $fh;
    }

    my $dirty = do {local $/; <$fh>};

    my $tidy_path = $r->dir_config('TidyPath') || "/usr/bin/tidy";
    my $temp_dir  = $r->dir_config('TidyTempDir') || "/tmp";
    my $options   = join ' ', $r->dir_config->get('TidyOptions');
    $options = $options || "-q -asxhtml";

    $r->send_http_header('text/html');

    # clean up the path so we can run in taint mode
    delete $ENV{PATH};

    eval {
        # write a tempfile
        open(TMP,">$temp_dir/tidy_$$.html")
            or die "couldn't write to tempfile: $!";
        print TMP $dirty;
        close TMP;

        # run tidy over it
        system("$tidy_path $options $temp_dir/tidy_$$.html > $temp_dir/tidy_out_$$.html");

        # read in results
        open(OUT,"<$temp_dir/tidy_out_$$.html")
            or die "couldn't read tempfile: $!";
        my @results = <OUT>;
        close OUT;

        # clean up
        unlink "$temp_dir/tidy_$$.html";
        unlink "$temp_dir/tidy_out_$$.html";

        print @results;
    };
    if ($@) {
        # if something generated an error,
        # we default to just passing the content on unchanged.
        print $dirty;
    }
    return OK;
}

1;

__END__

=head1 NAME

Apache::Tidy - htmltidy as an apache filter

=head1 SYNOPSIS

  PerlModule Apache::Filter
  PerlModule Apache::Tidy
  <Location /filtered/*.html>
    SetHandler perl-script
    PerlHandler Apache::Tidy
  </Location>

=head1 ABSTRACT

Cleans up and fixes invalid HTML on the fly.

=head1 DESCRIPTION

Wrapper for the htmltidy program (L<http://tidy.sourceforge.net/>)
using the Apache::Filter framework. Fixes HTML/XHTML validation issues
on the fly.

Dave Raggett's HTML Tidy is a free command-line utility for cleaning up
messy and invalid HTML or XHTML code. It will correct missing or
mismatched end tags, clean up Microsoft Word generated HTML, convert
pages to XHTML, and format markup for easier reading.

Apache::Tidy uses the Apache::Filter framework to allow you to
automatically run tidy over web content as it is being served. This can
be very useful if you have editors or CMSes that produce invalid markup.

To filter static content add the following to your httpd.conf:

  PerlModule Apache::Tidy
  <Location /directory/to/filter/>
    SetHandler perl-script
    PerlHandler Apache::Tidy
  </Location>

Apache::Tidy can also work as part of an Apache::Filter chain:

  PerlModule Apache::Filter
  PerlModule Apache::RegistryFilter
  PerlModule Apache::Tidy
  <Location /perl/*.pl>
    PerlSetVar Filter On
    SetHandler perl-script
    PerlHandler Apache::RegistryFilter Apache::Tidy
  </Location>

Apache::Tidy supports all of htmltidy's command-line options by setting
TidyOptions:

  <Location /filtered/>
    SetHandler perl-script
    PerlHandler Apache::Tidy
    PerlSetVar TidyOptions '-wrap 60'
    PerlSetVar TidyOptions -clean
    PerlSetVar TidyOptions -asxhtml
  </Location>

It defaults to '-q -asxhtml' if no options are explicitly set.

You can also specify a different path to the tidy executable (necessary
if you've installed it anywhere but in /usr/bin/) and the temp directory
used can also be specified (defaults to /tmp):

  <Location /filtered/>
    SetHandler perl-script
    PerlHandler Apache::Tidy
    PerlSetVar TidyPath /opt/local/bin/tidy
    PerlSetVar TidyTempDir /some/other/temp/dir
  </Location>

=head1 NOTES

You must have htmltidy installed on your system. If it is installed
anywhere other than in /usr/bin/, you'll have to specify the full path
with

  PerlSetVar TidyPath /path/to/tidy

I've only tested Apache::Tidy on unix systems. It may run on other
platforms, but you will probably have to change the path, temp
directory, and options.

Since Apache::Tidy just jumps out to the shell to call the external tidy
program, it probably isn't very efficient. I'd like to reimplement this
someday with an XS or SWIG wrapped tidylib.

=head1 SEE ALSO

L<Apache::Filter>, L<http://tidy.sourceforge.net/>,
L<Apache::RegistryFilter>

=head1 AUTHOR

Anders Pearson, E<lt>anders@columbia.eduE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright 2003 by Anders Pearson

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut
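
The module above builds a shell command line by interpolating TidyPath, TidyOptions and a temp file name derived from $$, and it never checks tidy's exit status. As a rough sketch only (not the author's code), the external call could be isolated in a helper that uses File::Temp for the temp file and a list-form pipe open so nothing passes through a shell. The name run_tidy and its argument order are hypothetical, and the exit-status handling assumes tidy's usual convention of 0 for success, 1 for warnings and 2 for errors.

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical helper, not part of Apache::Tidy: run an external filter
# program over $dirty and return its output, or $dirty unchanged on failure.
sub run_tidy {
    my ($dirty, $tidy_path, @options) = @_;

    # File::Temp chooses an unpredictable name and removes the file
    # when $tmp goes out of scope.
    my $tmp = File::Temp->new(
        TEMPLATE => 'tidy_XXXXXX',
        TMPDIR   => 1,
        SUFFIX   => '.html',
    );
    print {$tmp} $dirty;
    close $tmp or return $dirty;

    # List-form pipe open: arguments go straight to exec, no shell involved.
    my $pid = open(my $out, '-|', $tidy_path, @options, $tmp->filename);
    return $dirty unless $pid;

    my $clean = do { local $/; <$out> };
    close $out;    # sets $? to the child's wait status

    return $dirty if $? & 127;             # child was killed by a signal
    # Assumption: exit status 0 = ok, 1 = warnings, 2 = errors.
    return $dirty if ($? >> 8) > 1;
    return $dirty if !defined $clean or $clean eq '';
    return $clean;
}
```

Inside the handler this might be called as print run_tidy($dirty, $tidy_path, '-q', '-asxhtml'); with the options passed as a list rather than one joined string. Falling back to the original markup mirrors what the eval block in the module already does when something dies.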

In reply to Apache::Tidy by thraxil
