Re: Using HTML::TreeBuilder to change the DOCTYPE declaration

by ikegami (Pope)
on Nov 04, 2008 at 11:20 UTC

in reply to Using HTML::TreeBuilder to change the DOCTYPE declaration

The doctype is stored as an attribute of the root element: the attribute name is "_decl" and the value is an HTML::Element object. So basically, you want

my $ele = $root->look_down(
    _decl => ...was specified...,
) or die qq{declaration not found\n};

my $dec = $ele->attr('_decl');

Since look_down doesn't let us check whether an attribute was specified, we'll have to provide our own handler.

my $ele = $root->look_down(
    sub { $_[0]->attr('_decl') }
) or die qq{declaration not found\n};

my $dec = $ele->attr('_decl');

But why use look_down at all? The only possible node it could return is the root node. The above code boils down to

my $dec = $root->attr('_decl') or die qq{declaration not found\n};

Now that we have the declaration, let's move on to changing it. It makes no sense to use splice_content to modify attributes. attr is the proper method.

$dec->attr(text => 'DOCTYPE html PUBLIC "HTML4"');

Since the entire purpose is to replace the declaration, let's create a new declaration rather than dying if it's absent.

$root->attr('_decl',
    HTML::Element->new('~declaration',
        text => 'DOCTYPE html PUBLIC "HTML4"',
    )
);
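
One way to check that the replacement took effect, without depending on how as_HTML renders the tree, is to read the attribute back. The following is a minimal sketch that uses only the calls shown above; the sub name replace_doctype and the sample markup are made up for illustration and are not part of HTML::TreeBuilder.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): store a fresh
# '~declaration' element in the root's '_decl' attribute, then return
# whatever is stored there now.
sub replace_doctype {
    my ($root, $text) = @_;
    $root->attr('_decl',
        HTML::Element->new('~declaration', text => $text),
    );
    return $root->attr('_decl');
}

# Sample markup, chosen only for this illustration.
my $root = HTML::TreeBuilder->new_from_content(
    '<html><head><title>t</title></head><body><p>p</p></body></html>'
);

my $dec = replace_doctype($root, 'DOCTYPE html PUBLIC "HTML4"');

# Read the declaration text back from the stored element.
print $dec->attr('text'), "\n";
```

Because the helper always installs a new element, it behaves the same whether or not the parsed document had a declaration to begin with.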

All together:

#!/usr/bin/perl

use warnings;
use strict;

use HTML::TreeBuilder;

my $content = do { local $/; <DATA> };
my $root = HTML::TreeBuilder->new_from_content($content);

$root->attr('_decl',
    HTML::Element->new('~declaration',
        text => 'DOCTYPE html PUBLIC "HTML4"',
    )
);

print $root->as_HTML;

__DATA__
<!DOCTYPE html PUBLIC "XHTML">
<html>
<head><title>declaration</title></head>
<body><p>declaration</p></body>
</html>

Output:

<!DOCTYPE html PUBLIC "HTML4"> <html><head><title>declaration</title></head><body><p>declaration</body></html>

