Re: Re: Keeping bad HTML bad

by trs80 (Priest)
on Aug 24, 2002 at 21:37 UTC


in reply to Re: Keeping bad HTML bad
in thread Keeping bad HTML bad

This is a good suggestion, but in my case I am very limited in what I can do with the user's HTML, and all comments are removed from every page processed (at the client's request). I go into some specifics in one of my earlier replies, but to recap what I am doing:

  • Retrieve remote document via HTTP ( LWP::UserAgent, HTTP::Request )
  • Parse document for local storage and confirm that its format isn't horribly disgusting ( HTML::TreeBuilder )
  • Allow editing of title tag, meta tags, anchor tag title attribute, and img tag alt attribute.
The editing forms are built by relying on where each tag sits in the element array created by HTML::TreeBuilder. That is, if a person chooses to edit alt attributes, each img tag is located using the look_down method in list context:
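    my @img   = $tree->look_down('_tag', 'img');
    my $count = 0;    # position of each img among the matches
    my $form  = '';
    foreach my $element (@img) {
        # make a form element, pre-filled with the current alt text
        # (textfield comes from CGI, e.g. use CGI qw(:standard))
        my $alt = $element->attr('alt');
        $form .= textfield(
            -name    => "img-$count",
            -default => defined $alt ? $alt : '',
        );
        $count++;
    }
    return $form;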
Then, when they submit the form, the $count is referenced and the appropriate img tag's alt content is replaced.
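
For what it's worth, the submit side can be sketched roughly as below. This is only an illustration, not my production code: the sample document and the %submitted hash are made-up stand-ins for the stored page and the CGI parameters, and it assumes the document has not changed between building the form and processing it, so that look_down returns the img elements in the same order.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input: a small document and the values a submitted form might carry.
my $html = '<html><body><img src="a.png" alt="old a"><img src="b.png"></body></html>';
my %submitted = ( 'img-0' => 'New text for A', 'img-1' => 'New text for B' );

my $tree = HTML::TreeBuilder->new_from_content($html);

# Same lookup as when the form was built, so the indexes line up.
my @img = $tree->look_down('_tag', 'img');

for my $count (0 .. $#img) {
    my $value = $submitted{"img-$count"};
    next unless defined $value;           # field not submitted; leave the tag alone
    $img[$count]->attr('alt', $value);    # replace (or add) the alt attribute
}

# Collect the resulting alt values so they can be inspected.
my @alts = map { $_->attr('alt') } @img;
print join("\n", @alts), "\n";

$tree->delete;    # free the tree when done
```

The weak point of this approach is that the form field names only carry a position, so anything that reorders or drops img tags between the two requests will make the indexes point at the wrong elements.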

But this is all moot, since the issue was and is that HTML::TreeBuilder is "supposed" to handle bad HTML. It uses HTML::Parser, and one of the goals of HTML::Parser is to work with documents that are really out there, so the example given should work with HTML::TreeBuilder, and in fact it does. Part of my problem was not turning off implicit_tags, as one of my other replies above states. The implicit_tags option is unique to the HTML::TreeBuilder module; it attempts to correct badly formatted HTML, which is most likely a good thing 98% of the time, but at least the author designed in the ability to turn that behavior off for the 2% of the time it isn't.
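
A rough way to see what implicit_tags changes is to parse the same bad fragment twice, once with the option on (the default) and once with it off. The fragment below is a made-up example, not the one from the thread, and the exact output may differ between versions of the module, so treat this as a sketch for poking at the behavior rather than a statement of what you will get.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical malformed fragment: unclosed p and b tags, no html/head/body.
my $bad_html = '<p>first<p>second <b>bold';

my %has_body;    # records whether a body element ended up in each tree

for my $implicit (1, 0) {
    my $tree = HTML::TreeBuilder->new;
    $tree->implicit_tags($implicit);    # must be set before parsing starts
    $tree->parse($bad_html);
    $tree->eof;                         # signal end of document

    $has_body{$implicit} = $tree->look_down('_tag', 'body') ? 1 : 0;

    print "implicit_tags($implicit):\n", $tree->as_HTML(undef, '  '), "\n";
    $tree->delete;
}
```

With the option on, TreeBuilder fills in the structure it thinks is missing; with it off, the tree stays much closer to the text as written, which is what I need here.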

I have tested my ideas and confirmed that turning that flag off allows for the conditions I need, but it results in a different anomaly, which I have contacted the author of the module about.
