
HTML content editing using HTML::TreeBuilder

by phoenix007 (Sexton)
on Apr 10, 2019 at 06:37 UTC (#1232383)

phoenix007 has asked for the wisdom of the Perl Monks concerning the following question:

I want to modify HTML using HTML::TreeBuilder. I am trying to parse the plain text, identify URLs, and wrap each one as <a href="url">url</a>.

If I have the following line in my HTML:

<p>test1 <a href=""></a> test2</p>

I want to modify it to:

<p>test1 <a href=""></a> <a href=""></a> test2</p>

I am facing two problems here. First, I am not getting the proper count of child contents that I need in order to modify them: I get total_contents as 1 for the p. Second, I am not able to insert a new HTML::Element inside the text of the p.
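A likely explanation for the count of 1 is that HTML::TreeBuilder stores a run of text such as "test1 ... test2" as a single plain string in the <p>'s content list. A plain string is not an HTML::Element, so nothing can be inserted "inside" it. One way around both problems is to rebuild the child list: split each text string on a URL pattern and replace it with text pieces plus new <a> elements. The following is a minimal sketch under that assumption. The `linkify` helper, the `$URL_RE` pattern, the `%SKIP` list and the sample URL are hypothetical. `content_list`, `detach_content`, `push_content`, `look_down`, `as_HTML` and `HTML::Element->new` are existing HTML::Element/HTML::TreeBuilder methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Hypothetical, deliberately simple URL pattern: it does not trim trailing
# punctuation and will miss many real-world URL forms.
my $URL_RE = qr{(https?://[^\s<>"]+|www\.[^\s<>"]+)};

# Hypothetical list of elements whose text must not be linkified.
my %SKIP = map { $_ => 1 } qw(a script style);

# Rebuild $elem's child list, splitting each plain-text child into
# text pieces and new <a> elements. Element children are handled recursively.
sub linkify {
    my ($elem) = @_;
    my @new_content;
    for my $node ($elem->content_list) {
        if (ref $node) {                         # child is an HTML::Element
            linkify($node) unless $SKIP{ $node->tag };
            push @new_content, $node;
            next;
        }
        # Child is a plain string. With one capturing group, split returns
        # text, url, text, url, ..., so odd indices hold the URLs.
        my @parts = split $URL_RE, $node, -1;
        for my $i (0 .. $#parts) {
            if ($i % 2) {
                my $link = HTML::Element->new('a', href => $parts[$i]);
                $link->push_content($parts[$i]);
                push @new_content, $link;
            }
            elsif (length $parts[$i]) {
                push @new_content, $parts[$i];
            }
        }
    }
    # Drop the old children, then attach the rebuilt list in order.
    $elem->detach_content;
    $elem->push_content(@new_content);
    return $elem;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<p>test1 http://example.com/a test2</p>');
my ($p) = $tree->look_down(_tag => 'p');

print scalar($p->content_list), "\n";    # 1: the whole line is one text string
linkify($p);
print scalar($p->content_list), "\n";    # 3: "test1 ", <a>, " test2"
print $p->as_HTML(undef, undef, {});     # {} so the optional </p> is printed
```

Rebuilding the whole child list, rather than splicing into it in place, avoids index bookkeeping while the list is being changed.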


Re: HTML content editing using HTML::TreeBuilder
by Tux (Abbot) on Apr 10, 2019 at 07:58 UTC

    edit: now that the OP has removed the code snippet that he/she posted in the original question, my reply makes no sense anymore.

    That looks overly complicated for finding a tags whose href matches something:

        use feature 'say';
        foreach my $link ($root->look_down (_tag => "a", href => qr{some url match})) {
            say "Found a tag with href to ", $link->attr ("href"), " and text ", $link->as_text;
        }

    and move on from there.
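    The snippet assumes `$root` already holds a parsed tree. A self-contained version might look like this. The sample document and the `example\.com` pattern are made up for illustration. It relies on look_down accepting a qr// regex as an attribute value.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::TreeBuilder;

# Made-up sample document for illustration.
my $html = <<'HTML';
<p>One <a href="https://example.com/docs">docs</a> and
<a href="https://other.example.net/">other</a> link.</p>
HTML

# new_from_content parses the string and returns the root (<html>) element.
my $root = HTML::TreeBuilder->new_from_content($html);

# look_down accepts a qr// regex as an attribute value, so this returns
# every <a> element whose href matches the (made-up) pattern.
my @found = $root->look_down(_tag => 'a', href => qr{example\.com});
for my $link (@found) {
    say "Found a tag with href to ", $link->attr('href'),
        " and text ", $link->as_text;
}
```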

    Enjoy, Have FUN! H.Merijn
Re: HTML content editing using HTML::TreeBuilder
by skleblan (Sexton) on Apr 10, 2019 at 23:10 UTC

    As Tux mentioned, it would be helpful to see some sample code. I can only offer some general suggestions.

    I will say that it can be tricky when you are using HTML::TreeBuilder to search for "text". In the following example, the visual text that you see displayed is actually split up between different nodes.

    <h2><i>Real-Life</i> Police Investigations</h2>

    This might be related to why you see a different number of child contents than you expect. Another example is:

        <div id=1>
          <div id=2>
            <h2>Spring 2019 Season</h2>
            <div id=3>
              Content...

    Whenever I'm searching for text as it's displayed to the user, I definitely take advantage of the as_text() method of HTML::Element. You could check whether you are at the deepest (or shallowest) level by checking whether the as_text() result of the current node is the same as, or different from, the as_text() result of its parent (or child) node.
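    The following is a small sketch of both points, using the <h2> example above. It shows that the visible text is spread over two children. It then uses as_text() to find where a search string actually lives. `deepest_containing` is a hypothetical helper, not part of HTML::Element. It returns the deepest elements whose as_text() contains the search string, meaning elements none of whose child elements also contain it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $root = HTML::TreeBuilder->new_from_content(
    '<h2><i>Real-Life</i> Police Investigations</h2>');
my ($h2) = $root->look_down(_tag => 'h2');

# The visible text is split: one <i> element plus one text string.
my $children = $h2->content_list;   # scalar context: number of children
my $visible  = $h2->as_text;        # all descendant text, concatenated

# Hypothetical helper: recurse into child elements while their as_text()
# still contains $needle; return the deepest elements that do.
sub deepest_containing {
    my ($elem, $needle) = @_;
    return () unless index($elem->as_text, $needle) >= 0;
    my @deeper = map { deepest_containing($_, $needle) }
                 grep { ref } $elem->content_list;   # elements only, no text
    return @deeper ? @deeper : ($elem);
}

my @hits = deepest_containing($root, 'Police');
print join("\n", map { $_->tag } @hits), "\n";   # h2: 'Police' is outside <i>
```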

Re: HTML content editing using HTML::TreeBuilder
by Anonymous Monk on Apr 10, 2019 at 19:16 UTC

Node Type: perlquestion [id://1232383]
Approved by haukex
Front-paged by haukex