http://qs321.pair.com?node_id=1232383

phoenix007 has asked for the wisdom of the Perl Monks concerning the following question:

I want to modify HTML using HTML::TreeBuilder. I am trying to parse the plain text, identify URLs, and wrap each one as <a href="url">url</a>

If I have the following line in my HTML:

<p>test1 www.google.com <a href="www.google.com">www.google.com</a> test2</p>

I want to modify it to:

<p>test1 <a href="www.google.com">www.google.com</a> <a href="www.google.com">www.google.com</a> test2</p>

Here I am facing two problems. First, I am not getting the proper count of child contents so that I can modify them: I get total_contents as 1 for p. Second, I am not able to insert a new HTML::Element inside the text of p.


Replies are listed 'Best First'.
Re: HTML content editing using HTML::TreeBuilder
by Tux (Canon) on Apr 10, 2019 at 07:58 UTC

    edit: now that the OP has removed the code snippet that he/she posted in the original question, my reply makes no sense anymore.

    That looks overly complicated just to find a tags whose href matches something:

    foreach my $a ($root->look_down (_tag => "a", href => qr{some url match})) {
        say "Found a tag with href to ", $a->attr ("href"), " and text ", $a->as_text;
        }

    and move from there
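
For anyone who wants to try the look_down call above in isolation, here is a minimal self-contained sketch. It assumes the sample markup from the question; the regex and the variable names are illustrative and not taken from the OP's removed code.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup copied from the question.
my $html = '<p>test1 www.google.com <a href="www.google.com">www.google.com</a> test2</p>';

my $root = HTML::TreeBuilder->new_from_content($html);

# look_down takes attribute => value criteria; a qr// value is matched
# against the attribute as a regular expression.
# The pattern here is an assumption chosen to fit the sample only.
my @links = $root->look_down(_tag => 'a', href => qr{google\.com});

for my $link (@links) {
    printf "Found a tag with href to %s and text %s\n",
        $link->attr('href'), $link->as_text;
}
```

Note that this only finds URLs that are already wrapped in an a element. The bare www.google.com in the paragraph text is a plain string child of p, so look_down will not return it.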


    Enjoy, Have FUN! H.Merijn
Re: HTML content editing using HTML::TreeBuilder
by skleblan (Sexton) on Apr 10, 2019 at 23:10 UTC

    As Tux mentioned, it would be helpful to see some sample code. I can only offer some general suggestions.

    I will say that it can be tricky when you are using HTML::TreeBuilder to search for "text". In the following example, the visual text that you see displayed is actually split up between different nodes.

    <h2><i>Real-Life</i> Police Investigations</h2>

    This might be related to why you see different numbers of child contents. Another example is

    <div id=1> <div id=2> <h2>Spring 2019 Season</h2> <div id=3> Content...

    Whenever I'm searching for text as it is displayed to the user, I rely on the as_text() method of HTML::Element. One way to tell whether you are at the deepest or shallowest level is to compare the as_text() result of the current node with the as_text() result of its parent or child node and see whether they are the same or different.
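
To make the point about text being split across nodes concrete, here is a rough sketch of one possible way to handle the OP's two problems. It walks the children of p with content_list, leaves existing elements alone, and replaces bare URLs in the text children with new a elements. The URL pattern is a deliberately naive assumption that only fits the sample, and the variable names are hypothetical; treat this as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Sample markup copied from the question.
my $html = '<p>test1 www.google.com <a href="www.google.com">www.google.com</a> test2</p>';

my $root = HTML::TreeBuilder->new;
$root->no_space_compacting(1);    # keep whitespace in text nodes as written
$root->parse($html);
$root->eof;

my $p = $root->look_down(_tag => 'p');

# Children of an element are a mix of plain strings (text) and
# HTML::Element objects, so count and inspect them with content_list.
printf "children before: %d\n", scalar $p->content_list;

# Hypothetical, naive URL pattern: one capture group, sample data only.
my $url_re = qr{(www\.\S+)};

# Take all children out of <p>, then rebuild the list piece by piece.
my @before = $p->detach_content;
my @after;
for my $child (@before) {
    if (ref $child) {             # an existing element such as <a>: keep as is
        push @after, $child;
        next;
    }
    # With one capture group, split returns text and URLs alternately,
    # so the odd-numbered pieces are the URLs.
    my @pieces = split $url_re, $child;
    for my $i (0 .. $#pieces) {
        next if $pieces[$i] eq '';
        if ($i % 2) {
            my $link = HTML::Element->new('a', href => $pieces[$i]);
            $link->push_content($pieces[$i]);
            push @after, $link;
        }
        else {
            push @after, $pieces[$i];
        }
    }
}
$p->push_content(@after);

# Passing {} as the third argument asks as_HTML to emit every closing tag.
print $p->as_HTML('<>&', undef, {}), "\n";
```

The key design choice is to test each child with ref: strings are text and can be split, while objects are elements that should be passed through untouched, which keeps the URL inside the existing a element from being wrapped a second time.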

Re: HTML content editing using HTML::TreeBuilder
by Anonymous Monk on Apr 10, 2019 at 19:16 UTC