PerlMonks  

Re: Passing complex html-tag input over command line to HTML TreeBuilder method look_down() properly

by Anonymous Monk
on Apr 18, 2019 at 07:52 UTC ( [id://1232745]=note )


in reply to Passing complex html-tag input over command line to HTML TreeBuilder method look_down() properly

See the example htmltreexpather.pl (an XPath helper that creates XPath search strings from HTML), although you ought to simply accept CSS paths or XPaths on the command line instead. Modules that can help:

  • HTML::TreeBuilder::Select - Traverse a HTML tree using CSS selectors
  • HTML::TreeBuilder::LibXML - HTML::TreeBuilder and XPath compatible interface with libxml
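To illustrate the "accept CSS paths" suggestion, here is a minimal sketch that takes a file, a CSS selector, and an attribute name from the command line. It uses HTML::TreeBuilder::XPath and HTML::Selector::XPath (the pair used elsewhere in this thread) rather than the modules listed above; the helper name attr_values and the script name find_attr.pl are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath 'selector_to_xpath';

# Hypothetical helper (not part of any module): return the value of $attr
# for every element in $html that matches the CSS selector $css.
sub attr_values {
    my ( $html, $css, $attr ) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content( $html );

    # Translate the CSS selector to XPath, then keep only defined values.
    my @values = grep { defined }
                 map  { $_->attr( $attr ) }
                 $tree->findnodes( selector_to_xpath( $css ) );

    $tree->delete;    # free the tree explicitly
    return @values;
}

# Assumed invocation: perl find_attr.pl page.html 'div.comic' data-image
if ( @ARGV == 3 ) {
    my ( $file, $css, $attr ) = @ARGV;
    open my $fh, '<:encoding(UTF-8)', $file
        or die "Cannot open $file: $!";
    my $html = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for attr_values( $html, $css, $attr );
}
```

Single-quoting the selector in the shell keeps characters such as > and spaces from being interpreted before the script sees them, so the whole selector arrives as one argument.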

Re^2: Passing complex html-tag input over command line to HTML TreeBuilder method look_down() properly
by sadarax (Sexton) on Apr 18, 2019 at 22:30 UTC
    Thanks for the suggestion. I've looked at them, but I'm not really sure how to make them work. Using my browser's devtools, I found the specific element I want to target and chose Copy --> Copy Selector. It gave me this:
      body > div.js-amu-container-global > div.gc-page-container > div.gc-container.gc-container--fluid > div > div.layout-2col-content.content-section-padded > div.gc-container > div.comic.container.js-comic-2729847.js-item-init.js-item-share.js-comic-swipe.bg-white.border.rounded > div.comic__wrapper > div.comic__container > div > a > picture
    
    I'm really not sure how to use that. Something like this?
      use HTML::Selector::XPath;
      my $selector = HTML::Selector::XPath->new("div.comic__container");
      $selector->to_xpath;
    I want to parse that piece of HTML to find the 'data-image' attribute, which I know sits very close to that selector point.
      #!/usr/bin/perl --
      use strict;
      use warnings;
      use HTML::TreeBuilder::XPath;
      use HTML::Selector::XPath 'selector_to_xpath';

      Main( @ARGV );

      sub Main {
          my $tree = HTML::TreeBuilder::XPath->new;
          # $tree->parse_file('foo.html');
          $tree->parse_content( DemoHtml() );
          # Convert the CSS selector to XPath and visit each matching node.
          for my $node ( $tree->findnodes( selector_to_xpath( 'div.comic__container' ) ) ) {
              MeImagins( $node );
          }
      }

      # Print the address, src and alt of every <img> below $node.
      sub MeImagins {
          my( $node ) = @_;
          # './/img' is relative to $node; a bare '//img' would search the whole document.
          for my $img ( $node->findnodes('.//img') ) {
              print
                  "\n###",
                  "\n", $img->address(),
                  "\n", $img->attr( 'src' ),
                  "\n", $img->attr( 'alt' ),
                  "\n",
              ;
          }
      }

      sub DemoHtml {
          return <<'__HTML__';
      <div class="comic__container">
        <div class="comic__image js-comic-swipe-target">
          <div class="swipe-preview swipe-preview__previous js-preview-previous">
            <div class="swipe-preview__group">
              <h5 class="card-subtitle">
                <date>April 16, 2019</date>
              </h5>
              <div class="swipe-preview__ubadge">
                <div class="gc-avatar gc-avatar--creator sm"><img srcset="https://assets.gocomics.com/assets/transparent-3eb10792d1f0c7e07e7248273540f1952d9a5a2996f4b5df70ab026cd9f05517.png" data-srcset="https://avatar.amuniversal.com/feature_avatars/ubadge_images/features/cw/small_u-201701251613.png, 72w" class="lazyload" alt="9 Chickweed Lane" src="https://avatar.amuniversal.com/feature_avatars/ubadge_images/features/cw/small_u-201701251613.png"></div>
              </div>
            </div>
          </div>
          <a itemprop="image" class="js-item-comic-link" href="/9chickweedlane/2019/04/17" title="9 Chickweed Lane">
            <picture class="item-comic-image"><img class="lazyload img-fluid" srcset="https://assets.gocomics.com/assets/transparent-3eb10792d1f0c7e07e7248273540f1952d9a5a2996f4b5df70ab026cd9f05517.png" data-srcset="https://assets.amuniversal.com/93d41d70391d01379025005056a9545d 900w" sizes=" (min-width: 992px) 900px, (min-width: 768px) 600px, (min-width: 576px) 300px, 900px" alt="9 Chickweed Lane Comic Strip for April 17, 2019 " src="https://assets.amuniversal.com/93d41d70391d01379025005056a9545d" width="100%"></picture>
          </a>
          <meta itemprop="isFamilyFriendly" content="true">
          <div class="swipe-preview swipe-preview__next js-preview-next">
            <div class="swipe-preview__group">
              <h5 class="card-subtitle">
                <date>April 18, 2019</date>
              </h5>
              <div class="swipe-preview__ubadge">
                <div class="gc-avatar gc-avatar--creator sm"><img srcset="https://assets.gocomics.com/assets/transparent-3eb10792d1f0c7e07e7248273540f1952d9a5a2996f4b5df70ab026cd9f05517.png" data-srcset="https://avatar.amuniversal.com/feature_avatars/ubadge_images/features/cw/small_u-201701251613.png, 72w" class="lazyload" alt="9 Chickweed Lane" src="https://avatar.amuniversal.com/feature_avatars/ubadge_images/features/cw/small_u-201701251613.png"></div>
              </div>
            </div>
          </div>
        </div>
        <nav class="gc-calendar-nav" role="group" aria-label="Date Navigation Controls">
          <div class="gc-calendar-nav__previous">
            <a role="button" href="/9chickweedlane/1993/07/12" class="fa btn btn-outline-secondary btn-circle fa fa-backward sm " title=""></a>
            <a role="button" href="/9chickweedlane/2019/04/16" class="fa btn btn-outline-secondary btn-circle fa-caret-left sm js-previous-comic" title=""></a>
          </div>
          <div class="gc-calendar-nav__select">
            <div class="btn btn-outline-secondary gc-calendar-nav__datepicker js-calendar-wrapper" data-date="2019/04/17" data-name="/9chickweedlane/" data-year="2019" data-month="04" data-day="17" data-feature="9chickweedlane" data-ct="" data-start="1993/07/12" data-end="2019/04/19" data-open="2019-04-17">
              <i class="fa fa-calendar xs"></i>
              <input name="startDate" placeholder="April 17, 2019" readonly="readonly" class="cal off calendar-input date js-calendar-input datepicker js-calendar-input-link" type="text">
            </div>
            <a class="btn btn-outline-secondary" alt="Click to View a Random 9 Chickweed Lane Comic Strip!" href="/random/9chickweedlane">Random</a>
          </div>
          <div class="gc-calendar-nav__next">
            <a role="button" href="/9chickweedlane/2019/04/18" class="fa btn btn-outline-secondary btn-circle fa-caret-right sm " title=""></a>
            <a role="button" href="/9chickweedlane/2019/04/19" class="fa btn btn-outline-secondary btn-circle fa-forward sm " title=""></a>
          </div>
        </nav>
      </div>
      __HTML__
      }
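The demo above prints src and alt, and its sample HTML does not appear to contain a 'data-image' attribute at all. If the goal is that attribute, one option is to skip the class chain and select by attribute presence with a plain XPath. The following is only a sketch: the helper name elements_with_attr, the sample fragment, and the placement of data-image on the outer div are all assumptions, not taken from the real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return [ tag, value ] pairs for every element in
# $html that carries the attribute $attr, wherever it sits in the tree.
sub elements_with_attr {
    my ( $html, $attr ) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content( $html );

    # //*[@name] matches any element that has the named attribute.
    my @found = map { [ $_->tag, $_->attr( $attr ) ] }
                $tree->findnodes( "//*[\@$attr]" );

    $tree->delete;
    return @found;
}

# Invented fragment; where data-image really lives is an assumption.
my $sample = <<'__HTML__';
<div class="comic" data-image="https://example.invalid/strip.png">
  <div class="comic__container"><img src="placeholder.png" alt="demo"></div>
</div>
__HTML__

printf "%s => %s\n", @$_ for elements_with_attr( $sample, 'data-image' );
```

Selecting on the attribute itself avoids depending on per-comic class names such as js-comic-2729847 from the copied selector, which presumably change from page to page.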
