PerlMonks  

Mojo::DOM help

by Anonymous Monk
on May 20, 2020 at 04:44 UTC ([id://11116959])

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise monks, I'm trying to parse several HTML pages, but their formats differ: the info I want comes after one of three different class tags. I can find one tag and get the data I want just fine; the problem is with the multiple tags. I can return values related to any one of them, but not all three, which is what I want. I tried using "each" and "match" in Mojo with no luck. I also tried checking for a null string value so I wouldn't overwrite the mapping if I had already found what I needed after matching one tag and extracting the values. Anyway, here's the code that I thought could work:
    $r2 = $dom2->find( '[class="LC20lb DKV0Md"]' or '[class="BNeawe vvjwJb AP7Wnd"]' or '[class="CVA68e qXLe6d"]' )->map( sub { $_->text } );
Any ideas on how to pull out the data I need from behind all 3 tags and map it so I can write it out would be greatly appreciated. Thanks, Newbmyer
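To see what the or expression in the line above actually hands to find, here is a minimal standalone sketch (plain Perl, no Mojo needed; the names @selectors and $passed are illustrative only). Perl evaluates the or chain before the method call and yields the first true operand:

```perl
use strict;
use warnings;

# The three selector strings from the question.
my @selectors = (
    '[class="LC20lb DKV0Md"]',
    '[class="BNeawe vvjwJb AP7Wnd"]',
    '[class="CVA68e qXLe6d"]',
);

# Same shape as the argument in the question: the `or` chain is evaluated
# first and returns the first true operand, so only one string would ever
# reach find(). A non-empty string is always true, so it is the first one.
my $passed = ( $selectors[0] or $selectors[1] or $selectors[2] );

print "$passed\n";    # prints [class="LC20lb DKV0Md"]
```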

Re: Mojo::DOM help
by haukex (Archbishop) on May 20, 2020 at 07:01 UTC

    You're using Perl's or, which takes the logical combination of those three strings before the find method is ever called; since the first string is a true value, the only thing you're passing to find is the first string. See Mojo::DOM::CSS's selectors: "or" is a comma, and "and" for attributes is [...][...]. But note that class gets special treatment: you can simply use the .class selector to match classes, stringing them together (.a.b) for an "and". Also note that the order of classes in the class attribute can vary; .class selectors match regardless of that order, as the following shows, but if you really want exact string matches you can use [class="value"] selectors.

    use warnings;
    use 5.012;
    use Mojo::DOM;

    my $dom = Mojo::DOM->new(<<'HTML');
    <div>
      <div class="LC20lb DKV0Md"> matches </div>
      <div class="DKV0Md"> no match </div>
      <div class="DKV0Md LC20lb"> matches </div>
      <div class="BNeawe vvjwJb AP7Wnd"> matches </div>
      <div class="BNeawe AP7Wnd vvjwJb"> matches </div>
      <div class="AP7Wnd vvjwJb BNeawe"> matches </div>
      <div class="BNeawe AP7Wnd"> no match </div>
      <div class="CVA68e qXLe6d"> matches </div>
      <div class="qXLe6d CVA68e"> matches </div>
      <div class="qXLe6d"> no match </div>
    </div>
    HTML

    $dom->find('.LC20lb.DKV0Md, .BNeawe.vvjwJb.AP7Wnd, .CVA68e.qXLe6d')
        ->each(sub { say });
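Applied back to the question's own line, a minimal sketch might look like the following. It assumes Mojolicious is installed; the sample HTML, the text values, and the @texts variable are made up for illustration. It keeps the asker's exact-match [class="..."] selectors, joins them with commas into one selector group, and maps each match to its text:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Illustrative HTML standing in for one of the scraped pages (hypothetical).
my $dom2 = Mojo::DOM->new(<<'HTML');
<div>
  <h3 class="LC20lb DKV0Md">First title</h3>
  <div class="BNeawe vvjwJb AP7Wnd">Second title</div>
  <div class="DKV0Md LC20lb">Reordered classes</div>
  <span class="CVA68e qXLe6d">Third title</span>
</div>
HTML

# A comma in a CSS selector means "or": an element whose class attribute
# equals any of the three strings exactly is returned, in document order.
my $r2 = $dom2->find(
    '[class="LC20lb DKV0Md"], [class="BNeawe vvjwJb AP7Wnd"], [class="CVA68e qXLe6d"]'
)->map(sub { $_->text });

# $r2 is a Mojo::Collection; to_array returns a plain array reference.
my @texts = @{ $r2->to_array };
print "$_\n" for @texts;
```

Because [class="..."] compares the whole attribute string, the "Reordered classes" element is skipped here; swapping in the .class selectors from the example above would match it too.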
Re: Mojo::DOM help
by marto (Cardinal) on May 20, 2020 at 07:05 UTC

    N.B. Google change the SERP markup every once in a while, and blacklist the IPs of systems they believe are scraping them.

Node Type: perlquestion [id://11116959]
Approved by kcott
Front-paged by Corion