PerlMonks  

Re: Mojo::DOM parsing question

by marto (Cardinal)
on Jul 17, 2020 at 07:49 UTC (#11119445)


in reply to Mojo::DOM parsing question

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $html = '<div class="abstract-content selected" id="enc-abstract"> <p>Secondary bile acids (BAs) and short chain fatty acids (SCFAs), two major types of bacterial metabolites in the colon... </p>';

my $dom = Mojo::DOM->new( $html );

# Find every <p> that is a direct child of a div with class "abstract-content"
# and print its text
foreach my $abstract ( $dom->find('div.abstract-content > p')->each ){
    say $abstract->text;
}

Prints:

Secondary bile acids (BAs) and short chain fatty acids (SCFAs), two major types of bacterial metabolites in the colon...

If this isn't what you want or expect, please post (or link to) the HTML file you are working with and clarify your requirements.

Re^2: Mojo::DOM parsing question
by Anonymous Monk on Jul 17, 2020 at 07:56 UTC
    Thanks very much, Marto! Just so I understand: does the > p select the text between <p> and </p> that comes after 'div.abstract-content'? Thanks!!

    2020-07-21 Athanasius added code tags.

      This is just a CSS selector:

      $dom->find('div.abstract-content > p')->each

      Here we search for every <p> tag that is a direct child of a div with class abstract-content. find returns those <p> elements (not the divs themselves), and each hands them back as a list.

      MDN has a nice section on CSS selectors, and plenty of other resources too. To quote its page on the child combinator:

      "The child combinator (>) is placed between two CSS selectors. It matches only those elements matched by the second selector that are the direct children of elements matched by the first."
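A small sketch of the difference the quote describes, comparing the child combinator (>) with the plain descendant combinator (a space). The HTML snippet is made up for illustration and is not from the original question; it assumes a reasonably recent Mojolicious where Mojo::Collection's map accepts a method name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical markup: one <p> is a direct child of the div,
# the other is nested one level deeper inside a <section>.
my $html = <<'HTML';
<div class="abstract-content">
  <p>direct child</p>
  <section><p>grandchild</p></section>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Child combinator: only <p> elements whose parent is the div
my @child = $dom->find('div.abstract-content > p')->map('text')->each;

# Descendant combinator (a space): <p> elements at any depth inside the div
my @descendant = $dom->find('div.abstract-content p')->map('text')->each;

say 'child:      ', join ', ', @child;
say 'descendant: ', join ', ', @descendant;
```

With > only "direct child" is matched; with the space, "grandchild" is matched as well.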

      MDN is linked from the basics section of the overall Mojo documentation, https://mojolicious.org/perldoc. If you have any other questions, let me know.

Re^2: Mojo::DOM parsing question
by Anonymous Monk on Jul 17, 2020 at 08:17 UTC
    I seem to be getting stuck trying to define my variable to print based on your help. I'm using:
    my $abstr = $dom1->find('div.abstract-content > p')->each;
    but getting an error.

      Called without a callback, each returns a list, but you are calling it in scalar context. Use list context as marto did and you will be fine. See also: Context tutorial.
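To make the context issue concrete, here is a minimal sketch with a made-up two-paragraph HTML string (not the poster's real page). Assigning the result of each to a scalar gives the number of matches; assigning it to an array, or looping over it with foreach, gives the elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new(
    '<div class="abstract-content"><p>first</p><p>second</p></div>'
);

# Scalar context: each returns the list of elements, which in scalar
# context evaluates to the number of matches, not the elements themselves
my $count = $dom->find('div.abstract-content > p')->each;

# List context: assign to an array (or loop with foreach) to get the elements
my @paragraphs = $dom->find('div.abstract-content > p')->each;

say "matches: $count";
say $_->text for @paragraphs;
```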

      but getting an error.

      Don't keep it a secret. Always provide the full text of the error message.

      Is there more than one abstract per HTML page? My example uses foreach to print the text of each match found by the selector that matches your requirement. See find. If you know for sure that each page has exactly one abstract, you could instead do something like my $abstract = $dom->at('div.abstract-content > p')->text; In your example, $abstr will contain the number of matches. If you post the error you get, maybe I can provide more help; see also How do I post a question effectively?
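One caveat with the at approach: at returns undef when the selector matches nothing, and calling ->text on undef dies with a "Can't call method "text" on an undefined value" error. A minimal sketch that guards against that, with a made-up HTML string standing in for a real page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new(
    '<div class="abstract-content"><p>Only abstract here.</p></div>'
);

# at() returns the first matching element, or undef when nothing matches,
# so check the result before calling ->text on it
my $abstract = '';
if ( my $p = $dom->at('div.abstract-content > p') ) {
    $abstract = $p->text;
}
else {
    warn "No abstract found\n";
}

say $abstract;
```

Whether a missing abstract should warn, die, or be silently skipped depends on your data; the warn here is just one choice.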
