Re: Mojo::DOM parsing question

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

use Mojo::DOM;

my $html = '<div class="abstract-content selected" id="enc-abstract">
<p>Secondary bile acids (BAs) and short chain fatty acids (SCFAs), two
major types of bacterial metabolites in the colon...
</p>
</div>';

my $dom = Mojo::DOM->new( $html );
foreach my $abstract ( $dom->find('div.abstract-content > p')->each ){
  say $abstract->text;
}

Prints:

Secondary bile acids (BAs) and short chain fatty acids (SCFAs), two
major types of bacterial metabolites in the colon...

If this isn't what you want/expect, please post (or link to) the HTML file you are working with, and clarify your requirements.


Re^2: Mojo::DOM parsing question by Anonymous Monk on Jul 17, 2020 at 07:56 UTC
Thanks very much, marto! Just so I understand: does the `> p` select the text between the `<p>` and `</p>` tags that come after the `div.abstract-content`? Thanks!! (2020-07-21 Athanasius added code tags.)
Re^3: Mojo::DOM parsing question by marto (Cardinal) on Jul 17, 2020 at 08:02 UTC
This is just a CSS selector: `$dom->find('div.abstract-content > p')->each`. Here we search for each `<p>` tag that is a direct child of a `div` with class `abstract-content`. MDN has a nice section on CSS selectors, and there are plenty of other resources too. To quote its page on the child combinator: "The child combinator (>) is placed between two CSS selectors. It matches only those elements matched by the second selector that are the direct children of elements matched by the first." MDN is linked from the basics section of the overall Mojolicious documentation, https://mojolicious.org/perldoc. If you have any other questions let me know.
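To make the difference between the child combinator and a plain descendant selector concrete, here is a minimal sketch. The HTML fragment (including the nested `<section>`) is made up for illustration and is not from the original poster's page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

use Mojo::DOM;

# Hypothetical sample: one <p> directly under the div, one nested deeper.
my $dom = Mojo::DOM->new(<<'HTML');
<div class="abstract-content">
  <p>direct child</p>
  <section><p>nested deeper</p></section>
</div>
HTML

# Child combinator (>): only <p> elements that are direct children of the div.
my @children = $dom->find('div.abstract-content > p')->map('text')->each;

# Descendant combinator (a space): <p> elements at any depth inside the div.
my @descendants = $dom->find('div.abstract-content p')->map('text')->each;

say scalar @children;       # 1
say scalar @descendants;    # 2
```

With `>` only the first paragraph matches; with a space both do.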
Re^2: Mojo::DOM parsing question by Anonymous Monk on Jul 17, 2020 at 08:17 UTC
I seem to be getting stuck trying to define my variable to print based on your help. I'm using: `my $abstr = $dom1->find('div.abstract-content > p')->each;` but getting an error.
Re^3: Mojo::DOM parsing question by hippo (Bishop) on Jul 17, 2020 at 08:26 UTC
`each` returns a list but you are calling it in scalar context. Use list context as marto did and you will be fine. See also: Context tutorial. You wrote "but getting an error." Don't keep it a secret. Always provide the full text of the error message.
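A small sketch of the context difference, assuming a made-up two-paragraph fragment. It relies on `each` (called without a callback) returning the collection's elements as a plain Perl list, which is why scalar assignment yields a count rather than an element.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

use Mojo::DOM;

# Hypothetical sample HTML with two matching paragraphs.
my $dom = Mojo::DOM->new(
  '<div class="abstract-content"><p>one</p><p>two</p></div>'
);
my $matches = $dom->find('div.abstract-content > p');

my $count   = $matches->each;    # scalar context: the number of matches
my @paras   = $matches->each;    # list context: the matched elements
my ($first) = $matches->each;    # list assignment: the first element only

say $count;                      # 2
say $_->text for @paras;         # one, then two
say $first->text;                # one
```

The parentheses around `$first` are what turn the assignment into list context.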
Re^3: Mojo::DOM parsing question by marto (Cardinal) on Jul 17, 2020 at 08:29 UTC
Is there more than one abstract per HTML page? My example uses `foreach` to print the text of each match found by the selector that meets your requirement. See find. If you know for sure that each page has exactly one abstract, you could do something like `my $abstract = $dom->at('div.abstract-content > p')->text;`. If you post the error you get, maybe I can provide more help; see also How do I post a question effectively?. In your example `$abstr` will contain the number of matches.
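One caveat with the `at` approach: `at` returns `undef` when nothing matches, so chaining `->text` straight onto it dies on a page without an abstract. A hedged sketch with a guard, using made-up HTML and a deliberately non-existent class (`no-such-class`) to show the miss:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

use Mojo::DOM;

# Hypothetical page containing a single abstract paragraph.
my $dom = Mojo::DOM->new(
  '<div class="abstract-content"><p>Only abstract</p></div>'
);

# at() returns the first matching element, or undef if there is none.
my $p        = $dom->at('div.abstract-content > p');
my $abstract = defined $p ? $p->text : '';

# A selector that matches nothing, to show the undef case.
my $missing = $dom->at('div.no-such-class > p');

say $abstract;
say defined $missing ? 'found' : 'no match';
```

Falling back to an empty string is just one choice; depending on your script you might prefer to skip the page or warn instead.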

