dfaure has asked for the wisdom of the Perl Monks concerning the following question:
Hi,
I'm having an annoying issue with the xpath() method of WWW::Mechanize::Firefox. The following code...
#!perl -w
use strict;
use WWW::Mechanize::Firefox;
use Data::Dumper;
my $mech = WWW::Mechanize::Firefox->new(activate => 1);
$mech->autoclose_tab(0);
$mech->update_html(<<'HTML');
<html>
<head>
<title>Hello Firefox!</title>
</head>
<body>
<h1>Hello <b>World</b>!</h1>
<p id='paragraph'>Hello <b>WWW::Mechanize::Firefox</b> Goob bye</p>
</body>
</html>
HTML
test_xpath($mech, '//p', all => 1);
test_xpath($mech, '//p/text()', all => 1);
test_xpath($mech, 'substring(//p,1,4)', all => 1); # expected String: Hell
test_xpath($mech, 'string-length(//p)', all => 1); # expected Number: 38
sub test_xpath {
    my ($mech, $xpq, %opts) = @_;
    my @xpr;
    eval { @xpr = $mech->xpath($xpq, %opts); };
    my %results = (
        query       => $xpq,
        exception   => $@,
        innerHTML   => scalar(@xpr) ? [ map { $_->{innerHTML} } @xpr ]   : undef,
        textContent => scalar(@xpr) ? [ map { $_->{textContent} } @xpr ] : undef,
        nodeValue   => scalar(@xpr) ? [ map { $_->{nodeValue} } @xpr ]   : undef
    );
    print Data::Dumper->Dump([\%results], ['results']);
}
...shows that, sadly, as of v0.71 this otherwise amazing module does not handle every kind of XPath result: node queries work, but expressions that evaluate to a string or a number raise an exception.
$results = {
'nodeValue' => [
undef
],
'exception' => '',
'query' => '//p',
'textContent' => [
'Hello WWW::Mechanize::Firefox Goob bye'
],
'innerHTML' => [
'Hello <b>WWW::Mechanize::Firefox</b> Goob bye'
]
};
$results = {
'nodeValue' => [
'Hello ',
' Goob bye'
],
'exception' => '',
'query' => '//p/text()',
'textContent' => [
'Hello ',
' Goob bye'
],
'innerHTML' => [
undef,
undef
]
};
$results = {
'nodeValue' => undef,
'exception' => 'MozRepl::RemoteObject: TypeError: The expression cannot be converted to return the specified type. at mech.pl line 28.
',
'query' => 'substring(//p,1,4)',
'textContent' => undef,
'innerHTML' => undef
};
$results = {
'nodeValue' => undef,
'exception' => 'MozRepl::RemoteObject: TypeError: The expression cannot be converted to return the specified type. at mech.pl line 28.
',
'query' => 'string-length(//p)',
'textContent' => undef,
'innerHTML' => undef
};
I would be very interested in any workaround.
____
HTH, Dominique
My two favorites:
If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
Bien faire, et le faire savoir...
Re: Some issues with WWW::Mechanize::Firefox->xpath() method
by Corion (Patriarch) on Apr 02, 2013 at 10:57 UTC
I would assume that the substring() function wants a string and not a node, and thus one would need to use //p/text() to get at the node text if that is what's wanted.
The code of WWW::Mechanize::Firefox basically passes XPath queries straight through to Firefox, so if the XPath method raises a JavaScript error, that error most likely comes directly from Firefox itself.
print for $dom->findnodes(q{ substring( //p/text() , 0, 4 ) });
print for $dom->findnodes(q{ substring(string(//title),1,4) });
print for $dom->findnodes(q{ substring(string(//title/text()),1,4) });
function(doc,q,ref,cont) {
var xres = doc.evaluate(q,ref,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null );
var map;
if( cont ) {
map = cont;
} else {
// Default is identity
map = function(e){ return e };
};
var res = [];
for ( var i=0 ; i < xres.snapshotLength; i++ )
{
res.push( map(xres.snapshotItem(i)));
};
return res
}
I'm no expert on XPath and its semantics, but if somebody submits a bug report, preferably with a self-contained example, I can investigate things more closely.
Re: Some issues with WWW::Mechanize::Firefox->xpath() method (ctrl+shift+k, stringValue)
by Anonymous Monk on Apr 02, 2013 at 11:51 UTC
FWIW, if you fire up the Firefox web console with Ctrl+Shift+K and execute
document.evaluate( 'substring(//title,1,9)', document , null , XPathResult.ANY_TYPE , null )
you will get a [object XPathResult], and if you click on it, there will be a stringValue: "Comment o"
So you need to Dumper the result earlier and pick the right key/attribute.
Found by grepping for xpath in mechanize-firefox/mozrepl, then looking up https://developer.mozilla.org/en-US/docs/DOM/document.evaluate and https://developer.mozilla.org/en-US/docs/Introduction_to_using_XPath_in_JavaScript
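For illustration, here is a minimal sketch of what "pick the right key/attribute" could look like on the JavaScript side. The helper name unwrapXPathResult is hypothetical (it is not part of MozRepl or WWW::Mechanize::Firefox); the numeric constants and property names are those of the standard XPathResult interface. It takes an already-evaluated result and reads whichever property matches its resultType:

```javascript
// Hypothetical helper, not part of any module discussed in this thread.
// The numbers are the standard XPathResult type constants, spelled out so
// the function does not depend on a global XPathResult object.
var NUMBER_TYPE = 1, STRING_TYPE = 2, BOOLEAN_TYPE = 3;
var UNORDERED_NODE_ITERATOR_TYPE = 4, ORDERED_NODE_ITERATOR_TYPE = 5;
var UNORDERED_NODE_SNAPSHOT_TYPE = 6, ORDERED_NODE_SNAPSHOT_TYPE = 7;
var ANY_UNORDERED_NODE_TYPE = 8, FIRST_ORDERED_NODE_TYPE = 9;

function unwrapXPathResult(xres) {
    var nodes = [], node, i;
    switch (xres.resultType) {
    case NUMBER_TYPE:
        return xres.numberValue;
    case STRING_TYPE:
        return xres.stringValue;
    case BOOLEAN_TYPE:
        return xres.booleanValue;
    case UNORDERED_NODE_ITERATOR_TYPE:
    case ORDERED_NODE_ITERATOR_TYPE:
        // Iterators are consumed node by node until null is returned.
        while ((node = xres.iterateNext())) {
            nodes.push(node);
        }
        return nodes;
    case UNORDERED_NODE_SNAPSHOT_TYPE:
    case ORDERED_NODE_SNAPSHOT_TYPE:
        // Snapshots are indexed, like in the existing MozRepl code.
        for (i = 0; i < xres.snapshotLength; i++) {
            nodes.push(xres.snapshotItem(i));
        }
        return nodes;
    case ANY_UNORDERED_NODE_TYPE:
    case FIRST_ORDERED_NODE_TYPE:
        return xres.singleNodeValue;
    default:
        throw new Error("Unknown XPathResult type: " + xres.resultType);
    }
}
```

In the web console this could be tried as unwrapXPathResult(document.evaluate('substring(//title,1,9)', document, null, XPathResult.ANY_TYPE, null)). Note that, per the DOM Level 3 XPath note, a node-set expression evaluated with ANY_TYPE comes back as UNORDERED_NODE_ITERATOR_TYPE, so the specification itself does not promise document order in that case.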
Ah hah!
The difference in behaviour is caused by WWW::Mechanize::Firefox / MozRepl::RemoteObject using XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, while your code uses XPathResult.ANY_TYPE. I'm not certain whether ANY_TYPE guarantees an ordered snapshot, which I consider important: I'd like the nodes to appear in "document order" in the result, and I'd like them to remain unchanged from the time the snapshot was taken, because there is transfer latency between Firefox and Perl. The documentation talks about nodes, so it seems that there is no way to get an ordered snapshot with strings...
I don't see an easy way to automatically determine the "natural" result type of an expression, so in the medium term, MozRepl::RemoteObject::Methods::xpath needs to also take the result type as an (optional) parameter. Then, the Firefox ->xpath API can be extended to allow specifying the kind of result.
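A rough sketch of what such an optional result-type parameter might look like, modelled on the snapshot function quoted earlier in this thread. The name xpathWithType, the extra "type" argument and the { type, values } return shape are assumptions for illustration, not the actual MozRepl::RemoteObject code. The default stays at the ordered snapshot, so existing callers would keep document order; scalar results are wrapped in a one-element array so the caller always gets a list:

```javascript
// Sketch only: xpathWithType and its "type" argument are hypothetical.
function xpathWithType(doc, q, ref, cont, type) {
    // Standard XPathResult constants, spelled out so the sketch does not
    // depend on a global XPathResult object.
    var NUMBER_TYPE = 1, STRING_TYPE = 2, BOOLEAN_TYPE = 3;
    var ORDERED_NODE_SNAPSHOT_TYPE = 7;

    // Keep the current behaviour unless the caller asks for something else.
    var wanted = (type === undefined || type === null)
        ? ORDERED_NODE_SNAPSHOT_TYPE
        : type;
    var xres = doc.evaluate(q, ref, null, wanted, null);

    // Default mapping is identity, as in the original function.
    var map = cont || function (e) { return e; };

    var res = [];
    switch (xres.resultType) {
    case NUMBER_TYPE:
        res.push(map(xres.numberValue));
        break;
    case STRING_TYPE:
        res.push(map(xres.stringValue));
        break;
    case BOOLEAN_TYPE:
        res.push(map(xres.booleanValue));
        break;
    case ORDERED_NODE_SNAPSHOT_TYPE:
        for (var i = 0; i < xres.snapshotLength; i++) {
            res.push(map(xres.snapshotItem(i)));
        }
        break;
    default:
        throw new Error("Result type " + xres.resultType +
                        " is not handled by this sketch");
    }
    // Hand the result type back too, so the caller need not re-derive it.
    return { type: xres.resultType, values: res };
}
```

Under these assumptions a caller would pass XPathResult.STRING_TYPE for substring(//p,1,4) and XPathResult.NUMBER_TYPE for string-length(//p), and omit the argument for plain node queries.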
The documentation talks about nodes, so it seems that there is no way to get an ordered snapshot with strings...
In case it helps: in earlier C++ code that talked directly to the XPCOM layer, we found that with ANY_TYPE:
- The resulting elements were always returned (as expected) in document order (i.e. a depth-first tree walk).
- Even though it is not known in advance at query time, the exact result type is determined by the elements of the query expression (btw, it would be nice to have it returned, to avoid recomputing it by analysing the query).
____
HTH, Dominique
Re: Some issues with WWW::Mechanize::Firefox->xpath() method
by Loops (Curate) on Apr 02, 2013 at 12:10 UTC
@ret = $mech->xpath('substring("hello",1,4)', { single => 1 } )
Returns an empty array. It may be that you simply have to return the full text node to Perl and then perform the substring in Perl code.
Re: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0)
by Anonymous Monk on Apr 02, 2013 at 10:11 UTC
test_xpath($mech, 'substring(//p,1,4)', all => 1); # expected String: Hell
That is not valid XPath syntax. What documentation are you reading?
$ xsh
$scratch/> insert element p into /scratch
$scratch/> insert text 'Hello world' into /scratch/p
$scratch/> echo substring(//p,1,4)
Hell
Works for me (using XML::XSH2):
And you're sure that's not an xsh2 function/feature?
Produces no output for me
#!/usr/bin/perl --
use strict; use warnings; use XML::LibXML;
my $html = <<'__HTML__';
<html>
<head>
<title>Hello Firefox!</title>
</head>
<body>
<h1>Hello <b>World</b>!</h1>
<p id='paragraph'>Hello <b>WWW::Mechanize::Firefox</b> Goob bye</p>
</body>
</html>
__HTML__
my $dom = XML::LibXML->new(
qw/ recover 2 /
)->load_html(
#~ string => \$html, ## BUG! scalar... not like load_xml
string => $html,
);
local $\ = $/;
print for $dom->findnodes(q{ substring(//p,1,4) }); ## nada
print for $dom->findnodes(q{ //p }); ## paragraph