Some issues with WWW::Mechanize::Firefox->xpath() method

dfaure has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm having an annoying issue with the WWW::Mechanize::Firefox xpath() method. The following code...

#!perl -w
use strict;
use WWW::Mechanize::Firefox;
use Data::Dumper;

my $mech = WWW::Mechanize::Firefox->new(activate => 1);
$mech->autoclose_tab(0);
$mech->update_html(<<'HTML');
<html>
<head>
<title>Hello Firefox!</title>
</head>
<body>
<h1>Hello <b>World</b>!</h1>
<p id='paragraph'>Hello <b>WWW::Mechanize::Firefox</b> Goob bye</p>
</body>
</html>
HTML

test_xpath($mech, '//p', all => 1);
test_xpath($mech, '//p/text()', all => 1);
test_xpath($mech, 'substring(//p,1,4)', all => 1); # expected String: Hell
test_xpath($mech, 'string-length(//p)', all => 1); # expected Number: 38

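# Run one XPath query and dump the interesting fields of every returned
# item, plus any exception raised by the call.
sub test_xpath {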
  my ($mech, $xpq, %opts) = @_;
  my @xpr;
  eval { @xpr = $mech->xpath($xpq, %opts); };
  my %results = (
    query       => $xpq,
    exception   => $@,
    innerHTML   => scalar(@xpr) ? [ map { $_->{innerHTML}   } @xpr ] : undef,
    textContent => scalar(@xpr) ? [ map { $_->{textContent} } @xpr ] : undef,
    nodeValue   => scalar(@xpr) ? [ map { $_->{nodeValue}   } @xpr ] : undef
  );
  print Data::Dumper->Dump([\%results], ['results']);
}

...shows that (sadly, as of v0.71) not all XPath result types are handled by this otherwise amazing module: node-set queries work, but the string- and number-valued expressions die with a TypeError.

$results = {
             'nodeValue' => [
                              undef
                            ],
             'exception' => '',
             'query' => '//p',
             'textContent' => [
                                'Hello WWW::Mechanize::Firefox Goob bye'
                              ],
             'innerHTML' => [
                              'Hello <b>WWW::Mechanize::Firefox</b> Goob bye'
                            ]
           };
$results = {
             'nodeValue' => [
                              'Hello ',
                              ' Goob bye'
                            ],
             'exception' => '',
             'query' => '//p/text()',
             'textContent' => [
                                'Hello ',
                                ' Goob bye'
                              ],
             'innerHTML' => [
                              undef,
                              undef
                            ]
           };
$results = {
             'nodeValue' => undef,
             'exception' => 'MozRepl::RemoteObject: TypeError: The expression cannot be converted to return the specified type. at mech.pl line 28.
',
             'query' => 'substring(//p,1,4)',
             'textContent' => undef,
             'innerHTML' => undef
           };
$results = {
             'nodeValue' => undef,
             'exception' => 'MozRepl::RemoteObject: TypeError: The expression cannot be converted to return the specified type. at mech.pl line 28.
',
             'query' => 'string-length(//p)',
             'textContent' => undef,
             'innerHTML' => undef
           };

I would be very interested in any workaround.

____
HTH, Dominique
My two favorites:
If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
Bien faire, et le faire savoir... (Do it well, and let it be known...)

Re: Some issues with WWW::Mechanize::Firefox->xpath() method by Corion (Patriarch) on Apr 02, 2013 at 10:57 UTC
I would assume that the substring() function wants a string and not a node, and thus one would need to use `//p/text()` to get at the node text if that is what's wanted. The code of WWW::Mechanize::Firefox basically passes XPath queries straight through to Firefox, so if there is a Javascript error raised by the XPath method, that error most likely comes directly from Firefox itself.
Re^2: Some issues with WWW::Mechanize::Firefox->xpath() method by Anonymous Monk on Apr 02, 2013 at 11:12 UTC
FWIW, I tried with XML::LibXML but I can't get any results with these queries either:
print for $dom->findnodes(q{ substring( //p/text() , 0, 4 ) });
print for $dom->findnodes(q{ substring(string(//title),1,4) });
print for $dom->findnodes(q{ substring(string(//title/text()),1,4) });
Re^2: Some issues with WWW::Mechanize::Firefox->xpath() method by dfaure (Chaplain) on Apr 02, 2013 at 12:23 UTC
"The code of WWW::Mechanize::Firefox basically passes XPath queries straight through to Firefox, so if there is a Javascript error raised by the XPath method, that error most likely comes directly from Firefox itself."
Couldn't the issue come from the JS glue between Perl and Firefox being unable to deal with XPathResult types other than nodes?
Re^3: Some issues with WWW::Mechanize::Firefox->xpath() method by Corion (Patriarch) on Apr 02, 2013 at 12:30 UTC
The JS glue just passes along whatever `document.evaluate` returns, after converting that array-like XPathResult snapshot into a plain array:
function(doc,q,ref,cont) {
    var xres = doc.evaluate(q,ref,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null );
    var map;
    if( cont ) {
        map = cont;
    } else {
        // Default is identity
        map = function(e){ return e };
    };
    var res = [];
    for ( var i=0 ; i < xres.snapshotLength; i++ ) {
        res.push( map(xres.snapshotItem(i)));
    };
    return res
}
I'm no expert on XPath and its semantics, but if somebody submits a bug report, preferably with a self-contained example, I can investigate more closely.
Re^4: Some issues with WWW::Mechanize::Firefox->xpath() method by dfaure (Chaplain) on Apr 02, 2013 at 13:10 UTC
Re^3: Some issues with WWW::Mechanize::Firefox->xpath() method (check your code) by Anonymous Monk on Apr 02, 2013 at 12:42 UTC
I think it's your code: instead of dumping %results, dump @xpr and look for stringValue.
Re: Some issues with WWW::Mechanize::Firefox->xpath() method (ctrl+shift+k, stringValue) by Anonymous Monk on Apr 02, 2013 at 11:51 UTC
FWIW, if you fire up the Firefox Web Console with Ctrl+Shift+K and execute `document.evaluate( 'substring(//title,1,9)', document , null , XPathResult.ANY_TYPE , null )` you will get an `[object XPathResult]`, and if you click on it, there will be a stringValue: "Comment o". So you need to call Dumper earlier, on the raw result, and pick the right key/attribute. Found by grepping for xpath in mechanize-firefox/mozrepl and looking up https://developer.mozilla.org/en-US/docs/DOM/document.evaluate and https://developer.mozilla.org/en-US/docs/Introduction_to_using_XPath_in_JavaScript
Re^2: Some issues with WWW::Mechanize::Firefox->xpath() method (ctrl+shift+k, stringValue) by Corion (Patriarch) on Apr 02, 2013 at 12:39 UTC
Ah hah! The difference in behaviour is caused by WWW::Mechanize::Firefox / MozRepl::RemoteObject using `XPathResult.ORDERED_NODE_SNAPSHOT_TYPE`, while your code uses `XPathResult.ANY_TYPE`. I'm not certain whether `ANY_TYPE` guarantees an ordered snapshot, which I consider important: I'd like the nodes to appear in "document order" in the result, and I'd like them to remain unchanged from the time the snapshot was taken, because there is transfer latency between Firefox and Perl. The documentation talks about nodes, so it seems that there is no way to get an ordered snapshot with strings... I don't see an easy way to automatically determine the "natural" result type of an expression, so in the medium term, MozRepl::RemoteObject::Methods::xpath needs to also take the result type as an (optional) parameter. Then, the WWW::Mechanize::Firefox `->xpath` API can be extended to allow specifying the kind of result.
Re^3: Some issues with WWW::Mechanize::Firefox->xpath() method (ctrl+shift+k, stringValue) by dfaure (Chaplain) on Apr 02, 2013 at 13:57 UTC
"The documentation talks about nodes, so it seems that there is no way to get an ordered snapshot with strings..."
In case it helps: with C++ code we previously ran directly against the XPCOM layer, we found the following when using ANY_TYPE. The resulting elements were always returned (as expected) in document order (i.e. a depth-first tree walk). Even though it is not predictable at query time, the exact result type is determined by the elements of the query expression (by the way, it would be nice to have it returned, to avoid recomputing it by analysing the query).
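On the wish to have the result type returned: as far as I know, the DOM already does this. When `ANY_TYPE` is requested, the XPathResult's `resultType` property reports the type actually produced (NUMBER_TYPE, STRING_TYPE, BOOLEAN_TYPE, or UNORDERED_NODE_ITERATOR_TYPE for node-sets, which is why `ANY_TYPE` alone carries no formal ordering guarantee), and each type has its own accessor. Below is a minimal Perl sketch of that mapping, the kind of dispatch an optional result-type parameter could drive. `%XPATH_TYPE` and `xpath_result_accessor` are hypothetical names for illustration; they are not part of MozRepl::RemoteObject or WWW::Mechanize::Firefox.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric XPathResult type codes from the DOM Level 3 XPath spec
# (the values Firefox exposes as XPathResult.ANY_TYPE and friends).
my %XPATH_TYPE = (
    ANY_TYPE                     => 0,
    NUMBER_TYPE                  => 1,
    STRING_TYPE                  => 2,
    BOOLEAN_TYPE                 => 3,
    UNORDERED_NODE_ITERATOR_TYPE => 4,
    ORDERED_NODE_ITERATOR_TYPE   => 5,
    UNORDERED_NODE_SNAPSHOT_TYPE => 6,
    ORDERED_NODE_SNAPSHOT_TYPE   => 7,
    ANY_UNORDERED_NODE_TYPE      => 8,
    FIRST_ORDERED_NODE_TYPE      => 9,
);

# Hypothetical helper: given the resultType reported by an XPathResult,
# return the name of the property or method that holds its value.
# ANY_TYPE (0) is only ever requested, never reported, so it dies.
sub xpath_result_accessor {
    my ($result_type) = @_;
    my %accessor = (
        1 => 'numberValue',
        2 => 'stringValue',
        3 => 'booleanValue',
        4 => 'iterateNext',
        5 => 'iterateNext',
        6 => 'snapshotItem',
        7 => 'snapshotItem',
        8 => 'singleNodeValue',
        9 => 'singleNodeValue',
    );
    return $accessor{$result_type}
        // die "No value accessor for XPathResult type $result_type\n";
}

for my $name (qw(STRING_TYPE NUMBER_TYPE ORDERED_NODE_SNAPSHOT_TYPE)) {
    printf "%-28s -> %s\n", $name, xpath_result_accessor($XPATH_TYPE{$name});
}
```

For `substring(//p,1,4)` evaluated with `ANY_TYPE`, the reported type would be STRING_TYPE, so the value to fetch is `stringValue`, matching what the Web Console showed.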
Re: Some issues with WWW::Mechanize::Firefox->xpath() method by Loops (Curate) on Apr 02, 2013 at 12:10 UTC
Even `@ret = $mech->xpath('substring("hello",1,4)', { single => 1 } )` returns an empty array. It may be that you simply have to return the full text node to Perl and then perform the substring in Perl code.
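A minimal sketch of that Perl-side approach, assuming you have already fetched the paragraph text (for example via the `textContent` field, as the question's `test_xpath` does). `xpath_round` and `xpath_substring` are hypothetical helpers that mimic XPath 1.0 `substring()` semantics (1-based positions, rounded arguments) so the result matches what the XPath expression would have returned. Note that a start position of 0 yields one character fewer than 1 does; NaN and infinite arguments are not handled here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);

# XPath 1.0 round(): nearest integer, halves rounded towards +infinity.
# (NaN and infinities are not handled in this sketch.)
sub xpath_round {
    my ($n) = @_;
    return floor($n + 0.5);
}

# Sketch of XPath 1.0 substring($s, $start, $len) in plain Perl:
# keep the characters whose 1-based position p satisfies
#   round($start) <= p < round($start) + round($len)
# Without $len, everything from round($start) to the end is kept.
sub xpath_substring {
    my ($s, $start, $len) = @_;
    my $first = xpath_round($start);
    my $end   = defined $len ? $first + xpath_round($len) : length($s) + 1;

    # Clamp to the string: valid positions run from 1 to length($s).
    $first = 1              if $first < 1;
    $end   = length($s) + 1 if $end > length($s) + 1;
    return '' if $end <= $first;

    # Perl's substr() is 0-based, hence the "- 1".
    return substr($s, $first - 1, $end - $first);
}

# Text of the <p> from the question, as reported by textContent.
my $text = 'Hello WWW::Mechanize::Firefox Goob bye';

print xpath_substring($text, 1, 4), "\n";   # Hell
print xpath_substring($text, 0, 4), "\n";   # Hel
print length($text), "\n";                  # 38, like string-length(//p)
```

For integer arguments this is just `substr($text, $start - 1, $len)`; the extra rounding and clamping only matter if the XPath arguments are fractional or out of range.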
Re: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by Anonymous Monk on Apr 02, 2013 at 10:11 UTC
`test_xpath($mech, 'substring(//p,1,4)', all => 1); # expected String: Hell`
That is not valid xpath syntax, what documentation are you reading?
Re^2: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by choroba (Cardinal) on Apr 02, 2013 at 10:22 UTC
Works for me (using XML::XSH2):
$ xsh
$scratch/> insert element p into /scratch
$scratch/> insert text 'Hello world' into /scratch/p
$scratch/> echo substring(//p,1,4)
Hell
لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^3: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by Anonymous Monk on Apr 02, 2013 at 10:42 UTC
"Works for me (using XML::XSH2):"
And you're sure that's not an xsh2 function/feature? The substring query produces no output for me:
#!/usr/bin/perl --
use strict;
use warnings;
use XML::LibXML;
my $html = <<'__HTML__';
<html>
<head>
<title>Hello Firefox!</title>
</head>
<body>
<h1>Hello <b>World</b>!</h1>
<p id='paragraph'>Hello <b>WWW::Mechanize::Firefox</b> Goob bye</p>
</body>
</html>
__HTML__
my $dom = XML::LibXML->new( qw/ recover 2 / )->load_html(
    #~ string => \$html, ## BUG! scalar... not like load_xml
    string => $html,
);
local $\ = $/;
print for $dom->findnodes(q{ substring(//p,1,4) }); ## nada
print for $dom->findnodes(q{ //p }); ## paragraph
Re^4: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by choroba (Cardinal) on Apr 02, 2013 at 11:15 UTC
Re^5: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by Corion (Patriarch) on Apr 02, 2013 at 11:30 UTC
Re^5: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by Anonymous Monk on Apr 02, 2013 at 11:35 UTC
Re^2: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by dfaure (Chaplain) on Apr 02, 2013 at 10:20 UTC
"That is not valid xpath syntax, what documentation are you reading?"
All comes from the specs (http://www.w3.org/TR/xpath/#section-Expressions), which I need to respect.
Re^3: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) by Anonymous Monk on Apr 02, 2013 at 11:00 UTC
"All comes from the specs"
Hmm, I looked at http://www.w3.org/TR/xpath/#function-substring, which only shows a literal .... But it doesn't match in LibXML; OTOH, LibXML accepts the XPath as valid. Go figure :/

