PerlMonks  

Re: Problem getting fields out of an XPath node list

by Corion (Patriarch)
on Mar 28, 2016 at 19:07 UTC ( [id://1158975]=note )


in reply to Problem getting fields out of an XPath node list

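You're using ->findvalue('td'), which returns the text of all matching cells as one concatenated string. I recommend calling ->findnodes('td') a second time on each row, and then calling ->as_text on each cell: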

use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<html><body>
<table>
<tr><th>Firstseen (UTC)</th><th>Version</th><th>Feodo C&amp;C</th><th>Status</th><th>SBL</th><th>ASN</th><th>Country</th><th>Lastseen (UTC)</th></tr>
<!-- parsing the rows below -->
<tr bgcolor="#9d9595" onmouseover="this.style.backgroundColor='#FFA200';" onmouseout="this.style.backgroundColor='#9d9595';"><td>2016-03-19 23:44:36</td><td bgcolor="#58D3F7" align="center"><strong>D</strong></td><td><a href="/host/83.172.215.87/" target="_parent" title="Show more information about this Feodo C&amp;C">83.172.215.87</a></td><td bgcolor="#4f883f">offline</td><td bgcolor="#bc5959"><a href="http://www.spamhaus.org/sbl/sbl.lasso?query=SBL290535" target="_blank" title="Spamhaus SBL: SBL290535">SBL290535</a></td><td>AS12651 IPWORLDCOM</td><td><img src="images/flags/ch.gif" alt="-" title="CH (CH)" width="16" height="10" /> CH</td><td>never</td></tr>
<tr bgcolor="#837b7b" onmouseover="this.style.backgroundColor='#FFA200';" onmouseout="this.style.backgroundColor='#837b7b';"><td>2016-03-19 23:44:36</td><td bgcolor="#58D3F7" align="center"><strong>D</strong></td><td><a href="/host/98.23.159.86/" target="_parent" title="Show more information about this Feodo C&amp;C">98.23.159.86</a></td><td bgcolor="#4f883f">offline</td><td bgcolor="#4f883f">Not listed</td><td>AS7029 WINDSTREAM</td><td><img src="images/flags/us.gif" alt="-" title="US (US)" width="16" height="10" /> US</td><td>never</td></tr>
<tr bgcolor="#9d9595" onmouseover="this.style.backgroundColor='#FFA200';" onmouseout="this.style.backgroundColor='#9d9595';"><td>2016-03-19 23:44:36</td><td bgcolor="#58D3F7" align="center"><strong>D</strong></td><td><a href="/host/178.188.14.86/" target="_parent" title="Show more information about this Feodo C&amp;C">178.188.14.86</a></td><td bgcolor="#4f883f">offline</td><td bgcolor="#4f883f">Not listed</td><td>AS8447 TELEKOM-AT</td><td><img src="images/flags/at.gif" alt="-" title="AT (AT)" width="16" height="10" /> AT</td><td>2016-03-24 01:19:50</td></tr>
</table>
</body></html>
HTML

# Build the tree with the XPath-aware class so findnodes() is available.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # tell the parser that the input is complete

# First pass: select every table row.
my @nodes = $tree->findnodes('//tr');

for my $node (@nodes) {
    # Second pass: select the <td> cells of this row. The header row
    # contains only <th> cells, so it yields nothing and is skipped.
    my @cells = $node->findnodes('td') or next;
    print $_->as_text, "\n" for @cells;
}

You may want to make your XPath expressions more specific and extract the cells directly, for example //tr/td[1] for the first-seen timestamp, and so on. Also see HTML::TableExtract.
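As a rough sketch of the "extract the cells directly" idea, the snippet below pulls a single column out of every data row with one positional XPath expression. The helper name column_values() is made up for this illustration and is not part of HTML::TreeBuilder::XPath. The sample table is a trimmed-down version of the data above (first seen, status, last seen only).

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# column_values() is a hypothetical helper for this example.
# It returns the text of the N-th <td> (1-based) of every row.
sub column_values {
    my ($html, $column) = @_;

    # The column number is interpolated into the XPath expression,
    # so only accept a plain positive integer.
    die "column must be a positive integer\n"
        unless defined $column && $column =~ /\A[1-9][0-9]*\z/;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;

    # td[N] picks the N-th <td> child of each <tr>. Header rows hold
    # only <th> cells, so they contribute nothing to the result.
    my @values = map { $_->as_text } $tree->findnodes("//tr/td[$column]");

    $tree->delete;    # release the parse tree explicitly
    return @values;
}

# Trimmed-down sample: first seen, status, last seen.
my $html = <<'HTML';
<html><body><table>
<tr><th>Firstseen (UTC)</th><th>Status</th><th>Lastseen (UTC)</th></tr>
<tr><td>2016-03-19 23:44:36</td><td>offline</td><td>never</td></tr>
<tr><td>2016-03-19 23:44:36</td><td>offline</td><td>2016-03-24 01:19:50</td></tr>
</table></body></html>
HTML

print "$_\n" for column_values($html, 3);
```

Whether one expression per column or the nested row-then-cell loop reads better depends on how many columns you need. For a single field the direct expression is shorter; for whole records the nested loop keeps the cells of one row together.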
