HTML::TreeBuilder::XPath is too slow, leaks memory, and is buggy, so I recommend using HTML::TreeBuilder::LibXML instead:
use strict;
use warnings;
use HTML::TreeBuilder::LibXML;
use Data::Dumper;
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Firstseen (UTC)</th><th>Version</th><th>Feodo C&C</th><th>Status</th><th>SBL</th><th>ASN</th><th>Country</th><th>Lastseen (UTC)</th></tr>
#parsing the below
<tr bgcolor="#9d9595" onmouseover="this.style.backgroundColor='#FFA200';" onmouseout="this.style.backgroundColor='#9d9595';"><td>2016-03-19 23:44:36</td><td bgcolor="#58D3F7" align="center"><strong>D</strong></td><td><a href="/host/83.172.215.87/" target="_parent" title="Show more information about this Feodo C&C">83.172.215.87</a></td><td bgcolor="#4f883f">offline</td><td bgcolor="#bc5959"><a href="http://www.spamhaus.org/sbl/sbl.lasso?query=SBL290535" target="_blank" title="Spamhaus SBL: SBL290535">SBL290535</a></td><td>AS12651 IPWORLDCOM</td><td><img src="images/flags/ch.gif" alt="-" title="CH (CH)" width="16" height="10" /> CH</td><td>never</td></tr>
<tr bgcolor="#837b7b" onmouseover="this.style.backgroundColor='#FFA200';" onmouseout="this.style.backgroundColor='#837b7b';"><td>2016-03-19 23:44:36</td><td bgcolor="#58D3F7" align="center"><strong>D</strong></td><td><a href="/host/98.23.159.86/" target="_parent" title="Show more information about this Feodo C&C">98.23.159.86</a></td><td bgcolor="#4f883f">offline</td><td bgcolor="#4f883f">Not listed</td><td>AS7029 WINDSTREAM</td><td><img src="images/flags/us.gif" alt="-" title="US (US)" width="16" height="10" /> US</td><td>never</td></tr>
<tr bgcolor="#9d9595" onmouseover="this.style.backgroundColor='#FFA200';" onmouseout="this.style.backgroundColor='#9d9595';"><td>2016-03-19 23:44:36</td><td bgcolor="#58D3F7" align="center"><strong>D</strong></td><td><a href="/host/178.188.14.86/" target="_parent" title="Show more information about this Feodo C&C">178.188.14.86</a></td><td bgcolor="#4f883f">offline</td><td bgcolor="#4f883f">Not listed</td><td>AS8447 TELEKOM-AT</td><td><img src="images/flags/at.gif" alt="-" title="AT (AT)" width="16" height="10" /> AT</td><td>2016-03-24 01:19:50</td></tr>
</table>
</body></html>
HTML
my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($html);
$tree->eof;
# Select only rows that contain <td> cells, which skips the header row.
my @tr_nodes = $tree->findnodes('//tr[td]');
foreach my $tr_node (@tr_nodes) {
    # findvalues returns one string per matching <td>.
    my @text = $tr_node->findvalues('td');
    # Compare with findvalue, which concatenates the text of all matching nodes:
    #my @text = $tr_node->findvalue('td');
    print Dumper( \@text );
    # do something with @text...
}
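To make the findvalue versus findvalues difference from the comment above concrete, here is a minimal sketch on a made-up two-cell row. The sample HTML and the variable names are invented for illustration, and it assumes a version of HTML::TreeBuilder::LibXML whose nodes provide findvalues, as the code above already relies on.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::LibXML;

# Hypothetical sample input: a single row with two cells.
my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse('<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>');
$tree->eof;

my ($row) = $tree->findnodes('//tr[td]');

# List of strings, one per matching <td>.
my @cells = $row->findvalues('td');

# Single string: the text of all matching <td> nodes run together.
my $joined = $row->findvalue('td');

print scalar(@cells), " cells: @cells\n";
print "joined: $joined\n";
```

So if you need the columns separately (for example to pick out the IP address or the status), findvalues is the one to use; findvalue is only convenient when the XPath matches a single node.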