Hi Monks, I am looking for some help, as I got stuck while scraping a website for data presented in a table.
Below is the HTML I am fetching. I took it from the browser's developer tools.
<tbody><tr><th colspan="15" style="COLOR:RED;FONT-SIZE:12pt; FONT-WEIGHT:BOLD; TEXT-ALIGN:center;">Amount </th></tr>
<tr><th rowspan="2">Region</th><th colspan="2">Level 31.03.2016</th><th colspan="3">Sanction/Renewal<br>01.04.2016 to 28.02.2017</th><th colspan="2">Level 28.02.2017</th><th colspan="3">Sanction/Renewal<br>During Current Month</th><th colspan="2">Level 26.03.2017</th><th colspan="2">Growth<br>as on<br>26.03.2017</th></tr><tr><th>No.</th><th>Balance</th><th>No.</th><th>Limit</th><th>Balance</th><th>No.</th><th>Balance</th><th>No.</th><th>Limit</th><th>Balance</th><th>No.</th><th>Balance</th><th>GDM</th><th>GUM</th></tr><tr>
<td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820201.HTM">TEMPORARY-01</a></td><td>19600</td><td>288.36</td><td>14306</td><td>272.25</td><td>194.22</td><td>19246</td><td>284.53</td><td>989</td><td>19.02</td><td>12.94</td><td>19450</td><td>290.33</td><td>5.80</td><td>1.97</td></tr><tr>
<td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820202.HTM">TEMPORARY-02</a></td><td>17417</td><td>167.40</td><td>9466</td><td>123.61</td><td>99.40</td><td>16717</td><td>167.24</td><td>823</td><td>11.71</td><td>9.11</td><td>16721</td><td>169.51</td><td>2.27</td><td>2.11</td></tr><tr>
<td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820203.HTM">TEMPORARY-03</a></td><td>13545</td><td>180.62</td><td>8395</td><td>144.63</td><td>110.32</td><td>12675</td><td>179.13</td><td>333</td><td>7.38</td><td>5.38</td><td>12630</td><td>180.13</td><td>1.00</td><td>-0.49</td></tr><tr>
<td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820204.HTM">TEMPORARY-04</a></td><td>21826</td><td>225.82</td><td>10249</td><td>133.52</td><td>113.51</td><td>21558</td><td>230.69</td><td>624</td><td>10.07</td><td>7.84</td><td>21524</td><td>233.99</td><td>3.30</td><td>8.17</td></tr><tr>
<td style="TEXT-ALIGN:LEFT;"><a href="AGL_CROPC0820205.HTM">TEMPORARY-05</a></td><td>41299</td><td>736.24</td><td>34023</td><td>732.70</td><td>601.55</td><td>40822</td><td>732.78</td><td>3177</td><td>76.32</td><td>60.46</td><td>40794</td><td>736.45</td><td>3.67</td><td>0.21</td></tr><tr style="BACKGROUND-COLOR:YELLOW;FONT-WEIGHT:BOLD;">
<td style="TEXT-ALIGN:LEFT;">TEMPORARY-TOTAL</td><td>113687</td><td>1598.44</td><td>76439</td><td>1406.71</td><td>1119.00</td><td>111018</td><td>1594.37</td><td>5946</td><td>124.50</td><td>95.73</td><td>111119</td><td>1610.41</td><td>16.04</td><td>11.97</td></tr></tbody>
#!/usr/bin/perl
#### extracting tables from the page ####
use Modern::Perl;
use WWW::Mechanize;
use HTML::TableExtract;

# Rows are appended to the file 'papa'.
open(my $OUT, '>>', 'papa') or die "Could not open file $!";

# Fetch the page and keep its HTML as a string.
my $mech = WWW::Mechanize->new();
$mech->get('http://xxx.com/tempo/TYPE_CAT/AGL_CROPC0820200.HTM');
my $html_string = $mech->content();

#### extracting all tables (no constraints given) ####
my $te = HTML::TableExtract->new();
$te->parse($html_string);

foreach my $ts ( $te->tables ) {
    print "Table (", join( ',', $ts->coords ), "):\n";
    foreach my $row ( $ts->rows ) {
        # Each row is written as one comma-separated line.
        $OUT->print( join( ',', @$row ), "\n" );
    }
}
Below is my output from the data fetched above, in which the top 9 lines are not properly formatted:
Amount,,,,,,,,,,,,,,
Region,Level 31.03.2016,,Sanction/Renewal
01.04.2016 to 28.02.2017,,,Level 28.02.2017,,Sanction/Renewal
During Current Month
,,,Level 26.03.2017,,Growth
as on
26.03.2017,
,No.,Balance,No.,Limit,Balance,No.,Balance,No.,Limit,Balance,No.,Balance,GDM,GUM
TEMPORARY-01,19600,288.36,14306,272.25,194.22,19246,284.53,989,19.02,12.94,19450,290.33,5.80,1.97
TEMPORARY-02,17417,167.40,9466,123.61,99.40,16717,167.24,823,11.71,9.11,16721,169.51,2.27,2.11
TEMPORARY-03,13545,180.62,8395,144.63,110.32,12675,179.13,333,7.38,5.38,12630,180.13,1.00,-0.49
TEMPORARY-04,21826,225.82,10249,133.52,113.51,21558,230.69,624,10.07,7.84,21524,233.99,3.30,8.17
TEMPORARY-05,41299,736.24,34023,732.70,601.55,40822,732.78,3177,76.32,60.46,40794,736.45,3.67,0.21
TEMPORARY-TOTAL,113687,1598.44,76439,1406.71,1119.00,111018,1594.37,5946,124.50,95.73,111119,1610.41,16.04,11.97
So I don't know how to extract the data when the header cells use rowspan and colspan.
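
For illustration, here is a minimal sketch of one possible way to tidy the two header rows after extraction. It is not a tested fix for the page above. It assumes, as the output suggests, that cells covered by a rowspan or colspan come back empty or undefined, and that each <br> inside a cell arrives as a newline. The helper names clean_cell, fill_right and merge_headers are hypothetical, and the sample rows are shortened from the output shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Normalise one cell: undef (assumed to be left by a span) becomes '',
# and any run of whitespace, including newlines from <br>, becomes one space.
sub clean_cell {
    my ($cell) = @_;
    return '' unless defined $cell;
    $cell =~ s/\s+/ /g;
    $cell =~ s/^\s+|\s+$//g;
    return $cell;
}

# Copy a colspan label into the empty cells to its right.
sub fill_right {
    my @cells = @_;
    my $last  = '';
    for my $cell (@cells) {
        if ( length $cell ) { $last = $cell }
        else                { $cell = $last }
    }
    return @cells;
}

# Merge a group-header row and a sub-header row into one label per column.
# A rowspan cell such as "Region" has no sub-header, so it keeps its own label.
sub merge_headers {
    my ( $top, $sub ) = @_;
    my @group = fill_right( map { clean_cell($_) } @$top );
    my @name  = map { clean_cell($_) } @$sub;
    return map {
        my $s = $name[$_] // '';
        length($s) ? "$group[$_] $s" : $group[$_];
    } 0 .. $#group;
}

# Sample rows shaped like the first six columns of the output above.
my @top = ( 'Region', 'Level 31.03.2016', undef,
            "Sanction/Renewal\n01.04.2016 to 28.02.2017", undef, undef );
my @sub = ( undef, 'No.', 'Balance', 'No.', 'Limit', 'Balance' );

my @header = merge_headers( \@top, \@sub );
print "$_\n" for @header;
# Region
# Level 31.03.2016 No.
# Level 31.03.2016 Balance
# Sanction/Renewal 01.04.2016 to 28.02.2017 No.
# Sanction/Renewal 01.04.2016 to 28.02.2017 Limit
# Sanction/Renewal 01.04.2016 to 28.02.2017 Balance
```

With the script above, the second and third rows returned by $ts->rows could presumably be passed to merge_headers, and each data row mapped through clean_cell before the join. HTML::TableExtract also documents a br_translate option (on by default) that controls whether <br> is turned into a newline. Whether switching it off gives a better result for this page would need checking.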