PerlMonks  

Re^2: table within table

by johncute (Initiate)
on Feb 11, 2010 at 09:35 UTC


in reply to Re: table within table
in thread table within table

here is my code:

    $ctr=0;
    while(/(<table>[^\000]*?<\/table>)/){
        $text=$1;
        while($text=~/<table>/){
            $tag=$&;
            $ctr=$ctr+1;
            $tag=~s/(<table)/\1$ctr/;
            $text=~s/<table>/$tag/;
        }
        $text=~s/(<table)$ctr>/\1_level$ctr>/g;
        $text=~s/(<\/table)>/\1_level$ctr>/g;
        $ctr=0;
        $text=~s/(<table)[0-9]+>/\1>/g;
        $text=~s/(<\/?)(thead|tbody)([^>]*)?>//g;
        $text=~s/(<\/?)(th)([^>]*)?>/$1td>/g;
        while($text =~ /<a href="([^"]*)">[^\000]*?<\/a>/){
            $href = $1;
            $class = "";
            if($href =~ /^http/i){ $class = "http";}
            if($href =~ /^www/i){ $class = "nohttp";}
            if($href =~ /^mailto/i){$class = "mailto";}
            if($href =~ /^ftp/i){ $class = "ftp";}
            if($class eq ""){
                $text =~ s/<a href="([^"]*)">([^\000]*?)<\/a>/\2/;
            }else{
                $text =~ s/<a href="([^"]*)">([^\000]*?)<\/a>/<remotelink hrefclass="$class" href="\1" >\2<\/remotelink>/;
            }
        }
        s/<table>[^\000]*?<\/table>/$text/;
    }

    # Remove table and img tags inside table if an <img /> tag was encountered
    while (/<table_level(2|3)>[^\000]*?<\/table_level\1>/) {
        $table2=$&;
        if ($table2 =~ /<img /) {
            # Remove all table tags including <img /> tag
            $table2=~s/<\/?(table_level(2|3)|tr|td)(\s+[^>]*)?>|<img\s+[^>]*\/>//g;
            s/<table_level(2|3)>[^\000]*?<\/table_level\1>/$table2/;
        } else {
            $table2=~s/(<\/?table_level\d)/$1_temp/g;
            s/<table_level(2|3)>[^\000]*?<\/table_level\1>/$table2/;
        }
    }
    s/(<\/?table_level\d)_temp/$1/g;

    # Extract table inside table if no <img /> tag was encountered
    # inside the inner table.
    while (/(<table_level1>[^\000]*?<\/table_level1>)/) {
        $table1=$1;
        #$table2="";
        $table="";
        while ($table1=~ /<table_level2>([^\000]*?)<\/table_level2>/) {
            $table2=$1;
            $table=$&;
            # Extract inner table and place it after the second level table
            $extracted_table3="";
            while ($table2 =~ s/(<table_level3>[^\000]*?<\/table_level3>)//) {
                $extracted_table3="$extracted_table3\n$1";
            }
            $table2=~s/<table_level2>([^\000]*?)<\/table_level2>/$table$extracted_table3/g;
            #$table2=~s/(<table_level2>[^\000]*?<\/table_level2>)/$1$extracted_table3/g;
            s/(<table_level2>[^\000]*?<\/table_level2>)//;
            #$table2=~s/(<\/?table)_level2/$1_2/g;
        }
        $table1=~s/<table_level2>([^\000]*?)<\/table_level2>//;
        $table1=~s/<table_2>([^\000]*?)<\/table_2>//;
        $table1=~s/(<\/?table)_level1/$1/g;
        s/<table_level1>[^\000]*?<\/table_level1>/$table2$table1/;
    }
    s/(<\/?table)_(level\d|\d)/$1/g;

And here is my sample data:

    <table>
      <tr>
        <td>
          <table>
            <thead>
              <tr> <th>Vill</th> <th>Hi</th> <th>Au</th> </tr>
            </thead>
            <tbody>
              <tr> <td>Aix</td> <td>40</td> <td>27</td> </tr>
              <tr> <td>Freib</td> <td>30</td> <td></td> </tr>
              <tr> <td>Gdan</td> <td>20</td> <td>13</td> </tr>
              <tr> <td>Gd</td> <td>44</td> <td>14</td> </tr>
              <tr> <td>Gren</td> <td>33</td> <td>22</td> </tr>
              <tr> <td>Karl</td> <td>26</td> <td></td> </tr>
              <tr> <td>La</td> <td>31</td> <td>18</td> </tr>
              <tr> <td></td> <td>30</td> <td>20</td> </tr>
              <tr> <td>Lyon</td> <td>41</td> <td>19</td> </tr>
              <tr> <td>Man</td> <td>22</td> <td></td> </tr>
              <tr> <td>Mar</td> <td>32</td> <td>18</td> </tr>
              <tr> <td>Mar</td> <td>17</td> <td>13</td> </tr>
              <tr> <td>Mon</td> <td>36</td> <td>26</td> </tr>
              <tr> <td>Mul</td> <td>30</td> <td>45</td> </tr>
              <tr> <td>Mun</td> <td>28</td> <td>23</td> </tr>
              <tr> <td>Nice</td> <td>41</td> <td>17</td> </tr>
              <tr> <td>Nims</td> <td>34</td> <td>25</td> </tr>
              <tr> <td>Nio</td> <td>29</td> <td>21</td> </tr>
              <tr> <td>Orleans</td> <td>32</td> <td>17</td> </tr>
              <tr> <td>Pad</td> <td>36</td> <td>20</td> </tr>
              <tr> <td>Paris</td> <td>24</td> <td>29</td> </tr>
              <tr> <td>Perk</td> <td>38</td> <td>29</td> </tr>
              <tr> <td>Poit</td> <td>27</td> <td>24</td> </tr>
              <tr> <td>Prag</td> <td>26</td> <td>16</td> </tr>
              <tr> <td></td> <td>23</td> <td>14</td> </tr>
              <tr> <td>Ren</td> <td>30</td> <td>18</td> </tr>
              <tr> <td>Rot</td> <td>36</td> <td>27</td> </tr>
              <tr> <td>Rou</td> <td>45</td> <td>22</td> </tr>
              <tr> <td>Saint</td> <td>33</td> <td>20</td> </tr>
              <tr> <td>Salon</td> <td>33</td> <td>18</td> </tr>
              <tr> <td>Sev</td> <td>63</td> <td>29</td> </tr>
              <tr> <td>Sop</td> <td>19</td> <td>8</td> </tr>
              <tr> <td>Stra</td> <td>28</td> <td>26</td> </tr>
              <tr> <td>Stut</td> <td>26</td> <td></td> </tr>
              <tr> <td>logne</td> <td>22</td> <td>11</td> </tr>
              <tr> <td>lon</td> <td>31</td> <td>22</td> </tr>
              <tr> <td>use</td> <td>28</td> <td>17</td> </tr>
              <tr> <td>Ts</td> <td>29</td> <td>22</td> </tr>
              <tr> <td>Val</td> <td>36</td> <td>23</td> </tr>
              <tr> <td>Zur</td> <td>29</td> <td>22</td> </tr>
            </tbody>
          </table>
        </td>
        <td>
          <table>
            <tr> <td><span><strong>Legend</strong></span></td> </tr>
            <tr>
              <td>
                <table>
                  <thead>
                    <tr> <th>head1</th> <th>head2</th> </tr>
                  </thead>
                  <tbody>
                    <tr> <td>bon</td> <td></td> <td>0 / 25</td> </tr>
                    <tr> <td>Ton</td> <td></td> <td>25 / 50</td> </tr>
                    <tr> <td>Don</td> <td></td> <td>50 / 75</td> </tr>
                    <tr> <td>Con</td> <td></td> <td>75 / 100</td> </tr>
                    <tr> <td>Trs</td> <td></td> <td> 100</td> </tr>
                  </tbody>
                </table>
              </td>
            </tr>
          </table>
        </td>
      </tr>
      <tr> <td colspan="2"></td> </tr>
      <tr> <td colspan="2">This is a sample content</td> </tr>
      <tr> <td colspan="2"></td> </tr>
      <tr> <td colspan="2">Site : <a href="http://www.yahoo.com" target="_blank">www.yahoo.com</a></td> </tr>
    </table>

The output should be as follows: if a table consists of 3 levels, the level 2 table should be placed above level 1, and the level 3 table should be placed below level 2.

So the output should be:

level2

level3

level1

If a table is nested deeper than the 3rd level, each deeper level, down to the deepest one, should be output below the 3rd level.

Example

level2

level3

level4

level5

level6

level7

level8

level1

Hope I explained it well.

Replies are listed 'Best First'.
Re^3: table within table
by Utilitarian (Vicar) on Feb 11, 2010 at 10:38 UTC
    Seriously, this is better achieved using one of the HTML::Parser modules. For example, take a look at HTML::TokeParser:

        use strict;
        use warnings;
        use HTML::TokeParser;

        my $p = HTML::TokeParser->new("file.html")    # your source file
            || die "Cant open: $!";
        my $depth = 0;
        # Track the table nesting depth: start tags ("S") go one level
        # deeper, end tags ("E") come back up one level.
        while (my $token = $p->get_token) {
            if (lc(${$token}[1]) eq "table") {
                $depth++ if (${$token}[0] eq "S");
                $depth-- if (${$token}[0] eq "E");
                print "$depth\n";
            }
        }

    Try out the code above and see where it takes you.
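
    As a rough sketch of where that could lead (not a drop-in solution): the same token loop can collect each table's own markup on a stack and then emit the nested tables first, in the order they were opened, with the outermost table last. That gives the level 2, level 3, ..., level 1 order asked for. The name reorder_tables and the tiny input string are made up for illustration, HTML::TokeParser is assumed to be installed, and markup outside any table is simply dropped.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the original post): split nested tables
# into separate chunks, nested tables first, outermost table last.
sub reorder_tables {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse input\n";

    my @out;      # finished chunks, in output order
    my @group;    # markup of each table in the current outermost table
    my @stack;    # indexes into @group for the tables currently open

    while (my $token = $p->get_token) {
        my $type     = $token->[0];
        my $is_table = ($type eq 'S' || $type eq 'E') && $token->[1] eq 'table';

        # The original source text sits at a different index per token type.
        my $text = $type eq 'S'                    ? $token->[4]
                 : ($type eq 'E' || $type eq 'PI') ? $token->[2]
                 :                                   $token->[1];

        if ($is_table && $type eq 'S') {
            push @group, '';          # start collecting a new table
            push @stack, $#group;
        }
        next unless @stack;           # ignore markup outside any table

        $group[ $stack[-1] ] .= $text;   # append to the innermost open table

        if ($is_table && $type eq 'E') {
            pop @stack;
            if (!@stack) {            # outermost table just closed
                my $outer = shift @group;
                push @out, @group, $outer;
                @group = ();
            }
        }
    }
    return @out;
}

my $html = '<table><tr><td>L1<table><tr><td>L2<table><tr><td>L3'
         . '</td></tr></table></td></tr></table></td></tr></table>';
print "$_\n" for reorder_tables($html);
```

    With that input the three chunks come out in the order L2, L3, L1. An unclosed <table> means its group is never emitted, so real-world input would need more care than this sketch takes.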

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
