svenXY has asked for the wisdom of the Perl Monks concerning the following question:
Enlightened Ones and other Seekers of Wisdom,
For my wife, who is a software translator, I am trying to achieve the following:
I have a glossary in HTML, implemented as a definition list. After translation, the glossary naturally needs to be re-sorted.
I already wrote a solution with regular expressions, but HTML is hard to parse that way and the result is not very efficient so far, so I'd like to use HTML::TreeBuilder instead.
This was quite easy when the glossary was a two-column table (check my scratchpad if you are interested: svenXY's scratchpad). With a definition list, however, the <dt> and <dd> tags are siblings that are independent of each other. I can sort the <dt> tags easily, but how do I move each <dd> along with its <dt>?
I have a solution here, but I don't really like it. I'm sure there are better ways to do it.
My main problem is dereferencing the tree properly and replacing the <dl> part of it with a sorted array of HTML::Element objects, without having to generate HTML code and parse it first.
#!/usr/bin/perl -w
use strict;
use HTML::TreeBuilder;
use HTML::PrettyPrinter;

my $html_code = '
<html>
<head>
<title>Glossary</title>
</head>
<body>
<h1>Glossary</h1>
<dl>
<dt><b>E Definition</b></dt>
<dd>E - data</dd>
<p></p>
<dt><b>B Definition</b></dt>
<dd>B - data</dd>
<p></p>
<dt><b>A_definition</b></dt>
<dd>A data.</dd>
<p></p>
<dt><b>C definition</b></dt>
<dd>C - data</dd>
<p></p>
</dl>
</body>
</html>
';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html_code);
$tree->eof;    # signal end of input so the tree is complete

my ($dl) = $tree->look_down('_tag', 'dl');
my %data;

# loop through the dt tags, building a hash with the
# lowercased text of each dt as key and the HTML of dt and dd as values
for my $dt ($dl->look_down("_tag", "dt")) {
    my $key = lc($dt->as_text);
    $data{$key}{'dt'} = $dt->as_HTML;
    my $dd = $dt->right;    # assumes the dd directly follows its dt
    $data{$key}{'dd'} = $dd->as_HTML;
}

# create a string of the entries in sorted order
# (the keys are already lowercased, so a plain sort is enough)
my $output;
foreach (sort keys %data) {
    $output .= $data{$_}{'dt'} . $data{$_}{'dd'} . "<p></p>";
}

# feed the string to a new parser object
my $new_dl = HTML::TreeBuilder->new;
$new_dl->parse($output);
$new_dl->eof;

# remove unnecessary wrapper tags
my $nu_aber = $new_dl->guts();

# replace old dl content with the sorted content
$dl->delete_content();
$dl->push_content($nu_aber);

my $hpp = HTML::PrettyPrinter->new(
    'linelength'      => 130,
    'quote_attr'      => 1,
    'allow_forced_nl' => 1,
    'entities'        => "&<>äöüßÄÖÜ",
);
$hpp->set_force_nl(1, qw(body head table tr td));
$hpp->nl_before(2, qw(tr td p));

my $linearray_ref = $hpp->format($tree);
print @{$linearray_ref};

# free the tree explicitly
$tree = $tree->delete;
Any hints greatly appreciated,
svenXY
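One possible way to avoid the second parse, offered only as a minimal sketch and not as anyone's posted answer: detach the children of the <dl>, group each <dt> with whatever elements follow it up to the next <dt>, sort the groups by the <dt> text, and push the same HTML::Element objects back in. The sample markup and the names @entries, @sorted_terms and @flat are illustrative assumptions. The sketch also assumes every entry starts with a <dt>; elements that come before the first <dt>, and bare text nodes directly inside the <dl>, are dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Illustrative sample input (an assumption, trimmed from the glossary above)
my $html = <<'HTML';
<html><head><title>Glossary</title></head>
<body>
<dl>
<dt><b>E Definition</b></dt><dd>E - data</dd>
<dt><b>B Definition</b></dt><dd>B - data</dd>
<dt><b>A_definition</b></dt><dd>A data.</dd>
<dt><b>C definition</b></dt><dd>C - data</dd>
</dl>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $dl   = $tree->look_down(_tag => 'dl') or die "no <dl> found";

# Detach all children of the <dl> and group them: each <dt> opens a new
# entry, and every following element belongs to that entry until the next <dt>.
my @entries;
for my $node ($dl->detach_content) {
    next unless ref $node;    # skip plain text nodes (e.g. whitespace)
    if ($node->tag eq 'dt') {
        push @entries, { key => lc($node->as_text), nodes => [$node] };
    }
    elsif (@entries) {
        push @{ $entries[-1]{nodes} }, $node;
    }
}

# Re-attach the very same element objects in sorted order; nothing is re-parsed.
$dl->push_content(
    map  { @{ $_->{nodes} } }
    sort { $a->{key} cmp $b->{key} } @entries
);

# Collect the results as plain strings before freeing the tree.
my @sorted_terms = map { $_->as_text } $dl->look_down(_tag => 'dt');
my @flat         = map { $_->as_text } grep { ref } $dl->content_list;

print "$_\n" for @sorted_terms;

$tree->delete;
```

Because each group keeps all elements between one <dt> and the next, separators such as the <p></p> in the original glossary would travel with their entry, wherever the parser ends up placing them.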
Replies are listed 'Best First'.
Re: HTML::TreeBuilder: sort a Definition List (<dl>)
by Tanktalus (Canon) on Sep 12, 2005 at 18:59 UTC
Re: HTML::TreeBuilder: sort a Definition List (<dl>)
by skillet-thief (Friar) on Sep 12, 2005 at 19:25 UTC
by Util (Priest) on Sep 13, 2005 at 01:54 UTC
by svenXY (Deacon) on Sep 13, 2005 at 09:03 UTC