Hi PerlMonks,
I have a TMX file which looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header creationtool="xx" creationtoolversion="1" segtype="sentence" o-tmf="undefined" adminlang="en" srclang="en" datatype="undefined"></header><body>
<tu changedate="20180321T113135Z" creationdate="20180321T113135Z" changeid="user" tuid="1">
<prop type="client"> </prop>
<prop type="project"> </prop>
<prop type="domain"> </prop>
<prop type="subject"> </prop>
<prop type="corrected">no</prop>
<prop type="aligned">no</prop>
<tuv xml:lang="en"><seg>Hello
<b>world!</b></seg></tuv>
<tuv xml:lang="fr"><seg>Bonjour
<b> monde</b></seg></tuv>
</tu>
<tu changedate="20180321T113135Z" creationdate="20180321T113135Z" changeid="user2" tuid="2">
<prop type="client"> </prop>
<prop type="project">yes</prop>
<prop type="corrected">no</prop>
<prop type="aligned">no</prop>
<tuv xml:lang="en"><seg>Hello
<b>world!</b></seg></tuv>
<tuv xml:lang="fr"><seg>Bonjour
<b> monde</b></seg></tuv>
</tu>
</body>
</tmx>
I would like to export all the information for each tu on one tab-separated line.
I have the following code to export the en and fr segments, but I cannot get it to export the other attributes and props properly:
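use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(IO => *STDIN);
# One union query over the segments, the props and the creation date.
my $xpath = join ' | ',
    '/tmx/body/tu/tuv[@xml:lang="en"]/seg',
    '/tmx/body/tu/tuv[@xml:lang="fr"]/seg',
    '/tmx/body/tu/prop',
    '/tmx/body/tu/@creationdate';
for my $child ( @{ $dom->find($xpath) } ) {
    # Keep inline markup and mark line breaks inside a value with <lb/>.
    ( my $contents = join '', $child->childNodes ) =~ s,\n, <lb/> ,g;
    # None of the selected nodes is named 'source', so each value ends in "\n".
    print $contents, $child->nodeName eq 'source' ? "\t" : "\n";
}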
The ideal scenario would be to export whatever props there are in each tu and align them, so that the same prop always ends up in the same column.
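To make the target clearer, here is a rough two-pass sketch of what I have in mind, not a finished solution. The first pass collects every prop type that occurs anywhere in the file, and the second pass prints one row per tu, leaving a cell empty when that tu lacks the prop. The helper names (flat, seg_for, tmx_rows), the script name in the usage comment, and the column order are my own assumptions; only the XML::LibXML calls come from the module itself.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: collapse whitespace so a value cannot break its row.
sub flat {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Hypothetical helper: inner markup of the <seg> for one language,
# with newlines turned into <lb/> as in my attempt above.
sub seg_for {
    my ($tu, $lang) = @_;
    my ($seg) = $tu->findnodes(qq{tuv[\@xml:lang="$lang"]/seg});
    return '' unless $seg;
    my $inner = join '', map { $_->toString } $seg->childNodes;
    $inner =~ s{\n}{ <lb/> }g;
    return $inner;
}

# Returns a header line plus one tab-separated line per <tu>.
sub tmx_rows {
    my ($dom) = @_;
    my @tus = $dom->findnodes('/tmx/body/tu');

    # Pass 1: every prop type that occurs anywhere, in first-seen order.
    my (@types, %seen);
    for my $tu (@tus) {
        for my $prop ($tu->findnodes('prop')) {
            my $type = $prop->getAttribute('type') // '';
            push @types, $type unless $seen{$type}++;
        }
    }

    # Pass 2: one row per <tu>; a prop the <tu> lacks becomes an empty cell.
    my @rows = join "\t", 'tuid', 'creationdate', @types, 'en', 'fr';
    for my $tu (@tus) {
        my %prop;
        $prop{ $_->getAttribute('type') // '' } = flat($_->textContent)
            for $tu->findnodes('prop');
        push @rows, join "\t",
            $tu->getAttribute('tuid') // '',
            $tu->getAttribute('creationdate') // '',
            (map { $prop{$_} // '' } @types),
            seg_for($tu, 'en'),
            seg_for($tu, 'fr');
    }
    return @rows;
}

# Usage (assumed script name): perl tmx2tsv.pl file.tmx > file.tsv
if (@ARGV) {
    my $dom = XML::LibXML->load_xml(location => $ARGV[0]);
    print "$_\n" for tmx_rows($dom);
}
```

With the sample file above, I would expect the columns to come out as tuid, creationdate, client, project, domain, subject, corrected, aligned, en, fr, with the domain and subject cells empty for the second tu.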
Could you please help me improve the code and sort it out?
Thanks