PerlMonks  

Re: Strip XML document

by choroba (Archbishop)
on Jul 10, 2018 at 16:32 UTC (#1218250)


in reply to Strip XML document

It's not clear what output you expect.

Something like this?

    for my $tu ($dom->findnodes('/tmx/body/tu')) {
        for my $child ($tu->findnodes('*')) {
            ( my $text = $child->textContent ) =~ s,\n, <lb/> ,g;
            print $text, "\t";
        }
        print "\n";
    }
    __END__
    no no Hello <lb/> world! Bonjour <lb/> monde
    yes no no Hello <lb/> world! Bonjour <lb/> monde
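The loop above assumes `$dom` already holds the parsed TMX document. A minimal, self-contained sketch of that setup might look like the following. The sample XML is hypothetical (the original file from the thread is not shown here), and the helper name `tu_rows` is made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; the real TMX file is not part of this post.
my $xml = <<'XML';
<tmx version="1.4">
  <body>
    <tu tuid="1">
      <prop type="corrected">no</prop>
      <prop type="aligned">no</prop>
      <tuv xml:lang="en"><seg>Hello
world!</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour
monde</seg></tuv>
    </tu>
  </body>
</tmx>
XML

# Hypothetical helper: one tab-separated string per <tu>, built from the
# text of each child element, with newlines replaced by " <lb/> ".
sub tu_rows {
    my ($dom) = @_;
    my @rows;
    for my $tu ($dom->findnodes('/tmx/body/tu')) {
        my @cells;
        for my $child ($tu->findnodes('*')) {
            ( my $text = $child->textContent ) =~ s,\n, <lb/> ,g;
            push @cells, $text;
        }
        push @rows, join "\t", @cells;
    }
    return @rows;
}

my $dom = XML::LibXML->load_xml(string => $xml);
print "$_\n" for tu_rows($dom);
```

To read from a file instead, `XML::LibXML->load_xml(location => $path)` can be used in place of the `string` argument.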

Or do you want a table of all the prop types?

    use feature qw{ say };
    use List::Util qw{ uniq };

    my @headers = sort +uniq(map $_->value,
                             $dom->findnodes('/tmx/body/tu/prop/@type'));
    say join "\t", @headers;    # header row, as it appears in the output below
    for my $tu ($dom->findnodes('/tmx/body/tu')) {
        my %props;
        for my $prop ($tu->findnodes('prop')) {
            $props{ $prop->findvalue('@type') } = $prop->textContent;
        }
        print join("\t", map $_ // "", @props{@headers}), "\t";
        for my $child ($tu->findnodes('tuv')) {
            ( my $text = $child->textContent ) =~ s,\n, <lb/> ,g;
            print $text, "\t";
        }
        print "\n";
    }
    __END__
    aligned client corrected domain project subject
    no no Hello <lb/> world! Bonjour <lb/> monde
    no no yes Hello <lb/> world! Bonjour <lb/> monde

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Replies are listed 'Best First'.
Re^2: Strip XML document
by corfuitl (Sexton) on Jul 11, 2018 at 13:33 UTC

    Thank you so much! Both solutions work but I prefer the second one.

Re^2: Strip XML document
by corfuitl (Sexton) on Dec 18, 2018 at 12:19 UTC

    Hi

    I am back again regarding this because I realised that some of my TUVs contain properties. So, this is an example:

    <tu changedate="20180321T113135Z" creationdate="20180321T113135Z" changeid="user2" tuid="2">
      <prop type="client"> </prop>
      <prop type="project">yes</prop>
      <prop type="corrected">no</prop>
      <prop type="aligned">no</prop>
      <tuv xml:lang="en"><seg>Hello <b>world!</b></seg></tuv>
      <tuv xml:lang="fr">
        <prop type="client"> </prop>
        <prop type="project">yes</prop>
        <prop type="corrected">no</prop>
        <prop type="aligned">no</prop>
        <seg>Bonjour <b> monde</b></seg></tuv>
    </tu>

    How can I get these properties and distinguish them from the others? For instance, the column may have the name TU:client and TUV:client or so.

    Thanks

      Can you have a TUV:property for more than one language? For example, TUV:client:en and TUV:client:fr.

      poj

        Oh, yes! It should be for each language, because both languages may have props.
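The column naming discussed above (TU:client for props directly under `<tu>`, TUV:client:en and TUV:client:fr for props inside a `<tuv>`) could be sketched roughly as follows. This is not the code from the replies in this thread, only one possible approach; the sample input is a shortened, hypothetical version of the `<tu>` posted above, and `prop_table` is a made-up helper name.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URI bound to the reserved "xml" prefix (as in xml:lang).
my $XML_NS = 'http://www.w3.org/XML/1998/namespace';

# Hypothetical input, modelled on the <tu> posted above.
my $xml = <<'XML';
<tmx version="1.4"><body>
<tu tuid="2">
  <prop type="client"> </prop>
  <prop type="project">yes</prop>
  <tuv xml:lang="en"><seg>Hello <b>world!</b></seg></tuv>
  <tuv xml:lang="fr">
    <prop type="client"> </prop>
    <prop type="project">yes</prop>
    <seg>Bonjour <b> monde</b></seg>
  </tuv>
</tu>
</body></tmx>
XML

# Hypothetical helper. Returns (\@headers, \@rows); each row is a hash
# keyed by column name.
sub prop_table {
    my ($dom) = @_;
    my (%seen, @rows);
    for my $tu ($dom->findnodes('/tmx/body/tu')) {
        my %row;
        # 'prop' matches only direct children of <tu>, not those in <tuv>.
        for my $prop ($tu->findnodes('prop')) {
            $row{ 'TU:' . $prop->getAttribute('type') } = $prop->textContent;
        }
        # Props nested in a <tuv> are keyed by type and language.
        for my $tuv ($tu->findnodes('tuv')) {
            my $lang = $tuv->getAttributeNS($XML_NS, 'lang') // '';
            for my $prop ($tuv->findnodes('prop')) {
                my $key = join ':', 'TUV', $prop->getAttribute('type'), $lang;
                $row{$key} = $prop->textContent;
            }
        }
        $seen{$_} = 1 for keys %row;
        push @rows, \%row;
    }
    return ([ sort keys %seen ], \@rows);
}

my ($headers, $rows) = prop_table(XML::LibXML->load_xml(string => $xml));
print join("\t", @$headers), "\n";
for my $row (@$rows) {
    print join("\t", map { $_ // '' } @{$row}{@$headers}), "\n";
}
```

Collecting the column names from every `<tu>` before printing keeps the columns aligned even when a given unit lacks some props; missing cells are printed as empty strings.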

        Fantastic!!! It works. Thank you so much!

        Thanks for your efforts

        This works fine. However, how can I export the formatting tags inside the <seg>? I mean, <b> and other similar tags may appear.
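One way to keep inline tags such as `<b>` is to serialize the child nodes of each `<seg>` instead of calling `textContent`, which returns text only. The following is a minimal sketch of that idea, not code from the thread; `seg_markup` is a hypothetical helper name and the input is a fragment of the sample posted above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: serialize everything inside a <seg>, keeping
# inline elements such as <b> as markup.
sub seg_markup {
    my ($seg) = @_;
    return join '', map { $_->toString } $seg->childNodes;
}

my $dom = XML::LibXML->load_xml(
    string => '<tuv xml:lang="en"><seg>Hello <b>world!</b></seg></tuv>');
my ($seg) = $dom->findnodes('//seg');
print seg_markup($seg), "\n";
```

Because `toString` produces serialized XML, special characters in the text (for example `&`) come out escaped; whether that is desirable depends on what consumes the table.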
