PerlMonks  

Re^2: XML::Twig Stream root children

by Herkum (Parson)
on Oct 02, 2009 at 20:28 UTC ( [id://798940] )


in reply to Re: XML::Twig Stream root children
in thread XML::Twig Stream root children

xml_split does not create valid xml documents. It appears mainly to allow you to break apart the document and then put it back together with xml_merge.

I guess I could modify the resulting documents after running it, if I don't have any other solution.

Re^3: XML::Twig Stream root children
by Anonymous Monk on Oct 03, 2009 at 01:47 UTC
    xml_split does not create valid xml documents.

    Sure it does. Observe

    $$ echo THIS IS THE SAME AS xml_split -v Herkum.xml
    $$ xml_split -v -c "level(1)" Herkum.xml
    generating main file Herkum-00.xml
    generating Herkum-01.xml
    generating Herkum-02.xml
    $$ cat Herkum-00.xml && echo
    <root> <?merge subdocs = 0 :Herkum-01.xml?> <?merge subdocs = 0 :Herkum-02.xml?> </root>
    $$ cat Herkum-01.xml && echo
    <a> <b>Test</b> </a>
    $$ cat Herkum-02.xml && echo
    <aa> <b>Test</b> </aa>
    You can use "level(1)" with twig_handlers
      You can even do xml_split -l 1

        Using 'level(1)' as the Anonymous Monk suggested does the trick. The question I have now is: where is this documented? I only saw it mentioned once in the XML::Twig POD, and I found an example in the unit tests.

        Update: I take this back, I did find it in the documentation, though I had never noticed it before. This project has been a good learning experience.

Re^3: XML::Twig Stream root children
by mirod (Canon) on Oct 04, 2009 at 09:51 UTC

    Would you care to explain? I tried to make it output well-formed fragments. About the only thing I can think of that would trip xml_split would be DTDs with entities, and if you have an example, I would be glad to make it work for you.

    Thanks

      Unless I'm overlooking something, a new feature could be an option to omit the merge file (00) and instead wrap the root around each output file, i.e. foo.xml
      <root> <a> <b>Test</b> </a> <aa> <b>Test</b> </aa> </root>
      becomes 2 files, foo-01.xml
      <root> <a> <b>Test</b> </a> </root>
      and foo-02.xml
      <root> <aa> <b>Test</b> </aa> </root>

        I may be overlooking something here. The generated "split files" would consist of the 'a' or 'aa', or whatever other level 1 element. So correct me if I'm wrong, but the only thing missing would be the '<root>' first line and '</root>' last line, right? Adding those to each of the files is left as an exercise to the reader.
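For what it's worth, the wrapping step described above could be sketched in plain shell. This is only a sketch under assumptions that are not in the thread: the root tag is literally `<root>` with no attributes, the pieces follow the Herkum-NN.xml naming from the transcript, the split files carry no XML declaration or DOCTYPE of their own, and the `wrap_in_root` helper and the `-standalone` output suffix are made-up names for illustration.

```shell
#!/bin/sh
# Sketch: wrap each xml_split piece in the original root element.
# Assumptions (hypothetical, adjust to your data): the root tag is <root>,
# and the pieces contain no XML declaration or DOCTYPE.

wrap_in_root() {
    # $1 = split file to read, $2 = stand-alone file to write
    {
        printf '<root>\n'
        cat "$1"
        printf '\n</root>\n'
    } > "$2"
}

for f in Herkum-[0-9][0-9].xml; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    [ "$f" = Herkum-00.xml ] && continue    # skip the merge file
    wrap_in_root "$f" "${f%.xml}-standalone.xml"
done
```

Since each piece is only passed through cat, this never holds a whole document in memory, but it does mean a second pass over files that xml_split has already written.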

        I want to break one huge XML document into multiple stand-alone documents. The application that would use this is not smart enough to handle 'subdocs'.

        I was also looking at a streaming solution, which is what I assumed twig_handlers would do. The reason is that the document I want to break is 500MB (don't ask, I did not do it!) and there is no way it will fit into memory.
