Parent/Child Group Relationship in XML Imports

onegative has asked for the wisdom of the Perl Monks concerning the following question:

G'day most honorable Monks,

I have a problem that is obviously above my head and I would really appreciate a push start getting it solved.

I generate the Original_Group.xml using SQL and Perl but because this is outside the applications capability to create its own export/import XML with a parent/child Group relationships so I am having to make this up as I go.

The Sorted_Group.xml is the manual manipulation that is required of the Perl parser I am trying to build.

The resulting Sorted_Group.xml requires that a Group record be Inserted only if it does not contain any Sub-Group members or that a defined Sub-Group member(s) already exists (inserted previously) before it will be allowed to be inserted.

Just not sure how to handle sorting with multiple levels of members in the xml record.

Any and all help is greatly appreciated.

Thanks,
Danny

Original_Group.xml

<SystemXchange version="1.0"> <summary> <tool version="2.8 build: [070207]">Enterprise Migration Tool< +/tool> <source></source> <timestamp>May 30, 2008 @ 10:00:00 CDT</timestamp> </summary>  <records type="group" operation="insert"> <record> <name>GOLF</name> <members> <Group>ECHO</Group> <Group>BETA</Group> <Group>DELTA</Group> <Group>FOXTROT</Group> </members> </record> <record> <name>ECHO</name> <members> <Group>CHARLIE</Group> <Group>DELTA</Group> </members> </record> <record> <name>CHARLIE</name> <members> <Group>ALPHA</Group> <Group>DELTA</Group> </members> </record> <record> <name>BETA</name> <members> <Group>FOXTROT</Group> <Group>CHARLIE</Group> </members> </record> <record> <name>FOXTROT</name> <members> <Group>ALPHA</Group> </members> </record> <record> <name>DELTA</name> <members> <Group>ALPHA</Group> <Group>FOXTROT</Group> </members> </record> <record> <name>ALPHA</name> <members> </members> </record> </records> </SystemXchange>
[download]

Sorted_Group.xml

<SystemXchange version="1.0"> <summary> <tool version="2.8 build: [070207]">Enterprise Migration Tool< +/tool> <source></source> <comments> </comments> <timestamp>May 30, 2008 @ 10:00:00 CDT</timestamp> </summary>  <records type="group" operation="insert"> <record> <name>ALPHA</name> <members> </members> </record> <record> <name>FOXTROT</name> <members> <Group>ALPHA</Group> </members> </record> <record> <name>DELTA</name> <members> <Group>ALPHA</Group> <Group>FOXTROT</Group> </members> </record> <record> <name>CHARLIE</name> <members> <Group>ALPHA</Group> <Group>DELTA</Group> </members> </record> <record> <name>BETA</name> <members> <Group>FOXTROT</Group> <Group>CHARLIE</Group> </members> </record> <record> <name>ECHO</name> <members> <Group>CHARLIE</Group> <Group>DELTA</Group> </members> </record> <record> <name>GOLF</name> <members> <Group>ECHO</Group> <Group>BETA</Group> <Group>DELTA</Group> <Group>FOXTROT</Group> </members> </record> </records> </SystemXchange>
[download]

Comment on Parent/Child Group Relationship in XML Imports
Select or Download Code

Replies are listed 'Best First'.

Re: Parent/Child Group Relationship in XML Imports
by Jenda (Abbot) on Jun 01, 2008 at 15:03 UTC

Not exactly the order you specified in the example, but a correct order anyway:

use strict;
use warnings;
no warnings 'uninitialized';

use XML::Rules;

my %written;
my %depends;
my $parser = XML::Rules->new(
    style => 'filter',
    rules => {
        _default => 'raw',
        'name,members' => 'raw extended',
        'Group' => 'raw extended array',
        record => sub {
            my @I_depend;
            if (exists $_[1]->{':members'} and exists $_[1]->{':member
+s'}{':Group'}) {
                @I_depend = grep !exists $written{$_}, map $_->{_conte
+nt}, @{$_[1]->{':members'}{':Group'}};
            }
            if (@I_depend) {
                $_[1]->{':I_depend'} = {map {$_ => 1} @I_depend};
                foreach (@I_depend) {
                    push @{$depends{$_}}, $_[1]
                }
                return;
            } else {
                my $name = $_[1]->{':name'}{_content};
                $written{$name} = 1;
                my @to_write = $_[1];
                if (exists $depends{$name}) {
                    push @to_write, find_dependent($name);
                }
                return 'record' => \@to_write;
            }
        }
    }
);

sub find_dependent {
    my $name = shift;
    my @to_write;
    foreach my $parent (@{$depends{$name}}) {
        delete $parent->{':I_depend'}{$name};
        if (! %{$parent->{':I_depend'}}) { # if it doesn't depend on a
+nything more
            push @to_write, $parent;
            push @to_write, find_dependent($parent->{':name'}{_content
+});
        }
    }
    delete $depends{$name};
    return @to_write;
}

$parser->filter(\*DATA);

if (%depends) {
    print STDERR "Unsatisfied dependencies! Some records depend on the
+se unknown groups: ",
        join ", ", keys %depends;
}
[download]

Only if all the records depend (directly or indirectly) on the last one, will the script keep at one moment all the data in memory. Otherwise whenever it finds a record with no dependencies or whose all dependencies were already printed, it will print that record and all records that only depended on this one (and those already printed).

Jenda
Support Denmark!
Defend the free world!

[reply]
[d/l]

Re^2: Parent/Child Group Relationship in XML Imports

by onegative (Scribe) on Jun 03, 2008 at 19:46 UTC

<name>cicoutage</name>
<name>cic</name>
<name>actpage4_ecc</name>
<name>actpage4</name>
<name>actpage</name>
[download]

[reply]
[d/l]

Re^3: Parent/Child Group Relationship in XML Imports

by Jenda (Abbot) on Jun 04, 2008 at 00:02 UTC

The error message lists the missing groups, do they exist in the original XML file? Look for <name>such_id</name>. If they don't, your have an inconsistency in your file, if they do, I have a bug in the code. Both are possible.

If you change the last four lines of the script to

if (%depends) {
    print STDERR "\nUnsatisfied dependencies!\n";
    foreach my $missing (keys %depends) {
        print "Missing group '$missing' is a member of:\n";
        foreach my $failed (@{$depends{$missing}}) {
            print "  $failed->{':name'}{_content}\n";
        }
    }
}
[download]

I'll try to dissect the code a bit for you. I understand XML::Rules is a bit hard to get your head around. It fits my twisted brain perfectly, but I find it hard to explain. The basic idea behind the module was .. what if we converted XML into executable code? Something like

<root>
  <sub>value</sub>
  <some><x>45</x><y>77</y></some>
</root>
[download]

root(
  sub('value'),
  some( x(45), y(77))
)
[download]

Of course it's not as simple as that with attributes and content and with the functions sometimes needing to insert something into what's passed to the outer function as the content and sometimes add some attributes, but this is the basic concept. For each tag specify what to do with its data and add the result to the data of the parent tag.

There are some actions that are so simple and often needed that I added some builtin rules so that you don't have to write for example

  'name' => sub { return 'name' => $_[1]->{_content}},
or
  'name' => sub {
    my ($tag,$attr) = @_;
    return 'name' => $attr->{_content}
  },
[download]

  'name' => 'content',
[download]

Now this all works great if you want to extract data from the XML, restricting and tweaking the data as it's being parsed, but if you want to filter the XML and end up with something very similar you run into problems. Eg. in the example above the datastructure for the <name>'s parent tag will look exactly the same for these two snippets of XML:

  <parent x="1"><name>jenda</name></parent>
and
  <parent x="1" name="jenda"/>
[download]

Sounds awfull, but it means that if you set the rule of <name> to 'raw extended' and the XML looks like this

<parent x="1">blah <name>jenda</name> blah</parent>
[download]

{ x => "1", ":name" => {_content => "jenda"},
  _content => [ "blah ", ["name" => {_content => "jenda"}], " blah"] 
}
[download]

$attr->{':name'){_content}

So, the first three rules make sure the data passed to the <record> handler contain all the info so that we can print that tag with content exactly how it was and at the same time that the data I need are easy to access. The name of the group is in $attr->{':name'}{_content} and the member groups are in @{$attr->{':members'}{':Group'}}.

The first thing the handler does is that it finds out whether the currently processed group depends on any group we haven't written into the output file yet. It checks whether it depends on anything and if it does, it takes the array of hashrefs referenced by $attr->{':members'}{':Group'}, from each hashref takes just the value of the {_content} key and removes those that were already written.

If there are any such yet-unsatisfied dependencies, it records them in the tag's datastructure under the key {':I_depend'} (converting the list of names to a hash whose keys are the items of that list) and adds a reference to this tag's data to the %depends hash under each name of group it depends on. And returns nothing. Because we do not want to print anything at the moment.

If we find out that there are no unsatisfied dependencies we note the record name in the %written hash, find the other records that depended on the current one (or on some record that depended on the current record or ...) and then we return the current record and all the records we found so that XML::Rules can write them to the output.

Finding the dependent record is a recursive function, we take the name of the record we are just about to write and process all records that depend on it (from $depends{$that_name}), for each remove the name from its {':I_depend'} and if it was the last record that one depended on, we add it to the list to write and process its dependencies.

I hope I did not make things more confused by this terribly long post :-) It might help to add a few print Dumper(something) throughout the code to understand what data are where at what point.

Update: FOUND IT! There indeed was a problem in my code. One missing line. Please add $written{$parent->{':name'}{_content}} = 1; after the push @to_write, $parent; line. Silly mistake. I did not mark "written" those groups that were written after the group they depended on, only the groups that were written immediately, without waiting for another group to be parsed, were marked. Anyway there still is one unsatisfied dependency in your XML. The WTSWestTier1 depends on ThinClientOncall_Group and that one is nowhere to be found.

Jenda
Support Denmark!
Defend the free world!

[reply]
[d/l]
[select]