ultibuzz has asked for the wisdom of the Perl Monks concerning the following question:
Situation: HUGE XML files with lots of information that is not needed. We are talking about millions of rows and many GB, and I get confused by the Dumper output.
Example XML file (test.xml)
<BL_DPLI_ENTRY>
<BL_USER>
<USERID>22069332</USERID>
<PHONENO>
<ENTRY>877465535</ENTRY>
</PHONENO>
<PHONENO>
<ENTRY>86719273704</ENTRY>
</PHONENO>
<PHONENO>
<ENTRY>8671881760</ENTRY>
</PHONENO>
<PHONENO>
<ENTRY>8671969876</ENTRY>
</PHONENO>
<ADDRESS>
<LASTNAME>Müller</LASTNAME>
<FIRSTNAME>heiner</FIRSTNAME>
<MAIDENNAME/>
<STREET>Entenhausenerweg</STREET>
<STREET_NO>15</STREET_NO>
<ZIP>660666</ZIP>
<CITY>Entenhausen</CITY>
<COUNTRY>D</COUNTRY>
</ADDRESS>
</BL_USER>
<BL_DPLI_RECORD>
<DATE_CREATED>2006-01-31</DATE_CREATED>
<DATE_LAST_UPDATE>2006-01-30</DATE_LAST_UPDATE>
<REASON>
<ENTRY>highspender limit exceeded</ENTRY>
<DATE_OCCURRED>2006-01-30</DATE_OCCURRED>
</REASON>
<BL_DP_TOTAL>
<NUMBER_DENIED_PAYMENTS_TOTAL>0</NUMBER_DENIED_PAYMENTS_TOTAL>
<DATE_OLDEST_INVOICE/>
<MONETARYVALUE>
<V>0.00</V>
<C>EUR</C>
</MONETARYVALUE>
</BL_DP_TOTAL>
<BL_LI_RECORD>
<DATE_LAST_UPDATE>2006-01-30</DATE_LAST_UPDATE>
<MONETARYVALUE>
<V>215.84</V>
<C>EUR</C>
</MONETARYVALUE>
</BL_LI_RECORD>
</BL_DPLI_RECORD>
</BL_DPLI_ENTRY>
Dumper output
$VAR1 = {
  'BL_DPLI_RECORD' => {
    'REASON' => {
      'DATE_OCCURRED' => '2006-01-30',
      'ENTRY' => 'highspender limit exceeded'
    },
    'BL_LI_RECORD' => {
      'MONETARYVALUE' => {
        'C' => 'EUR',
        'V' => '215.84'
      },
      'DATE_LAST_UPDATE' => '2006-01-30'
    },
    'BL_DP_TOTAL' => {
      'DATE_OLDEST_INVOICE' => {},
      'MONETARYVALUE' => {
        'C' => 'EUR',
        'V' => '0.00'
      },
      'NUMBER_DENIED_PAYMENTS_TOTAL' => '0'
    },
    'DATE_LAST_UPDATE' => '2006-01-30',
    'DATE_CREATED' => '2006-01-31'
  },
  'BL_USER' => {
    'USERID' => '22069332',
    'ADDRESS' => {
      'COUNTRY' => 'D',
      'MAIDENNAME' => {},
      'CITY' => 'Entenhausen',
      'ZIP' => '660666',
      'LASTNAME' => 'Müller',
      'FIRSTNAME' => 'heiner',
      'STREET' => 'Entenhausenerweg',
      'STREET_NO' => '15'
    },
    'PHONENO' => [
      {
        'ENTRY' => '877465535'
      },
      {
        'ENTRY' => '86719273704'
      },
      {
        'ENTRY' => '8671881760'
      },
      {
        'ENTRY' => '8671969876'
      }
    ]
  }
};
I need all phone numbers, each followed by the reason (the REASON/ENTRY text). Example __OUT__
877465535;highspender limit exceeded
86719273704;highspender limit exceeded
8671881760;highspender limit exceeded
8671969876;highspender limit exceeded
So far I have tried XML::Simple with Data::Dumper, but I am confused about how to access the record for each customer, because they always start with BL_DPLI_RECORD. Tips and hints are definitely needed.
kd ultibuzz
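#!"C:\perl\bin\perl.exe"
use warnings;
use strict;
use XML::Simple;
use Data::Dumper;
# Load the whole document into a nested hash/array structure.
my $config = XMLin('test.xml');
# Dump the structure to out.txt for inspection.
open(my $out, '>', 'out.txt') or die "Cannot open out.txt: $!";
print {$out} Dumper($config);
close($out) or die "Cannot close out.txt: $!";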
Edited by planetscape - added readmore tags
Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by Joost (Canon) on Nov 20, 2006 at 16:08 UTC
|
Forget about using XML::Simple for large XML files. Loading XML via XML::Simple takes considerably more memory than the size of the file.
I'd recommend XML::Twig (and keep the flush() method in mind).
update: XML::Twig also has much better search methods for finding the tags you're interested in (and ignoring the rest). XML::Simple is more useful if the XML is fairly small and already more or less matches your desired data structure.
| [reply] |
Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by monkey_boy (Priest) on Nov 20, 2006 at 17:00 UTC
|
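A little snippet to get you going; you could easily post-process the output of this:
use strict;
use warnings;
use XML::Twig;
# Path of the XML file to parse, taken from the command line.
my $your_xml_file = shift or die "usage: $0 file.xml\n";
my $t = XML::Twig->new(
    twig_handlers => {
        # Each handler runs once its element has been completely parsed.
        'PHONENO/ENTRY' => \&print_n_purge,
        'REASON/ENTRY'  => \&print_n_purge,
    }
);
$t->parsefile($your_xml_file);
# Print "PARENT:text" for the element, then free what has been parsed so far.
sub print_n_purge
{
    my( $t, $elt)= @_;
    print $elt->parent->name, ":", $elt->text, "\n";
    $t->purge;
}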
This is not a Signature...
| [reply] [d/l] |
|
XML::Twig sounds good; if the VPN is working I will test some now. I just came home from a 12-hour working day. Big thanks to Joost and you for the hint about XML::Twig.
UPDATE: I have tried this snippet and this is the output:
PHONENO:877465535
PHONENO:86719273704
PHONENO:8671881760
PHONENO:8671969876
not well-formed (invalid token) at line 17, column 14, byte 313 at C:/Perl/site/lib/XML/Parser.pm line 187
I added 'REASON/DATE_OCCURRED' => \&print_n_purge because I thought the error was caused by not handling that element of REASON, but this didn't help.
UPDATE2: Now I don't understand this. I just copied the data into a file called test2.xml, started over, and got this:
PHONENO:867112593
PHONENO:86719273704
PHONENO:8671881760
PHONENO:8671969876
REASON:highspender limit exceeded
no element found at line 52, column 16, byte 1252 at C:/Perl/site/lib/XML/Parser.pm line 187
As you can see, there is still an error, but now the reason is printed.
| [reply] [d/l] [select] |
|
<?xml version="1.0" encoding="ISO-8859-1"?>
Declare the encoding on the first line of the file, as in the line above. Your encoding is probably ISO-8859-1 (Latin-1), CP1252 (the Windows Latin encoding) or UTF-8 (one of the Unicode encodings).
| [reply] [d/l] |
Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by madbombX (Hermit) on Nov 20, 2006 at 17:05 UTC
|
I agree with Joost above, so I will therefore not restate his suggestion on using XML::Twig. However, if you are bent (for one reason or another) on using XML::Simple, then you need to look into the "ForceArray => 1" config item:
my $config = XMLin('test.xml', ForceArray => 1);
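To see what ForceArray changes, here is a minimal sketch using a cut-down copy of the sample record with only one PHONENO. The sub name phone_lines and the KeyAttr => [] setting are my own choices for illustration, not something taken from the posts above. With ForceArray => 1 every nested element should come back as an array ref, so the access path stays the same whether a customer has one phone number or four.

```perl
use strict;
use warnings;
use XML::Simple;

# Cut-down copy of the sample record, with a single PHONENO on purpose.
my $xml = <<'XML';
<BL_DPLI_ENTRY>
  <BL_USER>
    <USERID>22069332</USERID>
    <PHONENO><ENTRY>877465535</ENTRY></PHONENO>
  </BL_USER>
  <BL_DPLI_RECORD>
    <REASON><ENTRY>highspender limit exceeded</ENTRY></REASON>
  </BL_DPLI_RECORD>
</BL_DPLI_ENTRY>
XML

# Hypothetical helper: returns one "phone;reason" string per phone number.
sub phone_lines {
    my ($xml_text) = @_;

    # ForceArray => 1 wraps every nested element in an array ref;
    # KeyAttr => [] switches off folding on name/key/id.
    my $config = XMLin($xml_text, ForceArray => 1, KeyAttr => []);

    my $reason = $config->{BL_DPLI_RECORD}[0]{REASON}[0]{ENTRY}[0];
    return map { "$_->{ENTRY}[0];$reason" }
           @{ $config->{BL_USER}[0]{PHONENO} };
}

print "$_\n" for phone_lines($xml);
```

The price of the uniform structure is the extra [0] at every level, which is why it pays to run Data::Dumper on the result once before writing the access paths.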
| [reply] [d/l] |
Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by inman (Curate) on Nov 20, 2006 at 20:15 UTC
|
You can combine XML::Twig, to process the file record by record, with XML::Simple, for a convenient data structure. ForceArray is strongly advised for XML::Simple.
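One way to read this suggestion, as a rough sketch only: let XML::Twig hand over one BL_DPLI_ENTRY at a time and turn just that element into an XML::Simple-style structure with the element's simplify method. The BL_LIST wrapper element is hypothetical (the thread never shows the real root), and the access paths assume simplify(forcearray => 1) mirrors what XMLin would return, so dump the structure with Data::Dumper before relying on them.

```perl
use strict;
use warnings;
use XML::Twig;

# BL_LIST is a made-up wrapper; the real root element is not shown in the thread.
my $xml = <<'XML';
<BL_LIST>
  <BL_DPLI_ENTRY>
    <BL_USER>
      <PHONENO><ENTRY>877465535</ENTRY></PHONENO>
      <PHONENO><ENTRY>86719273704</ENTRY></PHONENO>
    </BL_USER>
    <BL_DPLI_RECORD>
      <REASON><ENTRY>highspender limit exceeded</ENTRY></REASON>
    </BL_DPLI_RECORD>
  </BL_DPLI_ENTRY>
</BL_LIST>
XML

my @lines;    # collected "phone;reason" strings

my $twig = XML::Twig->new(
    twig_handlers => {
        # Runs once per customer record, after it has been fully parsed.
        BL_DPLI_ENTRY => sub {
            my ($t, $entry) = @_;

            # Assumed to have the same shape as XMLin(..., ForceArray => 1).
            my $rec    = $entry->simplify(forcearray => 1);
            my $reason = $rec->{BL_DPLI_RECORD}[0]{REASON}[0]{ENTRY}[0];

            push @lines, map { "$_->{ENTRY}[0];$reason" }
                         @{ $rec->{BL_USER}[0]{PHONENO} };

            $t->purge;    # drop the record so memory use stays flat
        },
    },
);

$twig->parse($xml);    # use parsefile('file.xml') for a real file
print "$_\n" for @lines;
```

Only one record is held in memory at a time, which is the point of combining the two modules on a multi-GB file.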
Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by Smaug (Pilgrim) on Nov 20, 2006 at 17:52 UTC
|
Hi,
Untested but you're welcome to try:
my $reason = $config->{'BL_DPLI_RECORD'}->{'REASON'}->{'ENTRY'};
That should give you a good starting point to move on from.
Update: In fact if you read getting required format output from a xml file I believe it will solve your confusion.
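Building on that starting point, here is a small sketch of walking the structure from the Dumper output earlier in the thread to produce the requested phone;reason lines. The hash literal is a trimmed copy of that dump so the example runs without the XML file, and the sub name phone_reason_lines is made up for illustration.

```perl
use strict;
use warnings;

# Trimmed copy of the structure shown in the Dumper output.
my $config = {
    BL_DPLI_RECORD => {
        REASON => {
            DATE_OCCURRED => '2006-01-30',
            ENTRY         => 'highspender limit exceeded',
        },
    },
    BL_USER => {
        USERID  => '22069332',
        PHONENO => [
            { ENTRY => '877465535' },
            { ENTRY => '86719273704' },
            { ENTRY => '8671881760' },
            { ENTRY => '8671969876' },
        ],
    },
};

# Hypothetical helper: one "phone;reason" string per phone number.
sub phone_reason_lines {
    my ($cfg) = @_;
    my $reason = $cfg->{BL_DPLI_RECORD}{REASON}{ENTRY};
    my $phones = $cfg->{BL_USER}{PHONENO};

    # Without ForceArray, a customer with a single PHONENO yields a
    # hash ref here instead of an array ref, so accept both shapes.
    my @phones = ref $phones eq 'ARRAY' ? @$phones : ($phones);

    return map { "$_->{ENTRY};$reason" } @phones;
}

print "$_\n" for phone_reason_lines($config);
```

This only works for a file holding a single record, as test.xml does; for the real multi-GB input the whole structure would have to fit in memory first.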
Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by ultibuzz (Monk) on Nov 20, 2006 at 21:39 UTC
|
So I used this snippet to fill an array, then I scan the array until the reason text is found, and all elements passed so far get that text appended. Even if this is not the nice way, it works. I couldn't find another way in the XML::Twig documentation; I will have another look tomorrow. I think the brainpower is used up for today.
kd ultibuzz
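The buffering described above could also be done inside the twig handlers themselves, so no second pass over an array is needed. This is only a sketch of that idea, not the code ultibuzz actually ran: the BL_LIST wrapper and the second record (with placeholder values) are invented so the example shows the buffer being reset between customers.

```perl
use strict;
use warnings;
use XML::Twig;

# BL_LIST and the second record are invented for the example.
my $xml = <<'XML';
<BL_LIST>
  <BL_DPLI_ENTRY>
    <BL_USER>
      <PHONENO><ENTRY>877465535</ENTRY></PHONENO>
      <PHONENO><ENTRY>86719273704</ENTRY></PHONENO>
    </BL_USER>
    <BL_DPLI_RECORD>
      <REASON><ENTRY>highspender limit exceeded</ENTRY></REASON>
    </BL_DPLI_RECORD>
  </BL_DPLI_ENTRY>
  <BL_DPLI_ENTRY>
    <BL_USER>
      <PHONENO><ENTRY>000000000</ENTRY></PHONENO>
    </BL_USER>
    <BL_DPLI_RECORD>
      <REASON><ENTRY>placeholder reason</ENTRY></REASON>
    </BL_DPLI_RECORD>
  </BL_DPLI_ENTRY>
</BL_LIST>
XML

my @phones;    # phone numbers seen since the last reason
my @lines;     # finished "phone;reason" strings

my $twig = XML::Twig->new(
    twig_handlers => {
        # Remember each phone number until the reason shows up.
        'PHONENO/ENTRY' => sub {
            my ($t, $elt) = @_;
            push @phones, $elt->text;
        },
        # The reason closes the customer: emit and reset the buffer.
        'REASON/ENTRY' => sub {
            my ($t, $elt) = @_;
            my $reason = $elt->text;
            push @lines, map { "$_;$reason" } @phones;
            @phones = ();
            $t->purge;    # free everything parsed so far
        },
    },
);

$twig->parse($xml);    # use parsefile('file.xml') for a real file
print "$_\n" for @lines;
```

This relies on the phone numbers always appearing before the reason within a record, which holds for the sample but is an assumption about the full file.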
| [reply] |
|
Considering the data you need, there is no way for you to avoid storing the phone numbers, then outputting them when you find the reason. You could leave the elements (and just these elements) in the tree though, by using twig_roots to get the twig built only for them:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
XML::Twig->new( # the twig will only contain PHONENO/ENTRY and REASON/ENTRY elements
                # (plus the root or it would not be a tree)
                twig_roots => { 'BL_USER/PHONENO/ENTRY'       => 1,
                                'BL_DPLI_RECORD/REASON/ENTRY' => \&reason,
                              }
              )
         ->parsefile( 'phone_data.xml');
sub reason
  { my( $t, $reason)= @_;
    foreach my $phone_no ($reason->prev_siblings)
      { print $phone_no->text, ";", $reason->text, "\n"; }
    $t->purge;
  }
| [reply] [d/l] |
RE: Confusion ,XML::SIMPLE with DATA:DUMPER
by ultibuzz (Monk) on Nov 24, 2006 at 08:11 UTC
|
I want a third value to be printed, so I added it to twig_roots.
I want to access the value with next_siblings, but that didn't work. When I set the root to => 1, it is printed as if it were a phone number element.
So what am I getting wrong? What's screwed up?
kd ultibuzz
| [reply] [d/l] |