http://qs321.pair.com?node_id=1208325

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi perlmonks, I am writing a script that goes through an XML file and reports the names and prices. I am having problems aligning the data. This is the XML file:

<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>Light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  <food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
    <calories>900</calories>
  </food>
  <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>Thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>
  <food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
    <calories>950</calories>
  </food>
</breakfast_menu>

And this is the script:

open($fh,"<","xml.xml") or die $!;
while(<$fh>){
    if($_ =~ /<price>\$(.*)<\/price>/){
        push @arr, $1;
    }elsif($_ =~ /<name>(.*)<\/name>/){
        push @ar2,$1;
    }
}
@index = sort{$ar2[$a] cmp $ar2[$b]} 0..$#ar2;
@har = @ar2[@index];
@har2 = @arr[@index];
$teller = 0;
foreach(@har){
    print "$_ ";
    print $har2[$teller];
    print "\n";
    $teller++;
}

I have tried aligning the output-data with the use of spaces and tabs, but i'm not succeeding. Tx for the help.

Replies are listed 'Best First'.
Re: aligning text
by Tux (Canon) on Feb 02, 2018 at 15:14 UTC
    $ cpan XML::Twig; $ xml_pp <file.xml

    Enjoy, Have FUN! H.Merijn
Re: aligning text
by haukex (Archbishop) on Feb 02, 2018 at 17:22 UTC

    Please don't use regexes to parse XML.

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use XML::LibXML;

    my $doc = XML::LibXML->load_xml( location => 'in.xml' );
    my (@data, %maxlen);
    for my $food ( $doc->findnodes('/breakfast_menu/food') ) {
        my %elem;
        for my $t ( qw/ name price / ) {
            # assume only one of these elements each per <food>
            my ($str) = $food->getElementsByTagName($t);
            $str = $str->textContent;
            $str =~ s/^\s*\$// if $t eq 'price';
            if ( !defined $maxlen{$t} || length $str > $maxlen{$t} )
                { $maxlen{$t} = length $str }
            $elem{$t} = $str;
        }
        push @data, \%elem;
    }
    @data = sort { $a->{name} cmp $b->{name} } @data;
    for my $elem (@data) {
        printf "%-$maxlen{name}s %$maxlen{price}.2f\n",
            $elem->{name}, $elem->{price};
    }
    __END__
    Belgian Waffles             5.95
    Berry-Berry Belgian Waffles 8.95
    French Toast                4.50
    Homestyle Breakfast         6.95
    Strawberry Belgian Waffles  7.95
Re: aligning text
by roboticus (Chancellor) on Feb 02, 2018 at 17:07 UTC

    Anonymous Monk:

    No-one else seems to have mentioned the perils of parsing XML with regular expressions, so I guess I'll do so. It's all fine so long as the XML continues to come in to you formatted as your example, or if you control both ends of the data feed.

    However, when dealing with third-party data feeds, at some point, something will eventually happen and they'll change the formatting to give you a headache. For example, suppose the data comes in like this:

    <breakfast_menu> <food><name>Belgian Waffles</name><price>$5.95</price> <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description> <calories>650</calories> </food> <food><name>Strawberry Belgian Waffles</name><price>$7.95 </price><description>Light Belgian waffles covered with strawberries and whipped cream </description><calories>900</calories> </food> <food><name>Berry-Berry Belgian Waffles </name> <price>$8.95</price> <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description><calories>900</calories> </food> <food> <name>French Toast</name> <price>$4.50</price> <description>Thick slices made from our homemade sourdough bread</description> <calories>600</calories> </food> <food> <name> Homestyle Breakfast</name> <price>$6</price> <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description> <calories>950</calories> </food> <food><name>Robot Cogs</name><price>$123.456</price></food> <food><name>Berries &amp; More Berries Waffles</name><price>11.5</price></food> </breakfast_menu>

    Here, you'll find several things that can cause you some trouble:

    • Some of the values you're interested in have extra whitespace
    • The prices are formatted differently
    • Tags may not appear on the same line
    • Special characters (such as &) will show up as entity text

    So you'll find that you'll get awful results with your code:

    $ perl pm1208325_proc_xml.pl ugly.xml
    Homestyle Breakfast 4.50
    Berries &amp; More Berries Waffles 123.456
    French Toast 8.95
    Strawberry Belgian Waffles 5.95

    Notice that due to the ugliness I added to the XML file, the output is not only ugly, but wrong!

    Not only are some items missing from the output, but since you're using separate arrays to keep your values, any parsing error on one of the values makes your arrays get out of synchronization, so the wrong prices appear on some items.

    There are other headaches you can get into when dealing with XML files, too. So you may want to learn one of the XML handling libraries. It's a little bit of a pain at first, but once you're used to it, these sorts of issues just magically go away. Then you can use the time you're not wrestling XML data to handle the other issues, like formatting values!

    I used XML::Twig and whipped something up and it displays:

    $ perl ex_Xml_Twig_pm1208325.pl ugly.xml
    Belgian Waffles $5.95
    Berries & More Berries Waffles $11.50
    Berry-Berry Belgian Waffles $8.95
    French Toast $4.50
    Homestyle Breakfast $6.00
    Robot Cogs $123.46
    Strawberry Belgian Waffles $7.95
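
    The script behind that output was not posted. What follows is only a guess at its general shape, assuming XML::Twig is installed: a handler fires for every <food> element and reads the trimmed text of its name and price children. The helper name parse_menu, the inline sample document and the output format are all made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);
use XML::Twig;

# Hypothetical helper: return [name, price] pairs, sorted by name.
sub parse_menu {
    my ($xml) = @_;
    my @items;
    my $twig = XML::Twig->new(
        twig_handlers => {
            food => sub {
                my ($t, $food) = @_;
                # trimmed text drops stray whitespace around the values
                my $name  = $food->first_child_trimmed_text('name');
                my $price = $food->first_child_trimmed_text('price');
                $price =~ s/^\$//;    # the currency sign may be missing
                push @items, [ $name, $price ];
            },
        },
    );
    $twig->parse($xml);
    return sort { $a->[0] cmp $b->[0] } @items;
}

# Small made-up sample showing the same kinds of ugliness.
my $xml = <<'XML';
<breakfast_menu>
  <food><name>Robot Cogs</name><price>$123.456</price></food>
  <food><name>
    Homestyle Breakfast</name> <price>$6
  </price></food>
  <food><name>Berries &amp; More Berries Waffles</name><price>11.5</price></food>
</breakfast_menu>
XML

my @items = parse_menu($xml);
my $width = max map { length $_->[0] } @items;

# Name left-justified to the widest name, price with two decimals.
printf "%-*s \$%6.2f\n", $width, @$_ for @items;
```

    Because each name and price is collected from the same <food> element, a malformed entry can no longer shift the prices onto the wrong items.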

    ...roboticus

    When your only tool is a regular expression, all XML problems look insurmountable.

Re: aligning text
by choroba (Cardinal) on Feb 02, 2018 at 18:02 UTC
    As usual, I present an XML::XSH2 solution. It's a bit ugly as sorting doesn't work the way I'd like it to (I might fix it in the future, though).
    perl { use List::Util qw{ max } } ;
    open xml.xml ;
    my $names = /breakfast_menu/food/name/text() ;
    my $maxlength = {max(map length, @$names)} ;
    my $foods = /breakfast_menu/food ;
    for my $food in { sort @$foods } {
        my $name = $food/name ;
        my $price = $food/price ;
        perl { printf "%-${maxlength}s %s\n", $name, $price } ;
    }

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: aligning text
by LanX (Saint) on Feb 02, 2018 at 15:24 UTC
    Better to use self-explanatory variable names; it's no fun trying to understand your code like this.

    Since your first print seems* to be the price you might be interested in using printf for floats with a sufficiently large format (i.e. max digits of price, see examples for %f in sprintf )

    The following names should be aligned then.
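
    A minimal sketch of the printf idea, assuming the names and prices already sit in two parallel arrays. The helper format_rows and the sample data are made up for illustration. It measures the widest name and the widest formatted price first, then passes both widths to sprintf through the `*` placeholder.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sample data, taken from the menu in the question.
my @names  = ('Belgian Waffles', 'Berry-Berry Belgian Waffles', 'French Toast');
my @prices = (5.95, 8.95, 4.50);

# Build one aligned line per item:
# name left-justified, price right-justified with two decimals.
sub format_rows {
    my ($names, $prices) = @_;
    my $name_width  = max map { length $_ } @$names;
    my $price_width = max map { length(sprintf('%.2f', $_)) } @$prices;
    return map {
        sprintf('%-*s  %*.2f',
            $name_width,  $names->[$_],
            $price_width, $prices->[$_])
    } 0 .. $#$names;
}

print "$_\n" for format_rows(\@names, \@prices);
```

    Computing the widths from the data avoids hard-coding a column size that a longer name or a three-digit price would later break.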

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery

    update

    *) wrong because @har derives from @ar2 and vice versa. (Jesus!)

Re: aligning text
by hippo (Bishop) on Feb 02, 2018 at 15:16 UTC
    but i'm not succeeding

    Since you have not said (or shown) in what way you are not succeeding, we can only point you towards generic solutions such as perlform, printf, Text::Table, Template, etc. Hopefully one or other of those will give you the output you need.

    While you're at it, you might want to consider one of the many XML modules (but not XML::Simple) for parsing the input.

      This is the output:

      Belgian Waffles 5.95
      Berry-Berry Belgian Waffles 8.95
      French Toast 4.50
      Homestyle Breakfast 6.95
      Strawberry Belgian Waffles 7.95

      And this is the desired output:

      Belgian Waffles             5.95
      Berry-Berry Belgian Waffles 8.95
      French Toast                4.50
      Homestyle Breakfast         6.95
      Strawberry Belgian Waffles  7.95

        Since I doubt anyone else will suggest it, here's a perlform-based solution:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my @har = (
            'Belgian Waffles',
            'Berry-Berry Belgian Waffles',
            'French Toast',
            'Homestyle Breakfast',
            'Strawberry Belgian Waffles',
        );
        my @prices = qw/ 5.95 8.95 4.50 6.95 7.95 /;
        my ($name, $price);

        format STDOUT =
        @<<<<<<<<<<<<<<<<<<<<<<<<< @#.##
        $name, $price
        .

        for my $i (0 ..$#har) {
            $name = $har[$i];
            $price = $prices[$i];
            write;
        }

        For that kind of formatting, I've used Text::Table as suggested by hippo. I'm not going to claim that this is the "best" way of doing it, but I'd probably resort to Text::Table if I needed to format data into columns.
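
        A minimal sketch of the Text::Table route, assuming the module is installed from CPAN. The column titles and the sample rows are made up for illustration and the data is assumed to be sorted already.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Table;

# Sample rows from the question's menu, already sorted by name.
my @rows = (
    [ 'Belgian Waffles',             '5.95' ],
    [ 'Berry-Berry Belgian Waffles', '8.95' ],
    [ 'French Toast',                '4.50' ],
);

# One column per title; Text::Table works out the column widths itself.
my $tb = Text::Table->new( 'Name', 'Price' );
$tb->load(@rows);

print $tb;    # the object stringifies to the finished table
```

        The prices are kept as strings here so that trailing zeros such as 4.50 survive.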

Re: aligning text
by Anonymous Monk on Feb 03, 2018 at 08:43 UTC

    I have used the printf function and it is looking better now, Tx all for replying.