PerlMonks  

Re: aligning text

by roboticus (Chancellor)
on Feb 02, 2018 at 17:07 UTC ([id://1208334])


in reply to aligning text

Anonymous Monk:

No one else seems to have mentioned the perils of parsing XML with regular expressions, so I guess I'll do so. It's all fine so long as the XML keeps arriving formatted like your example, or if you control both ends of the data feed.

However, when you're dealing with third-party data feeds, sooner or later they'll change the formatting and give you a headache. For example, suppose the data comes in like this:

    <breakfast_menu>
    <food><name>Belgian Waffles</name><price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
    </food>
    <food><name>Strawberry Belgian Waffles</name><price>$7.95
    </price><description>Light Belgian waffles covered with strawberries and whipped cream
    </description><calories>900</calories>
    </food>
    <food><name>Berry-Berry Belgian Waffles
    </name>
    <price>$8.95</price>
    <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description><calories>900</calories>
    </food>
    <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>Thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
    </food>
    <food>
    <name>
    Homestyle Breakfast</name>
    <price>$6</price>
    <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
    <calories>950</calories>
    </food>
    <food><name>Robot Cogs</name><price>$123.456</price></food>
    <food><name>Berries &amp; More Berries Waffles</name><price>11.5</price></food>
    </breakfast_menu>

Here, you'll find several things that can cause you some trouble:

  • Some of the values you're interested in have extra whitespace
  • The prices are formatted differently
  • Tags may not appear on the same line
  • Special characters (such as &) will show up as entity text

So you'll get awful results with your code:

    $ perl pm1208325_proc_xml.pl ugly.xml
    Homestyle Breakfast                 4.50
    Berries &amp; More Berries Waffles  123.456
    French Toast                        8.95
    Strawberry Belgian Waffles          5.95

Notice that due to the ugliness I added to the XML file, the output is not only ugly, but wrong!

Not only are some items missing from the output, but since you're using separate arrays to keep your values, any parsing error on one of the values knocks your arrays out of sync, so the wrong prices appear on some items.
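To make the out-of-sync problem concrete, here is a minimal sketch. The sample values and variable names are made up for illustration and are not taken from the original script. It pairs two parallel arrays by index after one price failed to match, then shows the same data kept as one record per item.

```perl
use strict;
use warnings;

# Hypothetical extraction results: the French Toast price was missed,
# so @prices is one element short and every later price shifts up.
my @names  = ('Belgian Waffles', 'French Toast', 'Robot Cogs');
my @prices = ('5.95', '123.456');

# Pairing by index silently gives French Toast the Robot Cogs price.
my %by_index;
$by_index{ $names[$_] } = $prices[$_] for 0 .. $#names;

# One record per item: a missed field stays undef in its own record
# instead of shifting the values of the items that follow.
my @items = (
    { name => 'Belgian Waffles', price => '5.95' },
    { name => 'French Toast',    price => undef },
    { name => 'Robot Cogs',      price => '123.456' },
);

for my $item (@items) {
    printf "%-20s %s\n", $item->{name}, $item->{price} // '(missing)';
}
```

With records, a bad parse shows up as a visible gap on the affected item rather than as a plausible-looking wrong price on a different one.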

There are other headaches you can get into when dealing with XML files, too. So you may want to learn one of the XML handling libraries. It's a little bit of a pain at first, but once you're used to it, these sorts of issues just magically go away. Then you can use the time you're not wrestling XML data to handle the other issues, like formatting values!
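As one example of that kind of value formatting, the sketch below normalizes a price string once it has been extracted. The helper name `format_price` is hypothetical, and the rule it applies (trim whitespace, drop a leading `$`, print two decimal places) is an assumption based on the price variations in the sample data.

```perl
use strict;
use warnings;

# format_price is a hypothetical helper, not part of any library.
# It assumes the remaining text is a plain decimal number.
sub format_price {
    my ($raw) = @_;
    my $num = $raw;
    $num =~ s/^\s+//;      # leading whitespace
    $num =~ s/\s+\z//;     # trailing whitespace
    $num =~ s/^\$//;       # optional currency sign
    return sprintf '$%.2f', $num;
}

print format_price($_), "\n" for '$5.95', '$7.95 ', '$6', '11.5';
```

Here `$6` comes out as `$6.00` and `11.5` as `$11.50`. Input that is not numeric after trimming would trigger a warning under `use warnings`, so real code would probably want to validate first.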

I whipped something up with XML::Twig, and it displays:

    $ perl ex_Xml_Twig_pm1208325.pl ugly.xml
    Belgian Waffles                 $5.95
    Berries & More Berries Waffles  $11.50
    Berry-Berry Belgian Waffles     $8.95
    French Toast                    $4.50
    Homestyle Breakfast             $6.00
    Robot Cogs                      $123.46
    Strawberry Belgian Waffles      $7.95
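The script behind that output is not reproduced here, so the following is only a rough sketch of one way to get comparable results with XML::Twig. The `read_menu` function and the small inline sample are illustrative assumptions. The sketch relies on a `twig_handlers` callback for each `food` element and on `first_child_trimmed_text`, which returns a child's text with surrounding whitespace removed.

```perl
use strict;
use warnings;
use XML::Twig;

# A small inline sample so the sketch runs on its own; it reuses two of
# the awkward entries from the menu (split tags, missing '$', entity).
my $xml = <<'XML';
<breakfast_menu>
<food><name>
Homestyle Breakfast</name>
<price>$6</price></food>
<food><name>Berries &amp; More Berries Waffles</name><price>11.5</price></food>
</breakfast_menu>
XML

# read_menu is a hypothetical helper: it returns one hash per <food>,
# so each name stays attached to its own price.
sub read_menu {
    my ($xml_text) = @_;
    my @items;
    XML::Twig->new(
        twig_handlers => {
            food => sub {
                my ($twig, $food) = @_;
                my $name  = $food->first_child_trimmed_text('name');
                my $price = $food->first_child_trimmed_text('price');
                $price =~ s/^\$//;    # some prices carry a '$', some do not
                push @items, { name => $name, price => $price };
            },
        },
    )->parse($xml_text);
    return @items;
}

my @menu = sort { $a->{name} cmp $b->{name} } read_menu($xml);
printf "%-32s \$%.2f\n", $_->{name}, $_->{price} for @menu;
```

To read a file such as ugly.xml instead of a string, `parsefile` can be used in place of `parse`. The parser decodes `&amp;` to `&` and the handler sees each `food` element whole, regardless of how the tags were split across lines.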

...roboticus

When your only tool is a regular expression, all XML problems look insurmountable.
