Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: XML Parsing,

by Sewi (Friar)
on Sep 09, 2009 at 10:48 UTC ( [id://794320]=note: print w/replies, xml ) Need Help??


in reply to XML Parsing,

I don't know which error you get or what isn't happening, but I'ld like to give some comments on your code:
#!/usr/bin/perl use XML::Simple; print "Content-type: text/html\n"; print "\n"; ######XML PARSE########## $ipath = "e:/cars/"; $str = ".xml"; opendir THEDIR, "$ipath"; @xmlfile = grep (/$str/, readdir THEDIR); closedir THEDIR; foreach $file (@xmlfile){ $xml = $ipath.$file; }
This bracket above confuses me. At this place, it would make the loop go through the directory getting a random filename (the last one readdir returns, but you can't say which one would be the last returned). The order could change over time, reboots, fsck's, OS-changes, directory copies...

Okay, I'll assume that you know what you're doing :-)

if (lc($xml) =~ "don"){&doncaster} else { if (lc($xml) =~ "ayl"){$holding = "asmholding"}; if (lc($xml) =~ "cartrans"){$holding = "ctpholding"}; if (lc($xml) =~ "hills"){$holding = "hilholding"}; if (lc($xml) =~ "windley"){$holding = "wlsholding"}; if (lc($xml) =~ "ppm"){$holding = "ppmholding"};
The following processing only takes for file with doesn't include don.
Using elsif for the last if-block might be better.
print "Other Cars<br>"; $config = XMLin( $xml, SuppressEmpty => "" ); $exportid = "$config->{Header}->{ExportID}"; $numveh = "$config->{Summary}->{NumberOfVehicles}"; $memid = "$config->{Header}->{MemberID}";
exportid and memid are being used, but not modified and numveh isn't used at all. You may want to use their long form (config-header-exportid, like you wrote) instead of copiing the values without need.
$lineno = 10; $vehicleno = 1; foreach $vehicle ( @{ $config->{'Vehicle'} } ) { $auctionid = "$vehicle->{AuctionID}"; $make = "$vehicle->{Manufacturer}"; $model = "$vehicle->{Model}"; [...]
Same as above, there is no need to copy the values, it just makes your script much longer than needed.
$image1 = "$vehicle->{Images}->{Image_1}"; $image2 = "$vehicle->{Images}->{Image_2}"; $image3 = "$vehicle->{Images}->{Image_3}"; $image4 = "$vehicle->{Images}->{Image_4}"; $image5 = "$vehicle->{Images}->{Image_5}"; $image6 = "$vehicle->{Images}->{Image_6}"; $image7 = "$vehicle->{Images}->{Image_7}"; $image8 = "$vehicle->{Images}->{Image_8}"; $image9 = "$vehicle->{Images}->{Image_9}"; $image10 = "$vehicle->{Images}->{Image_10}"; $image11 = "$vehicle->{Images}->{Image_11}"; $image12 = "$vehicle->{Images}->{Image_12}";
Besides the previous comment, this is a wonderful loop example. Here is my suggestion:
for my $ImgNo (1..12) { # Storing them into an array is easy: $image[$_] = $vehicle->{Images}->{'Image_'.$ImgNo}; # or you could also skip empty values: push @image,$vehicle->{Images}->{'Image_'.$ImgNo} if $vehicle->{Imag +es}->{'Image_'.$ImgNo} ne ''; # or you may want to keep the variable names: eval('$image'.$ImgNo.'=$vehicle->{Images}->{"Image_".$ImgNo};'); }
This makes three lines which were 12 before.
###########Write CSV############ $opath = "e:/out/" ; $add='0'; $add2='xx.csv'; $add5 = $exportid.$lineno ; $add3 = $opath.$exportid.$lineno.$add2 ; $add6 = "xx"; $filename = $exportid.$lineno.$add2; $mileage =~ tr/,//d ; $damage =~ tr/,//d ; $trim =~ tr/,//d ; open (FILENAME,">$add3");
Just a text issue: I prefer calling it FILEHANLDE, because FILENAME is not what is inside, but it doesn't make any difference to Perl...
print FILENAME "Ref No,Make,Model,Trim,Vehicle Sub Class,Colour,Engine + Size,Fuel Type,Transmission,Year,Mileage,Registration No,Keys Suppli +ed,Damage Report,VAT Applicable,Condition,FSH,ABI Category,Reserve,Ch +assis No,Vehicle Source,Agent,Region,Vehicle Location,Source Name,Dat +e Approved\n"; print FILENAME "$add5,$make,$model,$trim,$doors,$colour,$cc,$fueltype, +$transpeed $trantype,$year,$mileage,$reg,$keyssupplied,$damage,$hasva +t,$starts $drives,No,$abicat,$reserve,na,$memid,$memid,$memid,$memid, +$auctionid,na,\n";
This last print could be made much more readable:
print FILENAME join(',', $exportid.$lineno, #$add5 is defined this way some lines before $vehicle->{Manufacturer}, $vehicle->{Model}, [...] @image, )."\n";
This one uses join and the original names for the fields.
close FILENAME ; print "<table width=400>"; print "<tr><td>$vehicleno. "; print "$filename Created </td>"; #############Change IMG Name############## $pga ="a.jpg"; $pgb ="b.jpg"; $pgc ="c.jpg"; $pgd ="d.jpg"; $pge ="e.jpg"; $pgf ="f.jpg"; $pgg ="g.jpg"; $pgh ="h.jpg"; $pgi ="i.jpg"; $pgj ="j.jpg"; $pgk ="k.jpg"; $pgl ="l.jpg"; if ($image1 ne ""){$jpg1 ="$add5$add6$pga"; rename ($ipath.$image1, $ipath.$jpg1); print "<td colspan=2>Image 1 Renamed</td></tr>"; }
Here we could you really cool Perl features:
$Char = "a"; for (@images) { next if $_ eq ''; # if you didn't filter them above rename $ipath.$_,$ipath.$add5.$add6.($Char++).'.jpg'; }
First, I defined a variable for the current char, the first one is "a".
I assumed that you switched to an array before which is easier to work with. This loop goes through the array and (next line) skips empty values if there is a risk that they may make it into the array.
The next line is using the current loop item $_ and renaming it to $Char.jpg
Like (nearly) any programming language, Perl could count up, but in Perl this also works for letters. The "a" defined before is counted up to a "b" here, but - and this is why the ++ are BEHIND the variablen name - the old value is returned and used in the loop run. Pretty cool, isn't it?
I skiped the print, but I guess that you could handle this yourself...

Let's stop here as I have nothing to say about the rest.

One thing left for reading directories:

opendir THEDIR, "$ipath"; for $file (grep (/$str/, readdir THEDIR)) { # Do the work } closedir THEDIR;
This is shorter and you save the creattion of a temporary array.

Don't worry, I also wrote code like this when I was a Perl beginner. Please don't treat this as critic, I'm very happy that people are still learning Perl! This post should be a review of things you could do to make your code more readable and easier to maintain.

Replies are listed 'Best First'.
Re^2: XML Parsing,
by dmsparts (Sexton) on Sep 09, 2009 at 12:26 UTC
    Hi, thanks for the reply, as you have pointed out, i do not know a lot of perl, hence why there is a lot of repetition. a little more info (that i should have said before). I am sent a zip file with lots of jpgs and one xml file. i extract them to the folder E:\cars on my pc and then run the script. usually this works fine, renames all the images and then creates a csv file for each vehicle. The problem i have is that when i try to process an xml file that has details of only one vehicle in, ie
    <?xml version="1.0"?> <!--DMS Auction Export - Version 1.2--> <root> <Header> <MemberID>005</MemberID> <ExportID>7908</ExportID> </Header> <Vehicle> <AuctionID>21261</AuctionID> <VehicleID>149611</VehicleID> <Ref></Ref> <Manufacturer>FORD</Manufacturer> <Model>FIESTA</Model> <RegNo>EA05BYM</RegNo> <RegYear>2005</RegYear> <Colour>Red</Colour> <Fuel>Petrol</Fuel> <Damage></Damage> <Doors>3</Doors> <Body>Hatchback</Body> <CC>1999</CC> <Speedo></Speedo> <TransSpeed>5 Speed</TransSpeed> <TransType>Manual</TransType> <TrimLevel>CLOTH</TrimLevel> <Engine></Engine> <Cat>C</Cat> <ReservePrice>2565</ReservePrice> <StartPrice>0</StartPrice> <HasVAT>0</HasVAT> <Keys>1</Keys> <Starts>0</Starts> <Drives>0</Drives> <Stereo>0</Stereo> <VINPlate>1</VINPlate> <LogBook>0</LogBook> <DateAuction>27.03.2009</DateAuction> <Location>BURSCOUGH</Location> <CanBeViewed>YES</CanBeViewed> <CostExVAT>0</CostExVAT> <PAV>0</PAV> <Images> <Image_1>650161.jpg</Image_1> <Image_2>650162.jpg</Image_2> <Image_3>650163.jpg</Image_3> <Image_4>650164.jpg</Image_4> <Image_5>650165.jpg</Image_5> <Image_6></Image_6> <Image_7></Image_7> <Image_8></Image_8> <Image_9></Image_9> <Image_10></Image_10> <Image_11></Image_11> <Image_12></Image_12> </Images></Vehicle> <Summary> <NumberOfVehicles>1</NumberOfVehicles> <DateExport>19.03.2009 14:12:50</DateExport> </Summary> </root>
    the script stops. it doesn't give an error. it prints "other cars" than nothing else. i have determined that the script stops at the point where the XML fields are read into perl variables. making print statements at various points. It seems to be that because there is only one item in the foreach loop it is not going to work. I will also look at your suggestions, however i do not understand some of them. Thanks Again Michael
      the script stops. it doesn't give an error. ...

      That's odd... when I ran the OP script on the xml file in your reply, I got an error message:

      Not an ARRAY reference at 794307.pl line 46.
      In my copy of the script, line 46 was this one:
      foreach $vehicle ( @{ $config->{'Vehicle'} } ) {
      Apparently, when the xml file contains only one "Vehicle", the default behavior of XML::Simple is to provide you with something that is not an array reference (e.g. maybe it's a reference to a hash instead of a reference to an array of hashes).

      That is why moritz, in the first reply, suggested that you include the ForceArray=>1 parameter when you invoke XMLin to create your $config object, like this:

      $config = XMLin( $xml, ForceArray => 1, SuppressEmpty => "" );
      When I made that change and ran it again, I didn't get the error message, and I got more output in addition to just Other Cars<br>.

      Anyway, as others have pointed out, there seem to be quite a few points where your code could use some improvement, and where it may even be doing things you don't intend (and/or not doing things you meant to do). Using more data structures, more loops, and better indentation will help a lot.

      (updated to fix a mistake in wording)

      Hi Michael,

      please ask for everything you didn't understand. I think you're on a very good way and I don't want to stop you because of too short or bad written comments.

      Now I understand, why you're only parsing one xml file. Here is a really complicated, but short solution for this:

      opendir THEDIR, "$ipath"; $xml = $ipath.(grep (/$str/, readdir THEDIR))[0]; closedir THEDIR;
      It is your grep (which returns an array of matching items) but as you only expect one element in the array, the grep-result is forced to be an array using the ( ) around grep. The combination of ( ) and &91; &93; allows you to fetch a specific array item without writing it to an array first.

      Another comment on your original article already told you that you might not get a array from the XML parser if only one inside object is found. There was a hint for a solution. To check this, you should print the thing you use as a array reference:

      print STDERR $config->{'Vehicle'}."\n"; foreach $vehicle ( @{ $config->{'Vehicle'} } ) {
      If this isn't an array reference (shown as ARRAY(0x12345)), you could force it to be one:
      $config->{'Vehicle'} = [$config->{'Vehicle'}] if ref($config->{'Vehicl +e'}) ne 'ARRAY';
      ref() tests if the parameter is a reference and returns the type of reference (in this case, ARRAY). If your value isn't what you expect (an ARRAY - reference), it creates one and fills in the old value into the new array.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://794320]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (6)
As of 2024-03-28 14:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found