How to run multiple xml in a directory on a single perl script

by Siegfried_13 (Initiate)
on Jun 03, 2014 at 10:20 UTC ( [id://1088377]=perlquestion )

Siegfried_13 has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks, I am a newbie in Perl and I would like to seek help on how to process multiple XML files from a directory using a single Perl script. For example, if I run "perl Clean.pl" in cmd, the unwanted tags should be removed from all XML files in the same directory. Each file should keep its filename, or better still, the cleaned files could be written to another directory. Below is my code. Many thanks!

    #!/xce/bin/perl -w
    use warnings;

    opendir(DIR, ".") or die "cannot open directory";
    @docs = grep(/\.xml$/, readdir(DIR));
    foreach $file (@docs) {
        open (INFILE, "$file") or die "could not open $file\n";
        # Process error data to rectify the error records in massaged data
        while (<INFILE>) {
            while (/<p>\s*<text>([^\000]*?)<\/text>\s*<\/p>/) {
                $temp = $1;
                $temp =~ s/<p[^\000]*?\/>//gmi;
                $temp =~ s/<br\/>//gmi;
                s/<p>\s*<text>[^\000]*?<\/text>\s*<\/p>/<P>\n<TEXT>$temp<\/TEXT>\n<\/P>/;
            }
            s/(<\/?)TEXT/$1text/gm;
            s/(<\/?)P/$1p/gm;
            print $_;    # Output the contents of $_ to output file
        }
        # Close the infile.
        close(INFILE);
    }

Replies are listed 'Best First'.
Re: How to run multiple xml in a directory on a single perl script
by dHarry (Abbot) on Jun 03, 2014 at 11:04 UTC

    Parse an XML file? ==> use an XML parser! E.g. XML::Twig or XML::LibXML

    Using REs is sooner or later going to bite you.

    Cheers

    Harry

Re: How to run multiple xml in a directory on a single perl script
by 2teez (Vicar) on Jun 03, 2014 at 11:39 UTC

    Hi Siegfried_13,
    ..seek help on how to parse multiple xml files from a directory using a single perl script...

    There are two suggestions I can give here:

    1. Never parse XML using a regex. There are several modules you could use, like XML::LibXML and the like.
    2. Since all the xml files are in the same directory and there are no sub-directories to check, simply use opendir and readdir to get the files ending with .xml, then process them in a for-loop or a while-loop, as the case may be.
      I know taint had mentioned using File::Find, but for me, that would be like killing an ant with a sledgehammer.

    All of these can be done in a single perl script.
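The opendir/readdir approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sub names (xml_files_in, lowercase_tags, process_dir) and the idea of taking the input and output directories from the command line are hypothetical, not part of the original script. The per-line cleanup is reduced to the tag-lowercasing step, since the heavier edits are better left to an XML parser.

```perl
use strict;
use warnings;
use File::Spec;

# List the plain .xml files directly inside $dir (no recursion).
sub xml_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @names = grep {
        my $path = File::Spec->catfile($dir, $_);
        /\.xml\z/i && -f $path;
    } readdir($dh);
    closedir($dh);
    return sort @names;
}

# Hypothetical per-line cleanup: lowercase <P> and <TEXT> tags only.
sub lowercase_tags {
    my ($line) = @_;
    $line =~ s/(<\/?)TEXT\b/${1}text/g;
    $line =~ s/(<\/?)P\b/${1}p/g;
    return $line;
}

# Write a cleaned copy of every .xml file in $in_dir to $out_dir,
# keeping the original file names.
sub process_dir {
    my ($in_dir, $out_dir, $clean_line) = @_;
    unless (-d $out_dir) {
        mkdir($out_dir) or die "Cannot create '$out_dir': $!";
    }
    for my $name (xml_files_in($in_dir)) {
        my $in_path  = File::Spec->catfile($in_dir,  $name);
        my $out_path = File::Spec->catfile($out_dir, $name);
        open(my $in,  '<', $in_path)  or die "Cannot read '$in_path': $!";
        open(my $out, '>', $out_path) or die "Cannot write '$out_path': $!";
        while (my $line = <$in>) {
            print {$out} $clean_line->($line);
        }
        close($in);
        close($out) or die "Cannot close '$out_path': $!";
    }
    return;
}

# Assumed usage: perl Clean.pl INPUT_DIR OUTPUT_DIR
process_dir($ARGV[0], $ARGV[1], \&lowercase_tags) if @ARGV == 2;
```

Lexical handles and three-argument open avoid the global DIR/INFILE names, and writing to a separate directory leaves the source files untouched.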
    Hope this helps.

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      @2teez:

      Your comment about File::Find may have some merit right now. But my suggestion makes his script FutureProof. Who's to say how many (sub)directories will be used tomorrow, or {...}?
      ++ to you anyway. :)

      --Chris

      ¡λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH

        taint,
        ..But my suggestion makes his script FutureProof..
        I agree, but who knows when? More so, can we take the step to the future one at a time? *wink* ;)

        If you tell me, I'll forget.
        If you show me, I'll remember.
        if you involve me, I'll understand.
        --- Author unknown to me
Re: How to run multiple xml in a directory on a single perl script
by taint (Chaplain) on Jun 03, 2014 at 10:53 UTC
    Greetings, Siegfried_13.

    If I were you, I think I'd look at using File::Find where recursive file manipulation is concerned, in your use case anyway. I'd also look at chomping each line before running your REs against it. As I read your example, I don't think you'll get what you're after unless you do. Note that chomp only strips the trailing newline from each line; it is the while (<INFILE>) loop that reads the files line by line.
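For the recursive case, a minimal File::Find sketch might look like this. The sub name find_xml_files and the use of the first command-line argument as the starting directory are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Return every .xml file at or below $top, sorted by path.
sub find_xml_files {
    my ($top) = @_;
    my @found;
    find(
        {
            # With no_chdir set, $_ holds the same full path as
            # $File::Find::name, so the -f test works from any directory.
            wanted => sub {
                push @found, $File::Find::name if -f $_ && /\.xml\z/i;
            },
            no_chdir => 1,
        },
        $top,
    );
    return sort @found;
}

# Assumed usage: perl find_xml.pl SOME_DIR
if (@ARGV) {
    print "$_\n" for find_xml_files($ARGV[0]);
}
```

File::Find ships with Perl, so this adds no extra dependency; whether recursion is needed today is the judgment call discussed elsewhere in the thread.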

    Best wishes.

    --Chris

    ¡λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH

Re: How to run multiple xml in a directory on a single perl script
by RichardK (Parson) on Jun 03, 2014 at 12:43 UTC

    To get a list of all the xml files, I think it's much easier to use glob

    my @docs = glob('dir/*.xml');

    BTW, do check the docs to fully understand how glob works (paths with spaces may not work as you expect!).
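To illustrate the caveat about spaces: the built-in glob splits its pattern on whitespace, so a directory such as "my dir" is read as two separate patterns. One way around this, sketched here with a hypothetical helper name xml_in_dir, is bsd_glob from the core File::Glob module, which treats the whole string as a single pattern.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Return the .xml files in $dir, treating $dir as one path even when it
# contains spaces. Glob metacharacters in $dir (such as * or [) are
# still interpreted as wildcards.
sub xml_in_dir {
    my ($dir) = @_;
    return bsd_glob("$dir/*.xml");
}

# Assumed usage: perl list_xml.pl "my dir"
if (@ARGV) {
    print "$_\n" for xml_in_dir($ARGV[0]);
}
```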

Re: How to run multiple xml in a directory on a single perl script
by sundialsvc4 (Abbot) on Jun 03, 2014 at 14:14 UTC

    Definitely use an XML processing library in Perl to do all of the work involved in parsing the XML files.   I most often use XML::LibXML because it wraps libxml2, a very commonly used XML processing library, such that your program might well be reading the file using the same software that the client used to produce it.
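As a rough sketch of what the parser-based version could look like, the following uses XML::LibXML (a CPAN module that must be installed separately) to drop br elements and empty p elements found inside p/text. That is only one reading of what the regexes in the question are trying to do; the sub name clean_xml and the XPath expression are assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse an XML string, remove <br/> and childless <p .../> elements that
# sit inside <p><text>...</text></p>, and return the serialized result.
sub clean_xml {
    my ($xml_string) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml_string);
    # "text" here is an element name, not the XPath text() node test.
    my @unwanted = $doc->findnodes('//p/text//br | //p/text//p[not(node())]');
    $_->unbindNode for @unwanted;
    return $doc->toString;
}

# Assumed usage: perl clean_one.pl FILE.xml > cleaned.xml
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "Cannot read '$ARGV[0]': $!";
    my $xml = do { local $/; <$fh> };
    close($fh);
    print clean_xml($xml);
}
```

Because the parser sees the whole document, elements that span several lines are handled the same way as single-line ones, which the line-by-line regex loop cannot do.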

    Don’t overlook the possibility of running your Perl program from the command line with a parameter such as somedir/*.xml, or something along those lines.   On a Unix shell the wildcard is expanded before your program starts, so a single invocation receives all the matching file names in @ARGV; Windows cmd passes the pattern through unexpanded, and the script has to glob it itself.   Another way to do it, but sometimes a very useful one.
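A small sketch of the command-line approach, assuming the file names (or a wildcard pattern) arrive in @ARGV. The helper name expand_args is hypothetical; it expands any argument that still contains a wildcard, which covers the cmd case where the shell has not already done so.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Expand arguments that still contain * or ?; pass the rest through.
sub expand_args {
    my @args = @_;
    return map { /[*?]/ ? bsd_glob($_) : $_ } @args;
}

# Assumed usage: perl Clean.pl somedir/*.xml
for my $file (expand_args(@ARGV)) {
    print "Would process: $file\n";    # placeholder for the real per-file work
}
```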

Node Type: perlquestion [id://1088377]
Approved by Corion