Using diff in perl

by tsk1979 (Scribe)
on Apr 05, 2006 at 05:52 UTC ( [id://541284]=perlquestion )

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

I am developing a program that diffs multiple sets of files. It is not a simple diff but a conditional one: I will be using rules like "ignore lines in which the third word is WARNING". I have the option of using shell diff with --ignore-matching-lines="regexp", but I want to do it in Perl because that makes post-processing the data easier. Searching CPAN brought up two modules, Algorithm::Diff and Text::Diff. Which one is more suitable for my application? I will be working with files. Could you point me to examples where these have been used? I also have the option of reading the files into arrays and using Array::Diff, but some of these files can be large (10MB or so!).

Replies are listed 'Best First'.
Re: Using diff in perl
by zer (Deacon) on Apr 05, 2006 at 06:55 UTC
    I've looked at your post a few times with a bit of confusion. I am not entirely sure what you are trying to do; maybe an example or two would help?
    As for the two modules, Text::Diff is built on Algorithm::Diff, so that might be the best route to go. It is slower than GNU diff on large files, though, so if your system has GNU diff it might be best to shell out to it.
      My script will work on the results of running a testcase. The testcase run creates a few data files. We compare the data files with the data.gold files that have already been created. So all this script will do is diff each data file against its data.gold file. It will ignore lines like "Testcase ran on server x" or "WARNING, logs will be created in directory y". This way we diff only the relevant data while ignoring a preset list of lines. Earlier we had one data file to compare with one data.gold file, so all I did was diff data.gold data --ignore-matching-lines="regexp". Now I will compare multiple files and make a composite DIFF file containing the differences.
        I'm wondering if you shouldn't try using a plain diff tool, and postprocess the result with Perl. I personally find the output of diff -u very easy to parse.
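For the "plain diff, then post-process in Perl" route, here is a minimal sketch of parsing diff -u output. The sub name parse_unified_diff and the hash keys are made up for illustration, and the filter is applied after diffing, so it only hides matching lines from the report rather than changing what diff compares.

```perl
use strict;
use warnings;

# Hypothetical helper: parse `diff -u` text into hunks, dropping changed
# lines that match $ignore. Hunks with nothing left are omitted.
sub parse_unified_diff {
    my ($text, $ignore) = @_;
    my (@hunks, $hunk, $old_left, $new_left);
    for my $line (split /\n/, $text) {
        if (!$hunk) {
            # Outside a hunk: skip file headers until the next @@ line.
            next unless $line =~ /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;
            $hunk = { old_start => $1, new_start => $3, removed => [], added => [] };
            # A missing length in the header means 1.
            ($old_left, $new_left) = (defined $2 ? $2 : 1, defined $4 ? $4 : 1);
            push @hunks, $hunk;
            next;
        }
        next if $line =~ /^\\/;    # "\ No newline at end of file"
        my ($sign, $content) = $line =~ /^([-+])(.*)$/ ? ($1, $2) : (' ', '');
        $old_left-- if $sign ne '+';    # '-' and context lines consume old lines
        $new_left-- if $sign ne '-';    # '+' and context lines consume new lines
        if ($sign ne ' ' && !(defined $ignore && $content =~ $ignore)) {
            push @{ $sign eq '-' ? $hunk->{removed} : $hunk->{added} }, $content;
        }
        # Both counters at zero means the hunk body is complete.
        $hunk = undef if $old_left <= 0 && $new_left <= 0;
    }
    return grep { @{ $_->{removed} } || @{ $_->{added} } } @hunks;
}
```

Counting lines from the @@ header, instead of guessing from the leading character, keeps the "--- file" header of a following file from being mistaken for a removed line when several diffs are concatenated.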
Re: Using diff in perl
by philcrow (Priest) on Apr 05, 2006 at 13:54 UTC
    If you go back to a pure Perl approach, you could look in Test::Files for examples of using Algorithm::Diff. <disclosure>I wrote Test::Files</disclosure>

    You probably don't want to use Test::Files, since it doesn't provide control over the output format. Although, it does have filters which let you modify the input before it is compared (like your example of ignoring lines with WARNING in the third position).

    Phil

Re: Using diff in perl (Alg::Diff)
by tye (Sage) on Apr 06, 2006 at 06:26 UTC

    Text::Diff just uses Algorithm::Diff under the covers but returns the differences as text (usually written to STDOUT, I'd guess). Array::Diff is a subroutine of only about a dozen lines using Algorithm::Diff. So just use Algorithm::Diff.

    Algorithm::Diff only works on arrays. It actually inserts each line of one of the files into a hash. So, if you've got a huge file, it needs a pretty big hash. The nature of the algorithm nearly requires things be done this way. But 10MB isn't so huge so I doubt you'll have a problem with running out of memory. Though, the number of lines per file might be large enough that the Perl implementation of the 'diff' algorithm (longest common subsequence) might be too slow for your purposes (I have it on my to-do list to make integration with the XS implementation of this algorithm, Algorithm::LCS, supported by Algorithm::Diff so that it uses it if you have it installed).
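Since Algorithm::Diff works on arrays, the conditional part can be done while reading each file. The following is only a sketch under that assumption: filtered_lines and conditional_diff are hypothetical names, and the only module call used is Algorithm::Diff's exported diff function, which returns hunks made of [sign, index, text] entries.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(diff);

# Read a file into an array of lines, skipping lines whose third
# whitespace-separated word is "WARNING" (the rule from the question).
sub filtered_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        my @words = split ' ', $line;
        next if @words >= 3 && $words[2] eq 'WARNING';
        push @lines, $line;
    }
    close $fh;
    return \@lines;
}

# Compare two files and return one "- line" / "+ line" string per change.
sub conditional_diff {
    my ($gold, $new) = @_;
    my @hunks = diff(filtered_lines($gold), filtered_lines($new));
    my @out;
    for my $hunk (@hunks) {
        # Each change is [ sign, index, text ].
        push @out, "$_->[0] $_->[2]" for @$hunk;
    }
    return @out;
}
```

Note that the index in each change refers to the filtered array, not to the line number in the original file, so a report that needs real line numbers would have to carry them along separately.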

    Your node also prompted me to look again at ways to reduce the amount of memory required by Algorithm::Diff. I think I see some fairly simple ways to dramatically reduce memory requirements, so I'm breaking open the module to start working on a new release that supports doing things much faster with less memory. Wish me luck. (:

    - tye        

      All the best :) Anyway, I have decided to use GNU diff for now and to use Perl to post-process the text. I guess unless there is a way of natively doing in Perl what GNU diff does, it's best to use that for advanced things.
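A rough sketch of that approach for several file pairs: run GNU diff once per pair with --ignore-matching-lines and concatenate the results into one composite report. The sub names and the "===" separator line are illustrative assumptions, and the code assumes a diff binary on the PATH that accepts the GNU long option.

```perl
use strict;
use warnings;

# Build the argument list for one GNU diff run. --ignore-matching-lines is
# the GNU diff option mentioned in the thread; -u selects unified output.
sub diff_command {
    my ($gold, $new, @ignore) = @_;
    return ('diff', '-u', (map { "--ignore-matching-lines=$_" } @ignore), $gold, $new);
}

# Run diff for each [gold, new] pair and concatenate the non-empty outputs.
sub composite_diff {
    my ($pairs, @ignore) = @_;
    my $report = '';
    for my $pair (@$pairs) {
        my @cmd = diff_command(@$pair, @ignore);
        # List-form pipe open: no shell is involved, so no quoting issues.
        open my $pipe, '-|', @cmd or die "Cannot run diff: $!";
        my $out = do { local $/; <$pipe> };
        $out = '' unless defined $out;
        close $pipe;
        my $status = $? >> 8;    # diff exits 0 = same, 1 = differences, 2 = trouble
        die "diff failed for @$pair\n" if $status > 1;
        $report .= "=== $pair->[0] vs $pair->[1] ===\n$out" if length $out;
    }
    return $report;
}
```

A call might look like composite_diff([ ['a/data.gold', 'a/data'], ['b/data.gold', 'b/data'] ], '^Testcase ran on server'), with the returned string written to the composite DIFF file.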
