sathiya.sw has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I need to do the following, comparing two files with known variations.




I need to compare these two files, allowing for the following variations: in the first field, the time difference can be 1 second, 1 minute, or 1 day; the ID field difference can be 1 or 2; and the duration can differ by up to 5 seconds.


07/01 10:11:30$#$101$#$4 Sec$#$
07/01 10:12:30$#$101$#$8 Sec$#$

Here the allowed time difference is up to 2 minutes, and the allowed duration difference is up to 5 seconds, so these two lines should be marked as the same.
One more thing: each file may have 25,000 lines minimum, which I need to compare and then display a report.

So, what is the best way to do this?

Is there any tool or module to do this?

Update: changed "25000 lines each" to "25,000 lines each minimum". And one more question: what would be a good delimiter in Perl? I use $#$ in other languages, but here $ denotes a scalar, so what is a good delimiter?
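For what it's worth, '$#$' is only special in Perl source code, not in data: split takes a regex, so the delimiter can stay as it is, as long as it is escaped. A minimal sketch using the sample line above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# '$#$' only needs escaping because $ and # are regex/interpolation
# metacharacters; in the data itself it is a perfectly usable delimiter.
my $line   = '07/01 10:11:30$#$101$#$4 Sec$#$';
my $delim  = quotemeta '$#$';          # becomes \$\#\$
my @fields = split /$delim/, $line;    # trailing empty fields are dropped

print "$_\n" for @fields;              # time stamp, ID, duration
```

Note that split drops the trailing empty field produced by the final '$#$', so this yields exactly three fields.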

Replies are listed 'Best First'.
Re: comparing two files, with known variations
by zwon (Abbot) on Jan 07, 2009 at 10:44 UTC

    Take a look at File::Compare; its compare_text function lets you supply a callback with your own line-comparison logic.
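For example, a sketch only: the file names are placeholders, the tolerances are the ones from the question, and the time-field check is left out for brevity. Note that the callback must return true when two lines are considered DIFFERENT, mirroring the default callback sub { $_[0] ne $_[1] }:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Compare qw(compare_text);

# Return TRUE when two lines differ beyond the allowed tolerances.
# The time field is ignored here for brevity.
sub lines_differ {
    my ($x, $y) = @_;
    my @fx = split /\$\#\$/, $x;
    my @fy = split /\$\#\$/, $y;
    return 1 unless @fx == 3 && @fy == 3;
    return 1 if abs($fx[1] - $fy[1]) > 2;   # ID may differ by 1 or 2
    my ($dx) = $fx[2] =~ /(\d+)/;           # "4 Sec" -> 4
    my ($dy) = $fy[2] =~ /(\d+)/;
    return 1 if abs($dx - $dy) > 5;         # duration within 5 seconds
    return 0;                               # close enough: same line
}

# Demo with the two sample lines from the question:
for my $spec (['a.txt', "07/01 10:11:30\$#\$101\$#\$4 Sec\$#\$\n"],
              ['b.txt', "07/01 10:12:30\$#\$101\$#\$8 Sec\$#\$\n"]) {
    open my $fh, '>', $spec->[0] or die $!;
    print {$fh} $spec->[1];
    close $fh;
}

# compare_text returns 0 when every line pair matched
print compare_text('a.txt', 'b.txt', \&lines_differ) == 0
    ? "files match\n" : "files differ\n";
```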

Re: comparing two files, with known variations
by prasadbabu (Prior) on Jan 07, 2009 at 06:13 UTC

    Hi Sathiyamoorthy,

    I don't know whether there is any tool for that. Since there are only 2,500 lines, we can do the job in a normal way.

    1. Read one line at a time using 'while', or read the files into two arrays and work line by line. (open, while, @file1, @file2, '<')
    2. 'split' each line on '$#$' in both arrays/lines. (split, @array1, @array2)
    3. In a subroutine, compare the first, second and third fields of array1 with the first, second and third fields of array2 respectively, for the difference. (For the time comparison you can use the Date::Manip or Date::Calc module.) (&subroutine)
    4. Write the result to a file to create a report. (open, '>')
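The steps above can be sketched as follows. This is a rough sketch, not a drop-in solution: the file names and report format are made up, the core Time::Piece module stands in for Date::Manip to keep it self-contained, and it assumes the two files are already line-aligned:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Parse "07/01 10:11:30" into epoch seconds (the year is irrelevant here,
# since both stamps get the same strptime default).
sub to_epoch {
    my ($stamp) = @_;
    return Time::Piece->strptime($stamp, '%m/%d %H:%M:%S')->epoch;
}

# Step 3: compare corresponding fields within the allowed tolerances.
sub records_match {
    my ($x, $y) = @_;
    my @fx = split /\$\#\$/, $x;
    my @fy = split /\$\#\$/, $y;
    return 0 unless @fx == 3 && @fy == 3;
    return 0 if abs(to_epoch($fx[0]) - to_epoch($fy[0])) > 120;  # 2 min
    return 0 if abs($fx[1] - $fy[1]) > 2;                        # ID +/- 2
    my ($dx) = $fx[2] =~ /(\d+)/;                                # "4 Sec" -> 4
    my ($dy) = $fy[2] =~ /(\d+)/;
    return 0 if abs($dx - $dy) > 5;                              # 5 sec
    return 1;
}

# Create two tiny sample files so the sketch runs as-is.
for my $spec (['file1.txt', "07/01 10:11:30\$#\$101\$#\$4 Sec\$#\$\n"],
              ['file2.txt', "07/01 10:12:30\$#\$101\$#\$8 Sec\$#\$\n"]) {
    open my $fh, '>', $spec->[0] or die $!;
    print {$fh} $spec->[1];
    close $fh;
}

# Steps 1, 2 and 4: read both files line by line and write a report.
open my $in1, '<', 'file1.txt' or die $!;
open my $in2, '<', 'file2.txt' or die $!;
open my $out, '>', 'report.txt' or die $!;
while (defined(my $l1 = <$in1>) and defined(my $l2 = <$in2>)) {
    chomp($l1, $l2);
    print {$out} records_match($l1, $l2) ? "SAME: $l1\n"
                                         : "DIFF: $l1 | $l2\n";
}
```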


      Fine, thanks for your reply. But I am interested in knowing about tools/modules to do this job.
      And also, it is not 2,500; it is 25,000 minimum.
      Thanks for your reply; from it I got some modules for comparison, and other ideas.
        I feel that your worry is about getting 25,000 lines of data into a hash, i.e. into Perl's memory.
        If my guess is correct, you can use DBM::Deep. But since the data will be stored in a file, it will not be as quick as a normal hash.
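For illustration (DBM::Deep is a CPAN module, not core, and the file name and record layout here are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBM::Deep;   # CPAN module, not core

# A DBM::Deep object behaves like an ordinary hash but lives on disk,
# so 25,000+ parsed records never have to fit in memory at once.
my $db = DBM::Deep->new('records.db');
$db->{'07/01 10:11:30'} = { id => 101, duration => 4 };
print $db->{'07/01 10:11:30'}{id}, "\n";
```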
        Use the *nix sort and diff utilities.

        As far as the field separator is concerned, it looks like your file is in a fixed format; you could just use a space...

        Revised: Sorry, I realize that doesn't handle your definition of time identity.