
comparing two files, with known variations

by sathiya.sw (Monk)
on Jan 07, 2009 at 04:48 UTC

sathiya.sw has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I need to compare two files that contain known variations.




I need to compare these two files while tolerating the following variations: in the first field, the time difference can be 1 second, 1 minute, or 1 day; in the ID field, the difference can be 1 or 2; and the duration can differ by 5 seconds.


07/01 10:11:30$#$101$#$4 Sec$#$
07/01 10:12:30$#$101$#$8 Sec$#$

Here the allowed time difference is up to 2 minutes and the allowed duration difference is up to 5 seconds, so these two lines should be marked as the same.
Also, each file may have a minimum of 25,000 lines, which I need to compare and then display a report.
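A minimal Perl sketch of the matching rule in the example above. It assumes the first field is DD/MM (the sample `07/01` was posted on Jan 07) and, because the data carries no year, pins every timestamp to 2009 so that the core Time::Local module can turn it into seconds and day rollovers still compare correctly. The tolerances (120 seconds, ID within 2, duration within 5 seconds) come from the question; `parse_line` and `lines_match` are hypothetical names.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed tolerances, taken from the question; adjust as needed.
my $MAX_TIME_DIFF = 120;    # seconds (2 minutes)
my $MAX_ID_DIFF   = 2;
my $MAX_DUR_DIFF  = 5;      # seconds

# Turn 'DD/MM HH:MM:SS$#$ID$#$N Sec$#$' into (epoch seconds, id, duration).
# Assumption: DD/MM order and a fixed year of 2009 (the data has no year).
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($stamp, $id, $dur) = split /\$#\$/, $line;
    my ($d, $m, $H, $M, $S) = $stamp =~ m{^(\d\d)/(\d\d) (\d\d):(\d\d):(\d\d)$}
        or die "bad timestamp in: $line\n";
    my ($secs) = $dur =~ /^(\d+)\s*Sec$/
        or die "bad duration in: $line\n";
    return (timegm($S, $M, $H, $d, $m - 1, 2009), $id, $secs);
}

# True when every field is within its tolerance.
sub lines_match {
    my ($t1, $id1, $d1) = parse_line($_[0]);
    my ($t2, $id2, $d2) = parse_line($_[1]);
    return abs($t1 - $t2) <= $MAX_TIME_DIFF
        && abs($id1 - $id2) <= $MAX_ID_DIFF
        && abs($d1 - $d2) <= $MAX_DUR_DIFF;
}

my $line1 = '07/01 10:11:30$#$101$#$4 Sec$#$';
my $line2 = '07/01 10:12:30$#$101$#$8 Sec$#$';
print lines_match($line1, $line2) ? "same\n" : "different\n";   # prints "same"
```

This only encodes the 2-minute window from the example; if "1 day" in the question is meant as a separate allowed offset, that would need an extra rule.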

So, what is the best way to do this?

Is there any tool or module to do this?

Update: changed "25000 lines each" to "25,000 lines each minimum". One more question: what would be a good delimiter in Perl? I use $#$ in other languages, but here $ denotes a scalar, so what is a good delimiter here?
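For what it's worth, $#$ can still be used in Perl; the only catch is quoting. A small sketch (variable names are illustrative): single quotes keep Perl from interpolating the `$`, and `\Q...\E` (quotemeta) makes `split` treat the delimiter as literal text rather than regex metacharacters.

```perl
use strict;
use warnings;

# Single quotes: no interpolation, so this is the literal string '$#$'.
my $delim = '$#$';

# \Q...\E escapes the '$' characters, so split matches '$#$' literally.
# The trailing empty field after the final '$#$' is dropped by split.
my $line   = '07/01 10:11:30$#$101$#$4 Sec$#$';
my @fields = split /\Q$delim\E/, $line;

print join('|', @fields), "\n";   # prints "07/01 10:11:30|101|4 Sec"
```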

Re: comparing two files, with known variations
by zwon (Abbot) on Jan 07, 2009 at 10:44 UTC

    Take a look at File::Compare; its compare_text function lets you pass a callback that does the line comparison.
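A hedged sketch of that approach. File::Compare::compare_text calls its third argument with one line from each file (newline included) and treats a true return value as "these lines differ". The tolerance logic (time within 120 s, ID within 2, duration within 5 s), the DD/MM-with-year-2009 timestamp assumption, and the helper names `parse_line`, `lines_differ` and `temp_file_with` are illustrative assumptions; the two sample lines come from the question.

```perl
use strict;
use warnings;
use File::Compare;
use File::Temp qw(tempfile);
use Time::Local qw(timegm);

# Assumes 'DD/MM HH:MM:SS$#$ID$#$N Sec$#$' lines and a fixed year (2009).
sub parse_line {
    my ($line) = @_;
    chomp $line;    # compare_text passes lines with their newline
    my ($stamp, $id, $dur) = split /\$#\$/, $line;
    my ($d, $m, $H, $M, $S) = $stamp =~ m{^(\d\d)/(\d\d) (\d\d):(\d\d):(\d\d)$}
        or die "bad timestamp in: $line\n";
    my ($secs) = $dur =~ /^(\d+)\s*Sec$/ or die "bad duration in: $line\n";
    return (timegm($S, $M, $H, $d, $m - 1, 2009), $id, $secs);
}

# compare_text treats a TRUE return as "these two lines differ",
# so negate the tolerance test.
sub lines_differ {
    my ($t1, $id1, $d1) = parse_line($_[0]);
    my ($t2, $id2, $d2) = parse_line($_[1]);
    return !(abs($t1 - $t2) <= 120 && abs($id1 - $id2) <= 2 && abs($d1 - $d2) <= 5);
}

# Demo helper: write lines to a temporary file and return its name.
sub temp_file_with {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @_;
    close $fh or die "close $name: $!";
    return $name;
}

my $f1 = temp_file_with('07/01 10:11:30$#$101$#$4 Sec$#$');
my $f2 = temp_file_with('07/01 10:12:30$#$101$#$8 Sec$#$');

# 0 = equal, 1 = different, -1 = error
my $result = File::Compare::compare_text($f1, $f2, \&lines_differ);
print $result == 0 ? "files match\n" : "files differ\n";   # prints "files match"
```

compare_text stops at the first differing pair and only reports equal/different, so it answers "do these files match line for line?" but will not list the mismatches for a report. It also assumes matching lines sit at the same line number in both files.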

Re: comparing two files, with known variations
by prasadbabu (Prior) on Jan 07, 2009 at 06:13 UTC

    Hi Sathiyamoorthy,

    I don't know whether there are any tools for that. Since there are only 2500 lines, we can do the job in the ordinary way:

    1. Read the files one line at a time using 'while', or read them and store them in 2 arrays, line by line. (open, while, @file1, @file2, '<')
    2. 'split' each line on '$#$' in both arrays/lines. (split, @array1, @array2)
    3. In a subroutine, compare the first, second and third fields of @array1 with the first, second and third fields of @array2, respectively, checking the difference. (For the time comparison you can use the Date::Manip or Date::Calc module.) (&subroutine)
    4. Write the result to a file to create a report. (open, '>')
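The four steps could look roughly like the sketch below. It is an illustration, not prasadbabu's code: it swaps the suggested Date::Manip / Date::Calc for the core Time::Local module, assumes DD/MM timestamps in a fixed year (2009), assumes matching lines sit at the same position in both files, and uses hypothetical names (`parse_line`, `lines_match`, `compare_files`, `write_lines`).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::Local qw(timegm);

# Step 2: split on the literal '$#$' and convert the fields to numbers.
# Assumes 'DD/MM HH:MM:SS' timestamps and a fixed year (2009).
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($stamp, $id, $dur) = split /\$#\$/, $line;
    my ($d, $m, $H, $M, $S) = $stamp =~ m{^(\d\d)/(\d\d) (\d\d):(\d\d):(\d\d)$}
        or die "bad timestamp in: $line\n";
    my ($secs) = $dur =~ /^(\d+)\s*Sec$/ or die "bad duration in: $line\n";
    return (timegm($S, $M, $H, $d, $m - 1, 2009), $id, $secs);
}

# Step 3: compare the three fields against the assumed tolerances.
sub lines_match {
    my ($t1, $id1, $d1) = parse_line($_[0]);
    my ($t2, $id2, $d2) = parse_line($_[1]);
    return abs($t1 - $t2) <= 120 && abs($id1 - $id2) <= 2 && abs($d1 - $d2) <= 5;
}

sub compare_files {
    my ($file1, $file2, $report) = @_;

    # Step 1: read both files into arrays (25,000 lines fit easily in memory).
    open my $in1, '<', $file1 or die "open $file1: $!";
    open my $in2, '<', $file2 or die "open $file2: $!";
    chomp(my @file1 = <$in1>);
    chomp(my @file2 = <$in2>);
    close $in1;
    close $in2;

    # Step 4: write one report line per mismatch and return the count.
    open my $out, '>', $report or die "open $report: $!";
    my $bad = 0;
    my $max = @file1 > @file2 ? scalar @file1 : scalar @file2;
    for my $i (0 .. $max - 1) {
        my ($x, $y) = ($file1[$i], $file2[$i]);
        next if defined $x && defined $y && lines_match($x, $y);
        $bad++;
        printf {$out} "line %d: <%s> vs <%s>\n",
            $i + 1, $x // '(missing)', $y // '(missing)';
    }
    close $out or die "close $report: $!";
    return $bad;
}

# Demo helper: write lines to a temporary file and return its name.
sub write_lines {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @_;
    close $fh or die "close $name: $!";
    return $name;
}

my $f1 = write_lines('07/01 10:11:30$#$101$#$4 Sec$#$',
                     '07/01 11:00:00$#$102$#$10 Sec$#$');
my $f2 = write_lines('07/01 10:12:30$#$101$#$8 Sec$#$',
                     '07/01 11:00:00$#$107$#$10 Sec$#$');
my (undef, $report) = tempfile(UNLINK => 1);
my $mismatches = compare_files($f1, $f2, $report);
print "$mismatches mismatch(es), see $report\n";   # line 2: IDs differ by 5
```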


      Your reply is fine, but I am interested in knowing about tools / modules for this job.
      Also, it is not 2,500 lines; it is 25,000 minimum.
      Thanks for your reply; from it I got some modules for comparison, and other ideas.
        I feel that your worry is about loading 25000 lines of data into a hash, i.e. into Perl's memory.
        If my guess is correct, you can use DBM::Deep. But since the data will be stored in a file, it will not be as fast as a normal hash.
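A plain hash also helps when matching lines are not at the same line number in both files: index the second file by ID, then look up each line of the first file under every ID within 2. A sketch under stated assumptions: lines look like 'DD/MM HH:MM:SS$#$ID$#$N Sec$#$', the year is fixed at 2009, IDs are integers without leading zeros, and `build_index` and `has_partner` are hypothetical names. The %by_id hash here is the part a DBM::Deep-backed store would replace if memory became a concern, at the cost of speed.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumes 'DD/MM HH:MM:SS$#$ID$#$N Sec$#$' lines and a fixed year (2009).
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($stamp, $id, $dur) = split /\$#\$/, $line;
    my ($d, $m, $H, $M, $S) = $stamp =~ m{^(\d\d)/(\d\d) (\d\d):(\d\d):(\d\d)$}
        or die "bad timestamp in: $line\n";
    my ($secs) = $dur =~ /^(\d+)\s*Sec$/ or die "bad duration in: $line\n";
    return (timegm($S, $M, $H, $d, $m - 1, 2009), $id, $secs);
}

# Map each ID in file 2 to a list of [epoch, duration] pairs (IDs may repeat).
sub build_index {
    my %by_id;
    for my $line (@_) {
        my ($t, $id, $d) = parse_line($line);
        push @{ $by_id{$id} }, [ $t, $d ];
    }
    return \%by_id;
}

# True if some indexed line has an ID within +/-2, a time within 120 s
# and a duration within 5 s of $line.
sub has_partner {
    my ($index, $line) = @_;
    my ($t, $id, $d) = parse_line($line);
    for my $cand ($id - 2 .. $id + 2) {
        for my $entry (@{ $index->{$cand} || [] }) {
            return 1 if abs($entry->[0] - $t) <= 120
                     && abs($entry->[1] - $d) <= 5;
        }
    }
    return 0;
}

my $index = build_index('07/01 10:12:30$#$102$#$8 Sec$#$',
                        '07/01 12:00:00$#$500$#$1 Sec$#$');
print has_partner($index, '07/01 10:11:30$#$101$#$4 Sec$#$')
    ? "found\n" : "not found\n";    # prints "found"
```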
        Use the *nix sort and diff utilities.

        As far as the field separator is concerned, it looks like your file is fixed-format; you could just use a space...

        Revised: Sorry, I realize that doesn't handle your definition of time identity.

Node Type: perlquestion [id://734553]
Approved by planetscape