sathiya.sw has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I need to do the following, comparing two files with known variations.




I need to compare these two files, allowing for the following variations: in the first field, the time difference can be 1 second, 1 minute, or 1 day; the ID field difference can be 1 or 2; and the duration can differ by up to 5 seconds.


07/01 10:11:30$#$101$#$4 Sec$#$
07/01 10:12:30$#$101$#$8 Sec$#$

Here the allowed time difference is up to 2 minutes, and the allowed duration difference is up to 5 seconds, so these two lines should be marked as the same.
One more thing: each file may have 25,000 lines minimum, which I need to compare and then display a report.

So, what is the best way to do this?

Is there any tool or module to do this?

Update: changed "25000 lines each" to "25,000 lines each minimum". And one more question: what would be a good delimiter in Perl? I use $#$ in other languages, but here $ denotes a scalar, so what is a good delimiter?
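For what it's worth, '$#$' is only special in Perl source code, not in data: split takes a regex, so the delimiter can stay as it is, as long as it is escaped. A minimal sketch using the sample line above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# '$#$' only needs escaping because $ and # are regex/interpolation
# metacharacters; in the data itself it is a perfectly usable delimiter.
my $line   = '07/01 10:11:30$#$101$#$4 Sec$#$';
my $delim  = quotemeta '$#$';          # becomes \$\#\$
my @fields = split /$delim/, $line;    # trailing empty fields are dropped

print "$_\n" for @fields;              # time stamp, ID, duration
```

Note that split drops the trailing empty field produced by the final '$#$', so this yields exactly three fields.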

Replies are listed 'Best First'.
Re: comparing two files, with known variations
by zwon (Abbot) on Jan 07, 2009 at 10:44 UTC

    Take a look at File::Compare; its compare_text function lets you supply a callback with your own line-comparison logic.
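For example, a sketch only: the file names are placeholders, the tolerances are the ones from the question, and the time-field check is left out for brevity. Note that the callback must return true when two lines are considered DIFFERENT, mirroring the default callback sub { $_[0] ne $_[1] }:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Compare qw(compare_text);

# Return TRUE when two lines differ beyond the allowed tolerances.
# The time field is ignored here for brevity.
sub lines_differ {
    my ($x, $y) = @_;
    my @fx = split /\$\#\$/, $x;
    my @fy = split /\$\#\$/, $y;
    return 1 unless @fx == 3 && @fy == 3;
    return 1 if abs($fx[1] - $fy[1]) > 2;   # ID may differ by 1 or 2
    my ($dx) = $fx[2] =~ /(\d+)/;           # "4 Sec" -> 4
    my ($dy) = $fy[2] =~ /(\d+)/;
    return 1 if abs($dx - $dy) > 5;         # duration within 5 seconds
    return 0;                               # close enough: same line
}

# Demo with the two sample lines from the question:
for my $spec (['a.txt', "07/01 10:11:30\$#\$101\$#\$4 Sec\$#\$\n"],
              ['b.txt', "07/01 10:12:30\$#\$101\$#\$8 Sec\$#\$\n"]) {
    open my $fh, '>', $spec->[0] or die $!;
    print {$fh} $spec->[1];
    close $fh;
}

# compare_text returns 0 when every line pair matched
print compare_text('a.txt', 'b.txt', \&lines_differ) == 0
    ? "files match\n" : "files differ\n";
```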

Re: comparing two files, with known variations
by prasadbabu (Prior) on Jan 07, 2009 at 06:13 UTC

    Hi Sathiyamoorthy,

    I don't know whether there is any tool for that. Since there are only 2,500 lines, we can do the job in a normal way.

    1. Read one line at a time using 'while', or read the files into two arrays and work line by line. (open, while, @file1, @file2, '<')
    2. 'split' each line on '$#$' in both arrays/lines. (split, @array1, @array2)
    3. In a subroutine, compare the first, second and third fields of array1 with the first, second and third fields of array2 respectively, for the difference. (For the time comparison you can use the Date::Manip or Date::Calc module.) (&subroutine)
    4. Write the result to a file to create a report. (open, '>')
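The steps above can be sketched as follows. This is a rough sketch, not a drop-in solution: the file names and report format are made up, the core Time::Piece module stands in for Date::Manip to keep it self-contained, and it assumes the two files are already line-aligned:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Parse "07/01 10:11:30" into epoch seconds (the year is irrelevant here,
# since both stamps get the same strptime default).
sub to_epoch {
    my ($stamp) = @_;
    return Time::Piece->strptime($stamp, '%m/%d %H:%M:%S')->epoch;
}

# Step 3: compare corresponding fields within the allowed tolerances.
sub records_match {
    my ($x, $y) = @_;
    my @fx = split /\$\#\$/, $x;
    my @fy = split /\$\#\$/, $y;
    return 0 unless @fx == 3 && @fy == 3;
    return 0 if abs(to_epoch($fx[0]) - to_epoch($fy[0])) > 120;  # 2 min
    return 0 if abs($fx[1] - $fy[1]) > 2;                        # ID +/- 2
    my ($dx) = $fx[2] =~ /(\d+)/;                                # "4 Sec" -> 4
    my ($dy) = $fy[2] =~ /(\d+)/;
    return 0 if abs($dx - $dy) > 5;                              # 5 sec
    return 1;
}

# Create two tiny sample files so the sketch runs as-is.
for my $spec (['file1.txt', "07/01 10:11:30\$#\$101\$#\$4 Sec\$#\$\n"],
              ['file2.txt', "07/01 10:12:30\$#\$101\$#\$8 Sec\$#\$\n"]) {
    open my $fh, '>', $spec->[0] or die $!;
    print {$fh} $spec->[1];
    close $fh;
}

# Steps 1, 2 and 4: read both files line by line and write a report.
open my $in1, '<', 'file1.txt' or die $!;
open my $in2, '<', 'file2.txt' or die $!;
open my $out, '>', 'report.txt' or die $!;
while (defined(my $l1 = <$in1>) and defined(my $l2 = <$in2>)) {
    chomp($l1, $l2);
    print {$out} records_match($l1, $l2) ? "SAME: $l1\n"
                                         : "DIFF: $l1 | $l2\n";
}
```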


      Fine, thanks for your reply. But I am interested in knowing about tools/modules to do this job.
      And also, it is not 2,500; it is 25,000 minimum.
      Thanks for your reply; from it I got some modules for comparison, and other ideas.
        I feel that your worry is about getting 25,000 lines of data into a hash, i.e. into Perl's memory.
        If my guess is correct, you can use DBM::Deep. But since the data will be stored in a file, it will not be as quick as a normal hash.
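For illustration (DBM::Deep is a CPAN module, not core, and the file name and record layout here are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBM::Deep;   # CPAN module, not core

# A DBM::Deep object behaves like an ordinary hash but lives on disk,
# so 25,000+ parsed records never have to fit in memory at once.
my $db = DBM::Deep->new('records.db');
$db->{'07/01 10:11:30'} = { id => 101, duration => 4 };
print $db->{'07/01 10:11:30'}{id}, "\n";
```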
        Use the *nix sort and diff utilities.

        As far as the field separator is concerned, it looks like your file is in a fixed format; you could just use a space...

        Revised: Sorry, I realize that doesn't handle your definition of time identity.