Comparing unordered but similar data files

by gwam (Initiate)
on Sep 22, 2010 at 23:40 UTC ( [id://861408]=perlquestion )

gwam has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks. I'm new to Perl and having a spot of bother. I'm trying to read in two data files which contain identical data but the data is out of order. I need to make sure the data from both files is in fact identical despite the order. The code I have so far is as follows:
#!/usr/bin/perl
use strict;

open (FILE, "exp.log") || die;
my @array = <FILE>;
close(FILE);
print "\n\n\nExpected log read and stored";

open (FILE2, "actual.log") || die;
my @array2 = <FILE2>;
close(FILE2);
print "\n\n\nActual log read and stored\n\n\n";

my @sorted = sort(@array);
my @sorted2 = sort(@array2);

if(@sorted eq @sorted2) {
    print "Success! Actual log and expected log contain the same data\n\n";
}
else {
    print "Failure! Actual log and expected log contain different data\n\n";
}

#print "@sorted\n";
print "\n\n\n\n\n\n\n\n\n-------------------------------------\n\n\n\n\n\n";
#print "@sorted2\n";
exit;
A sample from one of the data files is as follows:

.MAMACACHE_REGRESSION.AGT.M Type: INITIAL Status OK
MdMsgType | 1 | U8 | 1
MdMsgStatus | 2 | U8 | 0
MdSeqNum | 10 | U32 | 0
MamaAppMsgType | 18 | U8 | 0
MamaSenderId | 20 | U64 |6991275514488954478
.MAMACACHE_REGRESSION.BCO.M Type: INITIAL Status OK
MdMsgType | 1 | U8 | 1
MdMsgStatus | 2 | U8 | 0
MdSeqNum | 10 | U32 | 0
MamaAppMsgType | 18 | U8 | 0
MamaSenderId | 20 | U64 | 6991275514488954478


The idea I had was to read in the two files, sort them (thus making the ordering a non-issue), then compare the newly sorted arrays. However, the problem I'm having is that when I make subtle changes to one of the files (e.g. changing a number from 20 to 19), the script still believes the two files are equal, and only reports a "failure" if one of the text files has more lines introduced. Hope this makes sense. All the best!

Replies are listed 'Best First'.
Re: Comparing unordered but similar data files
by Kanji (Parson) on Sep 23, 2010 at 00:37 UTC

    Your use of eq is actually comparing the number of elements in each array, which is why you only see a failure if one file has more lines than the other.
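
    A minimal sketch of why that happens, using made-up sample lines rather than the real log data: eq puts both arrays in scalar context, so each one evaluates to its element count and the contents are never looked at.

```perl
use strict;
use warnings;

# Illustrative data only: same number of lines, but one value differs (20 vs 19).
my @expected = ("MdSeqNum | 20\n", "MdMsgType | 1\n");
my @actual   = ("MdSeqNum | 19\n", "MdMsgType | 1\n");

# eq imposes scalar context on both operands, so each array yields its
# element count and the test is effectively "2" eq "2".
my $eq_says_equal = (@expected eq @actual) ? 1 : 0;

print "element counts: ", scalar(@expected), " and ", scalar(@actual), "\n";
print "eq reports: ", ($eq_says_equal ? "equal" : "different"), "\n";
```

    Adding or removing a line in either array changes one of the counts, which is the only case where eq reports a difference.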

    If you're using Perl 5.10 or newer, you can achieve what you want using ~~ (the smart match operator) instead:-

    if (@sorted ~~ @sorted2) {

    If you're using Perl 5.8 or older, you'll need to compare the arrays element by element, an example of which you can find in perlfaq4 - How do I test whether two arrays or hashes are equal?.
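
    The element-by-element comparison could look roughly like the sketch below. The helper name arrays_equal and the sample lines are made up for illustration, and this is not the perlfaq4 code verbatim. Since ~~ was later marked experimental (as of Perl 5.18), an explicit loop like this is also a reasonable choice on newer Perls.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when both array references hold the same
# strings in the same order, otherwise 0.
sub arrays_equal {
    my ($first, $second) = @_;
    return 0 unless @$first == @$second;    # different lengths can never match
    for my $i (0 .. $#$first) {
        return 0 if $first->[$i] ne $second->[$i];
    }
    return 1;
}

# Illustrative data: @actual is @expected reordered, @changed has 10 -> 19.
my @expected = ("MdSeqNum | 10\n", "MdMsgType | 1\n");
my @actual   = ("MdMsgType | 1\n", "MdSeqNum | 10\n");
my @changed  = ("MdMsgType | 1\n", "MdSeqNum | 19\n");

my @sorted_expected = sort @expected;
my @sorted_actual   = sort @actual;
my @sorted_changed  = sort @changed;

my $reordered_ok   = arrays_equal(\@sorted_expected, \@sorted_actual);
my $edit_detected  = arrays_equal(\@sorted_expected, \@sorted_changed) ? 0 : 1;

print "reordered file matches: ", ($reordered_ok ? "yes" : "no"), "\n";
print "edited file detected:   ", ($edit_detected ? "yes" : "no"), "\n";
```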

        --k.


      Thanks a lot, Kanji! That did the trick. I figured it was something like that. I thought I was going to need a foreach loop to access and compare each element in the arrays. Awesome that Perl has that smart match operator. Thanks again, you saved me a lot of time.
Re: Comparing unordered but similar data files
by perlpie (Beadle) on Sep 23, 2010 at 00:40 UTC

    monks: gather your stones while I don my armor...

    Maybe Perl isn't the best tool here.

    $ cat abc
    a
    b
    c
    d
    e
    e
    e
    ff
    fff
    g
    $ cat abc2
    a
    fff
    e
    g
    e
    e
    b
    c
    d
    ff
    $ sort abc > sorted_abc
    $ sort abc2 > sorted_abc2
    $ diff -u sorted_abc sorted_abc2
    $ rm sorted*

    Then edit one of the files and repeat. You'll get a diff. In the above case no diff is shown because the files are the same when sorted.

      If shell, I'd bash it like this:
      comm -3 <(sort abc) <(sort abc2)
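
      For anyone who would rather stay in Perl, roughly the same information (which lines appear more often in one file than the other) can be sketched with a hash of line counts, with no sorting needed. The helper name line_count_diff and the third list @edited are made up for this example; the first two lists mirror the abc and abc2 files above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two array references of lines and returns a
# hash reference mapping each line to (count in first - count in second),
# keeping only the lines whose counts differ.
sub line_count_diff {
    my ($first, $second) = @_;
    my %count;
    $count{$_}++ for @$first;
    $count{$_}-- for @$second;

    my %diff;
    for my $line (keys %count) {
        $diff{$line} = $count{$line} if $count{$line} != 0;
    }
    return \%diff;
}

my @abc    = qw(a b c d e e e ff fff g);
my @abc2   = qw(a fff e g e e b c d ff);
my @edited = qw(a fff e g e e b c d fg);    # abc2 with "ff" changed to "fg"

my $same    = line_count_diff(\@abc, \@abc2);
my $changed = line_count_diff(\@abc, \@edited);

print "abc vs abc2: ", scalar(keys %$same), " differing lines\n";
for my $line (sort keys %$changed) {
    my $where = $changed->{$line} > 0 ? "extra in first" : "extra in second";
    print "$line: $where\n";
}
```

      An empty result means the two files hold the same lines the same number of times, whatever their order.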
