
Detect common lines between two files, one liner from shell

by merlyn (Sage)
on Oct 14, 2000 at 08:12 UTC ( [id://36725] : CUFP )

A challenge was made in comp.unix.shells for a short program to detect the lines common to both fileA and fileB, printing one copy of each common line. Of course, this was a natural one-liner in Perl, but I stumbled across a bizarre way which I thought you might appreciate.
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' fileA fileB >output

Replies are listed 'Best First'.
Re: (thoughts on) Detect common lines between two files, one liner from shell
by Albannach (Monsignor) on Dec 14, 2000 at 02:36 UTC
    While I can see how Stephen is looking at this as an obfuscation-in-progress, and others are looking at it as just plain obfuscated, this type of thing is a great example of why I love playing around with Perl. Don't get me wrong, I certainly thought merlyn had left out a few lines when I first looked at it, but after a couple of minutes it really started to look beautiful (I'm sick, I know...). In this fine example, merlyn:
    - didn't redefine any defaults
    - didn't use any obscure, poorly documented features
    - didn't even use single-letter variable names
    - heck, it's even full of wasted spaces

    He simply made excellent use of the well-documented default behaviours which even I use every day. It still amazes me that St.Larry (and some Perl elves) thought up all these behaviours which often look odd to me at first, but eventually dovetail together so well it's hard to imagine all of these uses weren't considered. Maybe after I've been here long enough I'll be able to come up with better ways to fit the parts together too.

    I'd like to be able to assign to an luser

      Yeah, it's not an obfuscation by any means. Just a nice way to put together a lot of common features. In fact, I'd take it one step further to fulfill one additional monkey wrench thrown in after that posting was made: what if the line appears more than once in either fileA or fileB or both, but you still want only one copy of the line?

      Well, the answer is just as straightforward. Remove the dollar from the regex! I'll leave that explanation as an exercise to the clever reader. {grin}

      -- Randal L. Schwartz, Perl hacker

      update: Bleh! My mistake, the dollar was added to handle this case! I knew I had needed to deal with multiple hits somehow.

      Remind me never to post again. {grin}

        This should also work for an arbitrary number of files, which AFAIK has no equivalent UNIX command. To show common lines in 4 files:
        perl -ne 'print if ($seen{$_} .= @ARGV) =~ /32+1+0$/' fileA fileB fileC fileD
Re: Detect common lines between two files, one liner from shell
by stephen (Priest) on Dec 13, 2000 at 23:31 UTC
    To add insult to injury, you could do this:
    perl -ne '@foo[10]="print"; eval( $foo[ substr( ( $seen{$_} .= @ARGV), + -2)])' file1 file2
    Adding just that extra bit of obscurity to make it completely byzantine...


Re: Detect common lines between two files, one liner from shell
by b (Beadle) on Dec 13, 2000 at 22:16 UTC
    Can you explain this?

      I will point to the pieces of documentation from which you can figure it out. I suggest locating it with perldoc, but I will also provide links to site documentation.

      The meaning of the -n and -e switches is explained in perlrun: -e supplies the program text, and -n wraps it in a while (<>) loop that reads every line of every file named on the command line. That loop also puts each line into $_. As each file is opened, its name is shifted off @ARGV, so the contents of @ARGV change as you scan through the files. The append is done in scalar context, where @ARGV yields the number of elements it holds, that is, the number of files not yet opened. The pattern matches when the hash value ends with "10". The two filenames are on the command line, and the output is redirected to a file that you can look at.

      The trick is that for the hash value to get a 1 in it, the line must appear in the first file. For it to get a 0 in it, it must appear in the second. It will only match /10$/ on the first occurrence in the second file when it already appeared in the first.

        Hey, thanks a lot!

        I couldn't understand where the files are read from. There is no <> anywhere and the @ARGV is only the file names.

        The trick with the 0 and 1 is cool.

        Sorry, I'm new to this and I don't have much time to read the tutorials, but I still don't understand $seen{$_}. I know $_ is the current stream, and .= is like adding to the end. But what does /10$/ do with that hash? And this only matches the exact line; it doesn't look for the same word. What if I want to find a word that is in both files and print it out on screen? Thanks.