http://qs321.pair.com?node_id=46453


in reply to Detect common lines between two files, one liner from shell

Can you explain this?

Re (tilly) 2: Detect common lines between two files, one liner from shell
by tilly (Archbishop) on Dec 13, 2000 at 23:01 UTC
    I will point to the pieces of documentation from which you can figure it out. I suggest locating it with perldoc, but I will also provide links to site documentation.

    The meaning of the -n and -e switches is explained in perlrun, which also tells you what $_ is during the script. As the files are scanned, each filename is shifted off @ARGV when it is opened, so the contents of @ARGV change. The append is done in scalar context, and in that context @ARGV gives you the number of elements it currently holds. The pattern matches when the hash value ends with "10". The two filenames are given on the command line, and the output is redirected to a file that you can look at afterwards.

    The trick is that for the hash value to get a 1 in it, the line must appear in the first file. For it to get a 0 in it, the line must appear in the second. It will only match /10$/ on the first occurrence in the second file of a line that already appeared in the first.

      Hey, thanks a lot!

      I couldn't understand where the files are read from. There is no <> anywhere, and @ARGV holds only the file names.


      The trick with the 0 and 1 is cool.


      Sorry, I'm new to this and I don't have much time to read the tutorials, but I still don't understand $seen{$_}. I know $_ is the current stream, and .= is like adding to the end. But what is that hash, and what is /10$/? Also, this only matches an exact line; it doesn't look for the same word. What if I want to find a word that is in both files and print it on screen? Thanks.
        I refuse to repeat the documentation until you have at least tried to read it. That is why it is there and it is faster for both of us if you take advantage of it.

        As for your additional question, the -n option is an implicit loop over the lines in both files. If you want to do words, then within each line you would need to loop over the words as well. But the same logic would work. (OTOH the algorithm will get rather inefficient. But oh well.)
