Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Here's how I'd do it (for clarity, this was basically suggested in the first reply) - code untested :
use strict; use warnings; use Tie::Hash::Indexed; tie my %lines1, 'Tie::Hash::Indexed'; # gives you the ordered hash open my $IN1, '<', "tmp12" or die "Cannot open this file: $! +"; open my $IN2, '<', "donor_82_01.csv" or die "Cannot open this file: $? +"; # step 1, cache contents of $IN1 (read the first file once) # populate %lines1 "cache" for my $item1 (<$IN1>) { @tmp1 = split( /\t+/, $item1 ); $lines1{ $tmp[1] } = \@tmp1; # save full $item1 line, keyed on +$tmp[1] } # step 2, iterate over contents of $IN2 / look up in %lines1 to compar +e open my $OUT, '>', "tmp12_01" or die "Cannot open this file: $?"; LOOKUP_AND_COMPARE: for $item2 (@lines2) { #chomp $item2; # not needed, see last line my @tmp2 = split( /\,+/, $item2 ); # -- look up if ( 'ARRAY' eq $lines1{ $tmp2[0] } ) { my @tmp1 = @{ $lines1{ $tmp2[0] } }; # for clarity, not act +ually needed; can get value via "$lines1{ $tmp2[0] }->[0]" print $OUT $tmp1[0], ",", $item2; #<-updated to fix + bareword from old code last LOOKUP_AND_COMPARE; } } #print $OUT "\n"; # probably don't need if you don't "chomp $it +em2"

Additional optimizations, depending on your constraint (timeversus space):

  • if time, cache the larger of the 2 files
  • if space, cache the smaller of the 2 files

The lesson here, as stated below is to not nest your loops. It's called "computational complexity". Basically only want to have at most 1 level of looping. The line, if ( 'ARRAY' eq $lines1{ $tmp2[0] } ) { is the "constant time" look up capability that is being provided for by the ordered caching of the first file above and how you avoid the inner loop.


In reply to Re: match two files by perlfan
in thread match two files by yueli711

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (4)
As of 2024-03-29 11:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found