The stupid question is the question not asked | |
PerlMonks |
comment on |
( [id://3333]=superdoc: print w/replies, xml ) | Need Help?? |
I think your two-pass approach is fine in principle. Because you can't know in advance if a record has duplicates, you'll have to keep the ID of all records in memory just in case, in a single-pass approach. Whether this is feasible, it depends on the expected size of the file. A suggestion for a single-pass approach would be to make $dup{key} an array-ref:
Update: I failed to see that you read-in the whole file in @lines to begin with. Code modified to avoid this. My comment about single-pass / two pass becomes a bit irrelevant in the new light. MoreThis means that I first go through the file once to detect duplicates, and then go through the file again once for each duplicate found. I can't help but think that there is a more elegant and efficient way of doing things. My code is shown below: This confused me at first, because I didn't read your code carefully. Actually, in your original code, you don't go through the file twice (in I/O terms). You actually read the whole file line by line into an array, then loop over that array twice, populating a hash in the first pass. My solution also goes through the file only once (in a while loop), populating a deep data structure (hash of arrays), then, in a second loop, it goes over the elements of that hash printing those with more than one ID. As for writing the whole program in a single loop it's not possible, because you have to basically group-by. Random_Walk above cheats by assuming there can be a maximum of a single duplicate for any given textual key. In reply to Re: I sense there is a simpler way...
by calin
|
|