PerlMonks  

Some things not mentioned yet

by gryng (Hermit)
on Jul 23, 2000 at 09:23 UTC ( [id://23967]=note )


in reply to Efficiency and Large Arrays

A few tips that haven't been mentioned yet (this is not meant to be a complete list).

First, you didn't say whether your data is ordered. If it happens to be sorted by either field, then the dup check on that field takes very little effort: just keep the last item seen. In sorted data any duplicates sit next to each other, so that one value is all you need to compare the next item against.
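Here is a minimal sketch of that last-item check. The helper name dedup_sorted and the sample phone numbers are made up for illustration, and it assumes the keys arrive already sorted and can be compared as strings.

```perl
use strict;
use warnings;

# Hypothetical helper: drop adjacent duplicates from a list that is
# already sorted on the key being checked.
sub dedup_sorted {
    my @sorted = @_;
    my @unique;
    my $last;    # last key seen; undef until the first record
    for my $key (@sorted) {
        next if defined $last && $key eq $last;    # same as previous: a dup
        push @unique, $key;
        $last = $key;
    }
    return @unique;
}

# Made-up sample data, already in sorted order.
my @phones = qw(555-0100 555-0100 555-0101 555-0102 555-0102);
print join(",", dedup_sorted(@phones)), "\n";
```

Only one scalar is kept between records, so memory use stays flat no matter how big the file is. With a real file you would run the same test inside a while (<$fh>) loop instead of building a list.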

Of course, if they are not ordered, this will not be as good a solution. You should still consider it, though. For instance, if the files are semi-ordered (say about 5-10% of the records are out of place, but otherwise everything is in the right order), you can still use the same routine. Instead of only comparing against the last value, you treat it as a sort of high-water mark: if you come across a value that is lower, it gets set to ++$water_mark. That only makes sense for a field you are free to renumber, such as the serial.
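A rough sketch of the water mark idea, applied to serial numbers. The helper name is hypothetical, and one detail is my assumption rather than something stated above: a value equal to the water mark is also renumbered, since it would otherwise be a duplicate.

```perl
use strict;
use warnings;

# Hypothetical helper: given serials in roughly ascending order, return
# a list in which every value is strictly greater than the one before it.
sub renumber_with_water_mark {
    my @serials = @_;
    my $water_mark;    # highest serial handed out so far
    my @out;
    for my $serial (@serials) {
        if (defined $water_mark && $serial <= $water_mark) {
            # Out of order (or a repeat): replace it with the next free value.
            $serial = ++$water_mark;
        }
        else {
            $water_mark = $serial;
        }
        push @out, $serial;
    }
    return @out;
}

# Made-up input: 3 is out of place and 6 appears twice.
print join(" ", renumber_with_water_mark(1, 2, 5, 3, 6, 6, 7)), "\n";
```

Because the output is strictly increasing, every serial is unique without any hash. The price is that a reassigned value pushes the water mark up, so later records that were in order can be renumbered too.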

Also, if the files are not ordered at all, you may want to simply sort them beforehand. The up-front cost of sorting can easily be worth it once you weigh it against the memory cost of your hash.

Another, much simpler, method is to get rid of the serial numbers in the file completely and just start at 0 or 1 for the first record and count up. This only works if you don't care about your serial numbers changing each time you run the program. It is good because you can then use the previously mentioned idea of sorting by phone number to do your dup check for phone numbers, and avoid the dup check on serials entirely by making up your own.
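Something like the following shows both halves together: sort by phone, skip adjacent duplicates, and hand out fresh serials. The helper name and the [serial, phone] record layout are assumptions for the sake of the example, and the in-memory sort is only a stand-in. For really large files you would sort externally first.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of phone numbers and returns a list
# of [serial, phone] pairs with duplicate phones removed.
sub renumber_records {
    my @phones = sort @_;    # sort so that duplicate phones are adjacent
    my @records;
    my $serial = 0;          # our own serials; the first record gets 1
    my $last;
    for my $phone (@phones) {
        next if defined $last && $phone eq $last;    # dup phone: skip it
        $last = $phone;
        push @records, [ ++$serial, $phone ];
    }
    return @records;
}

# Made-up input with one repeated phone number.
for my $rec (renumber_records(qw(555-0102 555-0100 555-0102 555-0101))) {
    print "$rec->[0]\t$rec->[1]\n";
}
```

No hash is needed for either field: the phones are checked against the previous record only, and the serials cannot collide because we generate them ourselves.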

Like I said, many caveats. But depending on what you are doing, these can really speed things up.

Oh, one other thing: if you have multiple files and you sort them beforehand, you can keep the files separate, but you'll need to open all of them at once so that you can always read from whichever one currently holds the lowest record.
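Reading from the current lowest file could look roughly like this. The helper name merge_sorted and the callback interface are made up for illustration, and it assumes each file is already sorted as strings with one record per line. The usage part reads from in-memory strings so the sketch runs on its own; real code would open the actual files.

```perl
use strict;
use warnings;

# Hypothetical helper: merge already-sorted filehandles, calling
# $callback once per line in overall sorted order.
sub merge_sorted {
    my ($callback, @handles) = @_;

    # Prime one pending line per handle; empty handles are dropped.
    my @pending;
    for my $fh (@handles) {
        my $line = <$fh>;
        push @pending, { fh => $fh, line => $line } if defined $line;
    }

    while (@pending) {
        # Linear scan for the smallest pending line (fine for a few files).
        my $min = 0;
        for my $i (1 .. $#pending) {
            $min = $i if $pending[$i]{line} lt $pending[$min]{line};
        }
        $callback->($pending[$min]{line});

        # Refill from the same handle, or drop it at end of file.
        my $fh   = $pending[$min]{fh};
        my $next = <$fh>;
        if   (defined $next) { $pending[$min]{line} = $next }
        else                 { splice @pending, $min, 1 }
    }
    return;
}

# Usage, with in-memory "files" standing in for real ones.
my @data = ("555-0100\n555-0103\n", "555-0101\n555-0102\n555-0103\n");
my @fhs  = map { open my $fh, '<', \$_ or die "open: $!"; $fh } @data;

my $last;
merge_sorted(sub {
    my ($line) = @_;
    return if defined $last && $line eq $last;    # dup check across files
    $last = $line;
    print $line;
}, @fhs);
```

Only one line per open file is held in memory at any time, and the last-item dup check still works because the merged stream comes out in sorted order.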

Ciao,
Gryn
