PerlMonks  

Re: Efficiency and Large Arrays

by chromatic (Archbishop)
on Jul 23, 2000 at 02:43 UTC


in reply to Efficiency and Large Arrays

The bit about 'increment serial number until it's unique' raises a red flag for me. Two solutions come to mind: either put things in a nice relational database (especially one with a feature like an auto-increment ID column), or find some other sort of unique identifier.

If you're creating a new hash already, you can use its reference (yes, you read that right) as a unique ID. (I've seen this used as keys for a Flyweight object, pretty cool!) References are guaranteed to be unique for as long as the things they point to exist, because the string form of a reference includes the memory location of its referent:

]$ perl
my $h_ref = {};
print "$h_ref\n";
HASH(0x80d761c)
You can get rid of everything except the hex digits with a simple tr/// statement: tr/a-f0-9//dc; (the 0 of the 0x prefix survives as well, which is harmless). That's quicker than scanning for unique numbers.
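Here is a minimal sketch of that idea, assuming plain (unblessed) hash references. The record fields and the %by_id index are made up for illustration. Keep in mind that an address is only unique while its hash is alive: once a record is freed, Perl may hand the same address to a later one.

```perl
use strict;
use warnings;

# Hypothetical records; the field names are assumptions, not the original poster's data.
my @records = (
    { name => 'Alice', phone => '555-0100' },
    { name => 'Bob',   phone => '555-0101' },
);

my %by_id;
for my $rec (@records) {
    # Stringify the reference, then delete everything that is not a hex digit.
    # "HASH(0x80d761c)" becomes "080d761c" (the 0 of "0x" is kept).
    (my $id = "$rec") =~ tr/a-f0-9//cd;
    $by_id{$id} = $rec;
}

printf "%s => %s\n", $_, $by_id{$_}{name} for sort keys %by_id;
```

The IDs differ from run to run, so this only suits serial numbers that live as long as the data is being manipulated.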

Still, there's something I can't quite put my finger on here... perhaps you could show us your intended data structure?

RE: Re: Efficiency and Large Arrays
by fundflow (Chaplain) on Jul 23, 2000 at 03:22 UTC
    This is real overkill, don't you think?

    Also, what happens when you run the script the second time?
    What guarantees that the number you got (a memory location) does not already appear somewhere in the later records?

    If he renumbers anyway, then any number will do. Instead of using Perl's heap pointer, it would be easier to pick one explicitly.

    Anyway, his problem seems to lie in the memory use more than in the numbering scheme.
      "Also, what happens when you run the script the second time?"

      Yes, that's a problem, if the serial numbers need to be maintained. If this is a one-time-per-dataset operation, and the serial numbers are there just while manipulating the data, it doesn't really matter.

      "Anyway, his problem seems to lie in the memory use more than in the numbering scheme."

      But the reason he's keeping all the old records around is to make sure he doesn't reuse a number. If he uses a unique identifier (the reference value is unique, automatically generated, and readily available), he doesn't have to keep all of the records around in memory.

      The thing that bothered me was using grep to look for already-used phone numbers. What if they were the primary key of the hash? Then, it's a simple lookup to see if one's already used.
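A rough sketch of keying the hash on the phone number, so that a duplicate check is one hash lookup instead of a grep over every record seen so far. The record layout and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical input records; the real data format is not shown in the thread.
my @incoming = (
    { phone => '555-0100', name => 'Alice' },
    { phone => '555-0101', name => 'Bob' },
    { phone => '555-0100', name => 'Alice (duplicate)' },
);

my (%by_phone, @duplicates);
for my $rec (@incoming) {
    if (exists $by_phone{ $rec->{phone} }) {
        push @duplicates, $rec;      # already seen: a single hash lookup
        next;
    }
    $by_phone{ $rec->{phone} } = $rec;
}

printf "%d unique, %d duplicate\n", scalar(keys %by_phone), scalar(@duplicates);
```

With grep, each check walks the whole list, so the work grows with the square of the record count. The hash keeps each check at roughly constant cost.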

        Having a hash instead of grepping is of course better (although it takes more memory).

        The idea of using the memory reference returned by Perl's internal heap mechanism is interesting, but I'm not sure it buys much here.

        Anyway, the original post is "walking on the edge" of usability. If his files are much bigger than the computer's memory, then the hash will not fit, and there are better ways, such as using a database, doing multiple passes, etc. (or keeping the files clean in the first place...)
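One possible shape for the "use a database" route, sketched with a hash tied to an on-disk SDBM file so the seen-keys table does not have to fit in memory. SDBM_File ships with Perl, but choosing it here is an assumption, as are the scratch directory and the sample data. Only a small flag is stored per key, since SDBM is meant for small key/value pairs.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Scratch directory for the demo; a real script would pick a persistent path.
my $dir = tempdir(CLEANUP => 1);

tie my %seen, 'SDBM_File', "$dir/seen", O_RDWR | O_CREAT, 0666
    or die "Cannot tie SDBM file: $!";

my @phones = qw(555-0100 555-0101 555-0100);   # made-up sample data
my $dupes  = 0;
for my $phone (@phones) {
    if (exists $seen{$phone}) { $dupes++; next }
    $seen{$phone} = 1;           # lookups and stores go to disk, not to RAM
}

my $unique = () = keys %seen;    # count the keys stored on disk
untie %seen;

print "$unique unique, $dupes duplicate\n";
```

Because the file survives between runs (given a persistent path), this also covers the question of what happens the second time the script runs.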

        Cheers.
Re^2: Efficiency and Large Arrays
by diotalevi (Canon) on Dec 12, 2002 at 14:51 UTC

    Instead of using the string form of the hash reference, just take the numeric value to begin with. If you want the hex form, just pack/sprintf it.

    ]$ perl
    my $h_ref = {};
    print 0+$h_ref,"\n";
    print sprintf('%X',0+$h_ref),"\n";
    135099932
    80D761C
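For comparison, a small sketch using refaddr from the core Scalar::Util module, which returns the same numeric address and keeps working when the reference is blessed or its class overloads numification. Using refaddr is a suggestion here, not something from the post above.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $h_ref = {};

# For a plain reference this is the same number as 0+$h_ref.
my $addr = refaddr($h_ref);

# Lowercase hex, matching the digits inside the "HASH(0x...)" string form.
my $hex = sprintf '%x', $addr;

print "$addr\n$hex\n";
```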

    __SIG__ use B; printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;
