Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Check randomly generated numbers have not been used before

by R3search3R (Initiate)
on Jun 20, 2014 at 13:30 UTC ( [id://1090630]=perlquestion: print w/replies, xml ) Need Help??

R3search3R has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am trying to create a 7 digit random part number generator and have each number written to a .txt. This has been successful. However, I am now struggling to perform a test to check whether these random 7 digit numbers have been used before (by checking them against the used numbers in the text document). If it has i would like it to be discarded and then a new unique number genereated and checked. Can someone please help me with the code that would allow me to do this. Thanks

use warnings; use strict; use diagnostics; ############### OPEN FILES ################## my $database = "C:/Database.txt"; open (INPUT, ">>" , $database) || die "Can't open $database: $!"; ############## PROCESS DATA ################# my $part_number_range = 10000000; my $part_number = int(rand($part_number_range)); my $new_part_number = sprintf("%07d", $part_number); print INPUT "$new_part_number\n"; }

Replies are listed 'Best First'.
Re: Check randomly generated numbers have not been used before
by toolic (Bishop) on Jun 20, 2014 at 13:39 UTC
    You can read all the numbers from your file and store them into a hash. Then, you can check if a random number exists in your hash. If you show us the format of your file, we can provide further guidance.
Re: Check randomly generated numbers have not been used before
by perlfan (Vicar) on Jun 20, 2014 at 14:02 UTC
    This could become extremely prohibitive if you suppose you may be generating even a small fraction of all possible 7 digit numbers.

    If you need to create a GUID or UUID, there are better ways to go about this if your constraint can be relaxed to your numbers having a very, very low chance of collision (and such modules exist for this purpose.)

    If you absolutely must track these numbers persistently or over a long running process, I'd recommend using SQLite to store your numbers. In the simple table that stores your number, simply mark this column as unique. INSERTing them into the table will let SQLite to efficiently check to make sure this field value is unique . Failure to insert due to a DUPLICATE KEY error could be easily caught, and this would allow you to "try again."

    Finally, if you must maintain your own "index" of assigned numbers, I'd recommend storing it in a Trie for efficient storage and fast inserts & lookup.

Re: Check randomly generated numbers have not been used before (shuffle)
by oiskuu (Hermit) on Jun 20, 2014 at 20:17 UTC

    Two approaches that might be of interest. In case you require a randomized list of unique numbers, then the problem isn't really about rand() generation, it's a list shuffling task.

    use List::Util q(shuffle); print join q(,), shuffle 1 .. 1000;
    With a large number of elements, various optimizations may become appropriate, such as packed vectors, mmap'ed files, etc.

    The second approach is to use a suitable pseudo-random number generator. One might choose a very simple linear congruential generator with a period of m. E.g. with m==224, a range and period of 16777216 is obtainable. Discard values >= 1e7 and you have a sequence of non-repeating pseudo-random numbers to cover the required 7-digit range. Keep in mind that this list will be a very weak and pseudo random sequence, suitable only in cases where the demands are casual.

      ++ for suggesting a pseudo random generator! :)

      This thread became (as usual) so long b/c the OP doesn't give us much detail, like

      • motivation: why the numbers need to be random
      • quantity: how many of them need to be generated
      • performance: how fast the solution needs to be

      My first guess was to suggest a very simple algebraic Linear congruential generator where each element in the sequence is predictable and collisions are avoided.

      But as usual because of lack of information we try to answer a wide range of possible questions. :(

      Cheers Rolf

      (addicted to the Perl Programming Language)

      > With a large number of elements, various optimizations may become appropriate

      Actually shuffling 1..1e7 takes an eternity, but shuffling 1..1000 for 10000 times only takes seconds and is deterministic depending on the seed of srand .

      So a second rand to chose one of these 10000 arrays and shift one element should be easy, fast, deterministic and produce quite good distributed results.

      The mileage of the OP may vary. :)

      BTW

      The real complication of this whole approach of "try again another random number if already taken" idea, cause this will slow down considerably after a while if millions of random numbers need to be tried out.

      Thats why shuffling is the far better approach.

      Cheers Rolf

      (addicted to the Perl Programming Language)

        BTW

        The real complication of this whole approach of "try again another random number if already taken" idea, cause this will slow down considerably after a while if millions of random numbers need to be tried out.

        Here are the average times, per 1000 part numbers, for the first 9 millions from the 10 million range:

        [1000000]Ave: 0.003626 [2000000]Ave: 0.004084 [3000000]Ave: 0.004650 [4000000]Ave: 0.005443 [5000000]Ave: 0.006492 [6000000]Ave: 0.008238 [7000000]Ave: 0.010892 [8000000]Ave: 0.017433 [9000000]Ave: 0.300273

        So sure, allocating the 9th million takes roughly 100 times as long as the first.

        But that still means that unless the OPs company are going to need to allocate 1 million new part numbers in less than 5 minutes, the "slow down" isn't going to be any kind of a problem.

        And if they are going to allocate such ridiculously high numbers of new part numbers, that the allocation rate is a limiting factor; then they sure as heck need to be starting with a much larger range, otherwise they will run out in their first hour of trading.

        You've yet to learn the difference between knowledge and expertise.

        The knowledgeable are aware of theories, concepts and formulae.

        The experienced know when to apply them.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Check randomly generated numbers have not been used before
by BrowserUk (Patriarch) on Jun 20, 2014 at 21:20 UTC

    Presumably you want to be able to generate unique part numbers across many runs of the program and thus need to keep a tally of those already issued. (Issuing part numbers at random is a bit weird, why not just allocate the next one in order? But c'est la vie.)

    This will generate N unique part numbers each run without duplicates quite efficiently. It uses a singlefile, just over a megabyte in size, for the db:

    #! perl -slw use strict; use constant TALLY => 'tally.bin'; our $N //= 50; my $tally = chr(0) x (10e6 / 8 ); if( -e TALLY ) { open I, '<:raw', TALLY or die $!; my $tally = do{ local $/; <I> }; close I; } sub genPartNo {{ my $partNo = int( 1e6 + rand( 9e6 ) ); redo if vec( $tally, $partNo, 1 ); vec( $tally, $partNo, 1 ) = 1; return $partNo; }} print genPartNo for 1 .. $N; open O, '>:raw', TALLY or die $!; printf O "%s", $tally; close O;

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      In that case, just store the srand seed and recreate the bitmap as needed.

        just store the srand seed and recreate the bitmap as needed.

        It takes 25 milliseconds to write the (10 million bit) bitmap out and then read it back.

        It takes over 20 85 207 minutes to recreate the first million picks.

        Math::Random::MT is excellent; but it's not fast.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Check randomly generated numbers have not been used before
by InfiniteSilence (Curate) on Jun 20, 2014 at 13:44 UTC
    perl -e 'for (0..100){$randhash{int (rand() * 100)} += 1}; print join +qq|\n|,sort {$a<=>$b} keys %randhash;'

    Check elements that have > 1 in the value and recompute until you find a key that does fails exists.

    Celebrate Intellectual Diversity

Re: Check randomly generated numbers have not been used before
by davido (Cardinal) on Jun 20, 2014 at 15:58 UTC

    What problem are you solving with this stream of unique random numbers? (I'm genuinely interested, and the answer to that question could impact the answer I would give.)


    Dave

Re: Check randomly generated numbers have not been used before
by kzwix (Sexton) on Jun 20, 2014 at 14:31 UTC

    Hello,

    Are you supposed to read the previously generated numbers list from the file each time ? Or are you allowed to cache these results ?

    I mean, if you have to read 300.000 numbers from a text file, each and every time you run your script, then generate a number, check that it's not already generated, loop until you have an "original" number, then write down all the previously generated numbers AND the new, original one, yes, it would be costly.

    However, you may use a hash to store the generated numbers. One hash is enough, with the numbers being the keys (you don't even need anything as the values, 1 or undef will work just as well). This should make the search for previous occurences happen in constant time (and I mean FAST).

    You might index the previously generated numbers in a "clever" way, maybe bitwise, in your TXT, so that you do NOT have to read a whole, huge, file each time (you may store 8 bits per byte, each of them could represent one of the combinations).

    You could sort your file, but that would only be relevant if you had to search through it for each number - which you really shouldn't do, whether you generate only ONE new number per run, or a whole batch of them, as you might have to reroll each of them based on previously generated numbers, you might as well put them all in memory.

    As for writing your file, you really should append the new result to the end of it, without rewriting previous entries, unless you wish to keep it sorted.

Re: Check randomly generated numbers have not been used before
by GotToBTru (Prior) on Sep 16, 2016 at 16:12 UTC

    Two years late .. but possibly relevant to the next guy with this question.

    I was given the job of generating random part numbers for the current products in inventory. Any new products that would be introduced would also need a part number, and of course these needed to be values not yet used. I chose a pseudo-random number generator algorithm P[n+1] = (P[n] + offset) % S where P is the list of part numbers, S is size of part number space, and offset is the first prime number > S/2. Run S times, P[1..S] will contain every number between 0 and S-1 exactly once, in what appears to be a random order. If you stop the process before reaching S, just remember where you stop and start there when you need the next part number. In this application, it was only necessary that the numbers appear random.

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

      This appears to be a Linear congruential generator for which the multiplier parameter a = 1. I'm far and far from being a PRNG expert (I'd bet BrowserUk could write a lengthy article on this topic before his first mug of caffeine-delivery-beverage-of-choice in the morning), but I think you need to exercise some caution in chosing the m a c parameters so that the period of the generator is maximized (i.e., equal to m) and the generator never gets "stuck" at a particular value of Xn. See the linked article. Just glancing at this article, I don't see any case in the "Parameters in common use" table of a = 1, so this may be a red flag.

      Update: After more thought (and looking at pryrt's post), a multiplier a = 1 does seem to meet all the requirements for maximum period length for this type of PRNG, but gives a boring "clock" progression to the output, i.e., the output always advances by the offset mod S. Don't you want the output to bounce around a bit more and at least seem more random? ;)


      Give a man a fish:  <%-{-{-{-<

        In Xn+1 = (a*Xn + c) (mod m): If the offset c is relatively prime with m, you will generate a maximal-length sequence. Since GotToBTru picked the first prime ≥ S/2, it will be maximal with length S, and will guarantee that's there are at most 2 numbers from each period (ie, Xn+1 may not have wrapped around S, but Xn+2 definitely will have).

        You're exactly right about this. I knew this sort of thing had an official title, just didn't know what it was. I've seen this sort of algorithm called a "random number generator", but it is nearly the opposite. Having once issued a value in the space, there is 0% chance it will re-issue it. Well, until you start through the list a second time, in which case it will issue the very same list of numbers in exactly the same order. The "random" function in the BASIC interpreter I used 40+ years ago used something very much like this, and my first attempts at computer games were very disappointing until I figured that out ;)

        But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

      Whilst the coprimality of parameters ensure uniqueness whilst saving you from have to track previous picks; the method has a weakness: if an attacker can cause you to give him two consecutive picks -- say, by inspecting the latest additions to the parts inventory -- he can determine S, and from that start guessing your new part numbers.

      He doesn't even have to find to consecutive picks; only two that where allocated close to each other. If the difference between them is prime, he's probably found S; if not, he only need find the prime factors of that difference to find it.

      With a little more inspection and analysis, he will be able to determine your offset; and at that point, the benefits of allocating "seemingly random" part numbers is defeated.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Depends on what benefit of "seemingly random" you're looking for. Remember, these are part numbers, used in an inventory system, not part of any cryptographic system. They would function as database keys, so they had to be unique. It was desired that the numbers not be sequential, and that they would span the space. Probably to prevent anyone programming forms or reports from making assumptions about how many digits to allow for part numbers that would cause trouble at a later time. That was my guess at the time.

        I put this together in 1993. My algorithm could still be in use to this day.

        But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1090630]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-03-29 06:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found