Duplicate Randoms with SSI

by gwhite (Friar)
on Oct 26, 2007 at 14:15 UTC ( [id://647388] )

gwhite has asked for the wisdom of the Perl Monks concerning the following question:

I have a website that uses a Perl script via SSI to add random HTML blocks to my pages. Sometimes it is information, sometimes it is advertising, and I have 2-5 of these blocks on each page. The content (HTML strings) lives in flat text files. What happens more often than I would like is that I end up with the same block two, and sometimes three, times on a page. Since each block is a separate SSI call that grabs a random line from the text file, I have not been able to figure out a way to determine whether a block is already shown on the current page so that a different block can be grabbed instead. Does anyone have any clever suggestions for solving this problem? I serve about 100,000 pages a day to 8 to 10 thousand users, so processing time, disk writes, and the like need to be kept in mind.

g_White

Replies are listed 'Best First'.
Re: Duplicate Randoms with SSI
by perrin (Chancellor) on Oct 26, 2007 at 16:23 UTC
    Here's an idea. First, use an algorithm rather than a random selection. Take some things like the current time and the browser's IP address and apply something like a hashing algorithm to generate a number from them. Use that number as an index into your file, wrapping as necessary. Then, add offsets to the SSI calls, e.g. /my/cgi.pl?offset=1, /my/cgi.pl?offset=2, etc. In each call, add this offset number to your computed index. If showing the ads in sequence is a problem, multiply the offset by something, but keep it small enough to avoid wrapping around and possibly showing the first one again.
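
    A minimal CGI sketch of this idea (the script path, blocks file, and offset parameter are illustrative assumptions, not anything from the post):

    #!/usr/bin/perl
    # Each SSI include calls this script with a different offset, e.g.
    #   <!--#include virtual="/cgi-bin/block.pl?offset=0" -->
    #   <!--#include virtual="/cgi-bin/block.pl?offset=1" -->
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # one html block per line
    open my $fh, '<', '/path/to/blocks.txt' or die "blocks: $!";
    chomp(my @blocks = <$fh>);
    close $fh;

    # derive one base index per page view from the minute and the client
    # address (using the minute rather than the second keeps all calls on
    # one page render agreeing on the same base index)
    my $seed  = int(time() / 60) . ($ENV{REMOTE_ADDR} || '');
    my $index = hex(substr(md5_hex($seed), 0, 8));

    # each call adds its own offset, so no two slots on a page collide;
    # multiply the offset by a small constant here if strictly sequential
    # blocks are a problem, as the post suggests
    my ($offset) = ($ENV{QUERY_STRING} || '') =~ /offset=(\d+)/;
    $offset ||= 0;

    print "Content-type: text/html\n\n";
    print $blocks[($index + $offset) % @blocks], "\n";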
Re: Duplicate Randoms with SSI
by amarquis (Curate) on Oct 26, 2007 at 14:28 UTC

    I cannot think of an elegant solution to your question, but have you considered dropping the SSI and going with a full Perl solution? You can add a handler so that the server knows to call the Perl interpreter on your .html files, so you don't have to change the URLs or anything (there are other ways around this as well). It seems easier to have one script do everything than to put together some method of saving state between SSI calls.
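
    A bare-bones mod_perl 2 sketch of what that handler could look like (the module name My::AdPage and the config mapping are hypothetical):

    # In httpd.conf, something like:
    #   <FilesMatch "\.html$">
    #       SetHandler perl-script
    #       PerlResponseHandler My::AdPage
    #   </FilesMatch>
    package My::AdPage;
    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::RequestIO ();
    use Apache2::Const -compile => qw(OK);

    sub handler {
        my $r = shift;
        $r->content_type('text/html');
        # ... read the requested .html file and fill every ad slot in one
        # pass, choosing distinct blocks so a page cannot repeat one ...
        $r->print("<html>...page with ad slots filled...</html>");
        return Apache2::Const::OK;
    }
    1;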

      I have considered it, but part of the content is legacy Perl applications, part is a couple of purchased PHP apps, and the rest is your typical 10-year-old site: a little of this, some of that. It is far from elegant, and the number of effort gnomes required to convert the entire site to Perl is way beyond the number I have.

      g_White
Re: Duplicate Randoms with SSI
by Rhandom (Curate) on Oct 26, 2007 at 16:24 UTC
    Each of those SSI calls will be forking off another process and will have no way to communicate up to the parent process that something has happened.
    You do have something else that could potentially work. It has been long enough since I have used SSI that I don't remember whether REMOTE_ADDR, HTTP_REFERER, or REMOTE_USER are set inside an SSI process - I think they should be (REMOTE_USER would only be set in an htauthed area). As long as even one of those is set, you can use a solution similar to the following. It isn't 100% accurate - but it should be good enough:
    use strict;
    use warnings;
    use Cache::Memcached;
    use Digest::MD5 qw(md5_hex);

    sub get_random {
        my ($pick_list, $unique_key) = @_;

        # add uniqueness to our key
        $unique_key .= $ENV{'REMOTE_USER'}
                    || $ENV{'REMOTE_ADDR'}    # less reliable
                    || $ENV{'HTTP_REFERER'}   # least reliable
                    || '';                    # based on time now
        $unique_key = md5_hex($unique_key);

        # this solution requires a memcached server
        # or some other cache that handles volatility
        my $mem = Cache::Memcached->new({servers => ['localhost:11211']});

        # see what has already been used
        my $used = $mem->get($unique_key) || [];
        $used = [] if @$used >= @$pick_list;    # reset when full
        my %used  = map { $_ => 1 } @$used;
        my @avail = grep { !$used{$_} } 0 .. $#$pick_list;

        # pick a random item and add it to the list of used items
        my $index = $avail[rand @avail];
        push @$used, $index;
        $mem->set($unique_key, $used);

        return $pick_list->[$index];
    }

    # use it like this
    my @items = ("http://foo", "http://bar", "http://baz");
    my $page  = 'pagefoo';
    print get_random(\@items, $page), "\n";
    print get_random(\@items, $page), "\n";
    print get_random(\@items, $page), "\n";

    # it will automatically reset
    print "Reset\n";
    print get_random(\@items, $page), "\n";
    print get_random(\@items, $page), "\n";
    print get_random(\@items, $page), "\n";

    __END__
    Prints something like:
    http://bar
    http://baz
    http://foo
    Reset
    http://baz
    http://foo
    http://bar
    You should note that I have used memcached here. My reasoning is that you can have a small, localized chunk of memory allocated for this very temporary, very dynamic system, and you can insert entries into memcached and then forget about them. Old and unused entries are automatically dropped as new entries use up the available memcached space. For the usage you have mentioned here, a memcached daemon running with only 1MB of allocation should be sufficient.

    Oh - and this solution should add very little overhead to your process.
    my @a=qw(random brilliant braindead); print $a[rand(@a)];
Re: Duplicate Randoms with SSI
by dwm042 (Priest) on Oct 26, 2007 at 17:00 UTC
    A truly random sequence will, over a short stretch, sometimes repeat. Therefore, you don't want truly random numbers: you want a non-random function with a period as long as the number of ads you serve, whose values map one-to-one to your ads for the length of the period. The piece of code that generates this function has to be persistent.

    By way of example, if you have 10 ads, you could start with an array of the numbers 1 through 10, which you would then shuffle. Each number would map to an ad, and the persistent piece would hand the required content to the display code. When it reaches the end of the array it starts over, and after serving some number of ads it reshuffles the order (so that people don't see too regular a pattern).

    Your persistent piece could be a separate process, using sockets or a FIFO for communication, or simply a file containing an index that you read, increment, and write back after each use.
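
    A sketch of the simplest variant, a file holding the shuffled order plus a cursor file (the paths are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);
    use List::Util qw(shuffle);

    my $order_file = '/path/to/order.txt';  # shuffled block numbers, one per line
    my $index_file = '/path/to/index.txt';  # current position; seed it with 0

    # lock the cursor so concurrent SSI calls don't hand out the same slot
    open my $idx, '+<', $index_file or die "index: $!";
    flock($idx, LOCK_EX) or die "lock: $!";
    my $i = <$idx>;
    $i = 0 unless defined $i;
    chomp $i;

    open my $ord, '<', $order_file or die "order: $!";
    chomp(my @order = <$ord>);
    close $ord;

    my $slot = $order[$i % @order];

    # advance the cursor; reshuffle once a full cycle has been served
    if (++$i >= @order) {
        $i = 0;
        open my $out, '>', $order_file or die "rewrite order: $!";
        print $out "$_\n" for shuffle @order;
        close $out;
    }
    seek($idx, 0, 0);
    truncate($idx, 0);
    print $idx "$i\n";
    close $idx;    # releases the lock

    print "Content-type: text/html\n\n";
    print "<!-- serve block number $slot here -->\n";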

      I have considered having sets of numbers of blocks to pull; I was going to index them on the current seconds (instead of on IPs and ENV variables as previously suggested). I tend to add and delete blocks anywhere from daily to a couple of times a month, and the number of blocks in the text file can grow or shrink by 15-20 blocks per change. Having to constantly update my chain of numbers seems to defeat the reason for having SSI.

      g_White
Re: Duplicate Randoms with SSI
by Krambambuli (Curate) on Oct 26, 2007 at 15:50 UTC
    I'm not sure I really understand how things are working, so maybe my idea is totally wrong.

    I'm thinking of introducing (text, md5_checksum) pairs into the equation: pass the Perl script the md5 checksums of the blocks already on the page as arguments, and have it return not only the text but also the md5 checksum of what it picked, so the next call can avoid everything already shown.

    Calculating MD5 checksums isn't cheap, but it isn't really expensive either - so maybe it might work?

    Krambambuli
    ---
    enjoying Mark Jason Dominus' Higher-Order Perl
Re: Duplicate Randoms with SSI
by duff (Parson) on Oct 26, 2007 at 18:28 UTC

    Crazy idea: make an image-server daemon and point your SSI calls at a lightweight client that just requests an image from the daemon. The daemon would keep track of the last 5 (for instance) images it served and know not to serve them again.
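
    A toy sketch of such a daemon (the port, block count, and window size of 5 are all assumptions); each lightweight SSI client would connect, read one line, and print the matching block:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    my @blocks = (0 .. 49);    # block numbers; load from the real file in practice
    my @recent;                # the last 5 numbers handed out

    my $server = IO::Socket::INET->new(
        LocalPort => 8877,
        Listen    => 5,
        Reuse     => 1,
    ) or die "listen: $!";

    while (my $client = $server->accept) {
        # never pick anything served in the last 5 requests
        my %recent = map { $_ => 1 } @recent;
        my @avail  = grep { !$recent{$_} } @blocks;
        my $pick   = $avail[rand @avail];

        push @recent, $pick;
        shift @recent if @recent > 5;

        print $client "$pick\n";
        close $client;
    }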

      Not sure it is crazy; I have thought about implementing this with mod_perl so it is constantly running. With the quantity of visitors I get, I would need to keep a list of IP addresses or session IDs or something to keep track of what went to each user. I think what you are suggesting is not too far off from this.

      g_White
Re: Duplicate Randoms with SSI
by Aim9b (Monk) on Oct 26, 2007 at 17:46 UTC
    Would it be possible to determine ahead of time how many ads you need to display, then retrieve them all in a single access? They wouldn't necessarily need to be in sequence. Just a thought.
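
    A sketch of that single-call approach (the file path, slot count, and class names are made up); picking from a shuffle makes duplicates impossible by construction:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(shuffle);

    my $slots = 5;    # the most blocks any page uses

    open my $fh, '<', '/path/to/blocks.txt' or die "blocks: $!";
    chomp(my @blocks = <$fh>);
    close $fh;

    $slots = @blocks if $slots > @blocks;    # guard against a short file
    my @picks = (shuffle @blocks)[0 .. $slots - 1];

    print "Content-type: text/html\n\n";
    for my $i (0 .. $#picks) {
        # one div per slot; CSS can then style each position differently
        print qq{<div class="ad-slot-$i">$picks[$i]</div>\n};
    }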

      I have considered this: one call that would create the maximum number of blocks, each as its own <div>, and then place the <div> blocks. I don't know whether I'd run into CSS issues, since a left-column block needs to be formatted differently than a right-column or center-column block. I may have to ask a CSS whiz how to get it done correctly.

      g_White
Re: Duplicate Randoms with SSI
by jethro (Monsignor) on Oct 27, 2007 at 22:34 UTC
    Lowest-tech solution:
    Shuffle the ads into five directories. Pull the first block on the page from the first directory, the second from the second, and so on.

    Have a cron job either rename the directories in round-robin fashion or shuffle the ads between the directories every 15 minutes. This makes sure that in the long run every ad has an equal chance to show up.

    BUT: both approaches need a locking mechanism, so that the SSI is not reading at the moment of the renaming or shuffling.
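
    The cron half might look something like this sketch (the directory layout and lock file are assumptions); the SSI readers would take the same lock shared (LOCK_SH) before reading:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);
    use File::Copy qw(move);
    use List::Util qw(shuffle);

    my @dirs = map { "/path/to/ads/$_" } 1 .. 5;

    # hold the lock for the whole redistribution
    open my $lock, '>', '/path/to/ads/.lock' or die "lock: $!";
    flock($lock, LOCK_EX) or die "flock: $!";

    # gather every ad file, shuffle, and deal them back out round-robin
    my @files = shuffle map { glob "$_/*" } @dirs;
    for my $i (0 .. $#files) {
        move($files[$i], $dirs[$i % @dirs]) or die "move: $!";
    }

    close $lock;    # released; readers may proceed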
