Can your site handle this?

by Logicus (Initiate)
on Nov 05, 2011 at 05:45 UTC ( [id://936105]=perlmeditation )

This node falls below the community's threshold of quality.

Replies are listed 'Best First'.
Re: Can your site handle this?
by Anonymous Monk on Nov 05, 2011 at 06:58 UTC

        ... including my obfu: Nervous sub. (Unattributed plagiarized copy posted here by Logicus.)

        What did I do to deserve having something that took me a couple of hours to write stolen and posted on someone else's site without so much as an attribution? Even something such as, "This was invented by davido over on PerlMonks." would have been courteous.

        Instead, it's made to look as though someone else wrote it and posted it there.

        It reminds me of Lee Marvin's line from the movie "Emperor of the North". "Kid, ya got no class!"


        Dave

        A reply falls below the community's threshold of quality.
        Yup, and that's just the start! I'm going to rip content off from near and far! All your internetz are belong to me!

      It had to be Logicus. His site says, "Logicus has written this wonderful bit of Perl Poetry"

      Almost verbatim plagiarism, I'm afraid... You're going to see a lot more of the content on PerlMonks (and other sites) turn up on perlnights as well, at least until it has its own traffic-generating content without being helped along.
Re: Can your site handle this?
by Anonymous Monk on Nov 05, 2011 at 07:07 UTC

    Miyagawa is an absolute genius, and you would be doing yourself a huge favour to get to know his work inside and out. Sure he's "reinvented" a wheel, but WOW his new wheel is frak-frikken-tastic

    FWIW, Miyagawa made himself a vehicle factory, not a reinvented wheel

    Trying to attach yourself to Miyagawa, in hopes that the glory and usefulness of his software will reflect onto yours, is just sad

    A reply falls below the community's threshold of quality.
Re: Can your site handle this?
by AnomalousMonk (Archbishop) on Nov 05, 2011 at 16:18 UTC
    ... DO NOT USE THIS ...

    Just as a matter of curiosity: To what does 'THIS' refer? aXML? Miyagawa's PSGI/Plack? The specific code example in the OP? Other?

    Update: I suppose I should have said "a matter of morbid curiosity" for now I fear I have exposed myself to one of Logicus's trademark million-word, 15-level replies/rants/fulminations/graphomaniac episodes. And entirely by my own doing!

      The code in the example. It's fine to test your own stuff with, but if you were to run it against someone else's domain you could get yourself into a whole heck of a lot of trouble, and potentially face up to 10 years in prison.

        this is about 75% curiosity and (I hope) not more than 10% sniping
        but...

        This reply conveys very little which wasn't in the OP; certainly, nothing of substance about why one should expect "a whole heck of a lot of trouble, and potentially face up to 10 years in prison"?

        Yes, running that many workers against a single URL would be impolite in the extreme and could, perhaps, be construed as a DOS attack, but is that your reason? And if so, under the laws of which country?

Re: Can your site handle this?
by Your Mother (Archbishop) on Nov 05, 2011 at 15:19 UTC

    FTR, one should prefer ab for this sort of testing; or perhaps LWP::Parallel/LWP::ParallelUA and friends, or threads/Coro/AE if the responses need to be handled in some deeper way in the testing.

      Do you have an example bit of code I can look at?

        ab is a command line tool. I think this approximates the Perl in the OP–

        ab -c 128 -n 128000 http://127.0.0.1:5000/
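
        If you'd rather stay in Perl, a rough equivalent using plain fork and LWP::UserAgent might look something like the sketch below (only ever point it at a host you own; the worker and per-worker request counts just mirror the ab line above):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;

        my $workers  = 128;                      # matches ab -c 128
        my $requests = 1000;                     # 128 * 1000 = 128000, matches ab -n 128000
        my $url      = 'http://127.0.0.1:5000/';

        my @pids;
        for my $w (1 .. $workers) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {                     # child: issue its share of requests, then exit
                my $ua = LWP::UserAgent->new(timeout => 10);
                my $ok = 0;
                for (1 .. $requests) {
                    $ok++ if $ua->get($url)->is_success;
                }
                print "worker $w: $ok/$requests succeeded\n";
                exit 0;
            }
            push @pids, $pid;                    # parent: remember the child and keep forking
        }
        waitpid($_, 0) for @pids;                # wait for every worker to finish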
Re: Can your site handle this?
by Sewi (Friar) on Nov 05, 2011 at 21:48 UTC

    Don't post DoS source in public.

    People who understand the problem are usually able to write this little script themselves, but now everybody trying to abuse the internet could simply copy your sample and (try to) take down any site.

    I don't think this source would do the job either, because 128 tasks would run on the client, and few client computers are able to handle that. LWP::Parallel::UserAgent might do a better job here (see the sketch below).
    Finally, Apache and most other webservers are able to queue some connection requests and thus handle more incoming connections than they have workers.
    If the starting page isn't too much source, the webserver won't even go into a DoS state, because one request will be processed before the client is able to give enough CPU time to another task to send the next request.
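
    For reference, a minimal LWP::Parallel::UserAgent sketch along the lines of its documented synopsis (untested here, and again only for pointing at your own test server) looks roughly like this:

    use strict;
    use warnings;
    use LWP::Parallel::UserAgent;
    use HTTP::Request;

    # A batch of identical requests against a local test server.
    my @requests = map { HTTP::Request->new(GET => 'http://127.0.0.1:5000/') } 1 .. 64;

    my $pua = LWP::Parallel::UserAgent->new();
    $pua->timeout(10);

    for my $req (@requests) {
        # register() queues a request; it returns an error response if it cannot.
        if (my $err = $pua->register($req)) {
            warn "could not register request: ", $err->message, "\n";
        }
    }

    # wait() runs the registered requests concurrently and returns the result entries.
    my $entries = $pua->wait();
    for my $key (keys %$entries) {
        my $res = $entries->{$key}->response;
        print $res->code, " ", $res->request->uri, "\n";
    }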

      Don't post DoS source in public.

      pfft, even the most lame-ass script kiddie can download far better DoS tools than this from a plethora of locations.

      The script is more about stress testing than DoSing, tbh, because you very quickly run into your own pipe's bandwidth limitations anyway, and even a tiny little $20-a-month server which is coded properly can handle the worst anyone could throw at it with a script like this.

      A quad-core Phenom II can easily handle running 128 workers, and when you remove the bandwidth bottleneck by running this script against localhost, the result is 100% CPU utilisation for quite a short period of time before the test is complete (a matter of seconds rather than minutes).

      The point is seeing how well the server copes with large numbers of concurrent requests, which an inefficient system would struggle with because it would exceed the available memory and start to grind around in virtual memory before crashing.

      That's why, on modern hardware with multi-core, multi-GHz processors, CPU utilisation is less important these days than memory utilisation, and older code which is optimised for slow single-core CPUs by using large amounts of caching and so forth is less efficient on modern hardware than code which relies on using more CPU whilst using memory sparingly.

      The computer industry is going through a change in the way it works and thinks because of the relentless increase in processor power. The game keeps changing and progress marches forwards without stopping. Anyone who thinks otherwise is a fool to be ignored.

Re: Can your site handle this?
by cavac (Parson) on Nov 06, 2011 at 00:16 UTC
    Can your site handle this?

    Yes it can. Quite nicely in fact, running here locally on my laptop.

    My one publicly visible (private) webserver... not so much. But that's not actually a software problem: my webserver, running in VirtualBox on ancient hardware, has just 150MB or so of free RAM without any connections and starts swapping - everything goes down from there. Replacement hardware is on its way, though.

    Not that it matters of course; my wikicables database gets something like one visitor per day.

    All my important services run on real hardware with enough power to go to 1024+ threads without too much trouble; it just takes a second or so longer to display a dynamic page. Although I must admit it gets quite noisy when the whole kit and caboodle is spinning up its fans and grinding its database hard disks - until the automatic network defense kicks in and blacklists the attacker.

    It works without all that fancy Plack and aXML stuff, just plain old HTTP::Server::Simple::CGI::PreFork with a bit of Maplat magic thrown in. Of course, using a good database (PostgreSQL), a sane database layout and memcached also helps a lot ;-)
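
    For anyone who hasn't used it, the non-Plack pattern being described is just the HTTP::Server::Simple::CGI one: subclass it, implement handle_request, and print the status line and body yourself; as I understand it, the ::PreFork variant then swaps in a preforking accept loop (check its docs for the exact run() options). A bare-bones sketch:

    package MyWebServer;
    use strict;
    use warnings;
    use base 'HTTP::Server::Simple::CGI';

    sub handle_request {
        my ($self, $cgi) = @_;           # $cgi is a CGI.pm object for this request
        print "HTTP/1.0 200 OK\r\n";     # we are responsible for the status line
        print $cgi->header('text/plain');
        print "Hello from ", ref($self), "\n";
    }

    # Listen on port 8080 in the foreground.
    MyWebServer->new(8080)->run();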

    Don't use '#ff0000':
    use Acme::AutoColor; my $redcolor = RED();
    All colors subject to change without notice.

      Oh come on, stress testing against localhost is entirely unrealistic. For a start you've removed all network I/O limitations, and secondly, unless you're renting a beefy server you're likely to have far more available RAM on a home system, especially these days.

      You mention swapping, and you're quite right: the moment your server starts grinding in virtual memory, it's game over.

      Any large-scale site these days runs on servers with dozens or hundreds of processors and many gigabytes or even terabytes of RAM.

      When you're running out of RAM it doesn't matter how clever or how efficient your algorithm is; it's still going to grind to a horribly slow halt when the memory is maxed out.

      That's why I argue that memory usage is far more important than processing power in building scalable solutions these days, whereas the same was not true just a few years ago, when most of the solutions we have available (the ones which people religiously worship as being beyond question the only solutions worth using) were written.

      I read somewhere, and I'm sorry I can't remember where to give a citation, that HTTP::Engine uses 15MB of RAM per child process to do what it does, and that probably made perfect sense in 2006... It was designed for machines where memory usage was less important than processor speed, and that's why, when the hardware changes specification, new solutions are needed.

      That in itself is proof of the validity of reinventing wheels when needed, because the road itself has changed.

      Perhaps in the future CPU speed increases will falter whilst memory chip density accelerates, and then the situation shifts again. But in the meantime, unless you have a big-iron server budget, your server is far more likely to run at peak efficiency using software which uses more processor power but consumes only 2MB of RAM per child process rather than 15MB, and leaves the CPU idling for most of its cycles.

      This is also a generic argument for the comeback of dynamic languages in general, since they are always more processor-intensive than their fully pre-compiled counterparts, a fact that simply doesn't matter here in late 2011.

        Oh come on, stress testing against localhost is entirely unrealistic. For a start you've removed all network I/O limitations,

        You are quite right for the most part. First of all, my DSL modem would go up in flames before even a cheap rent-a-server service would notice any relevant increase in traffic. So, for real-life testing, you need a decent internet connection for a start. But then, yes, testing makes a lot more sense.

        But testing against localhost is also a good idea. You might notice race conditions and things like that a lot more easily. At least, I did.

        and secondly, unless you're renting a beefy server you're likely to have far more available RAM on a home system, especially these days.

        When doing production-critical services, I usually put my own servers in a colocation. While it's certainly more work, it usually pays off (for me) in the long run, especially when upgrade time comes around. But since I do (mostly) in-house stuff, your situation is probably completely different from mine...

        Don't use '#ff0000':
        use Acme::AutoColor; my $redcolor = RED();
        All colors subject to change without notice.
Re: Can your site handle this?
by TJPride (Pilgrim) on Nov 23, 2011 at 11:53 UTC
    I also agree that the main limitation is not processing power, but RAM. Unless your site is hideously badly coded, processing requirements are going to be relatively minimal, but RAM on the server side for PHP is going to be 8 MB minimum per concurrent session, perhaps a good deal more if your page is also accessing a MySQL database. 8 * 128 = 1024 MB of RAM, and the cheapest virtual dedicated servers are still limited to 1024 or 2048 MB, with up to several hundred MB often taken up by overhead (depending on how well they're set up). So you could hit the limit. I would guess the same is true of Perl.

    Short version: if you use a virtual server, you probably want the 2 GB and not the 1 GB. (And yes, I do use PHP in addition to Perl; which one I use for a site depends on what the site needs to do.)

      My system uses 2MB of RAM per child process.
Re: Can your site handle this?
by sundialsvc4 (Abbot) on Nov 09, 2011 at 17:07 UTC

    With absolutely no intent to make any judgment whatsoever with regard to Logicus, the original author of this post, I would nonetheless like to point out a couple of “common misconceptions and flaws” that often do crop up in deployed designs ... things which are, in fact, clearly demonstrated in and by this post.   Again, using it only as a conveniently-presented example, let me briefly step up onto my soap box.

    The first:

    The above code spawns 128 workers, which then download pages back to back as fast as they can ... [...] ... system would run into troubles with 8 workers, and pretty much grind to a halt under the load of 16 workers.

    Many designers assume that “the more threads, the merrier.”   Surely 128 threads can do twice as much work in the same amount of time as 64 threads could, and so on.   But the reality is that the actual capacity of any system is, in this case, limited both by space and by time.   Each thread obviously consumes some amount of resources, some of which can be shared among all instances but many of which cannot.   If the most-scarce resource (memory) becomes strained, thrashing occurs, and this results in an exponential loss of performance not-so-affectionately known as “hitting the wall.”   Examination of this particular system might show that, say, a worker-count setting of 12 might be optimal, and so this machine would be configured to launch no more than that number of workers, who would process one request after another (without dying in-between...) for weeks or months continuously.   The worst-case throughput of this system could easily be calculated, and it would remain fairly constant.
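
    A minimal sketch of that idea (a fixed pool of long-lived workers draining a shared queue, rather than one thread per request), using core threads and Thread::Queue; the pool size of 12 is just the hypothetical optimum from the paragraph above, and handle() is a placeholder for the real per-request work:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $pool_size = 12;                  # the hypothetical optimum worker count
    my $queue     = Thread::Queue->new;

    # A fixed number of workers, each processing one request after another.
    my @workers = map {
        threads->create(sub {
            while (defined(my $job = $queue->dequeue)) {
                handle($job);            # placeholder for the real per-request work
            }
        });
    } 1 .. $pool_size;

    # Requests are queued as they arrive; the pool drains them at its own pace.
    $queue->enqueue($_) for 1 .. 1000;

    # Shutdown: one undef per worker ends its dequeue loop.
    $queue->enqueue(undef) for @workers;
    $_->join for @workers;

    sub handle { my ($job) = @_; }       # stub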

    The second:   (Same example.)

    Many web servers, either through “simple CGI” or some other means, fire a new flaming-arrow into the air with each request.   Such servers can easily be forced into a denial-of-service state simply by dumping an excessive number of requests on them.   The denial of service that can be achieved can be quite devastating if it forces the system into unconstrained thrashing such that console operators find it difficult or impossible to issue commands to stop the flood.   (Moral:   “if you fire flaming arrows into the air, you’re just going to burn down the entire town, and this without even one single tasty marshmallow being successfully toasted.”)

    The actual expected performance curve of any system, for any application mix, can be plotted on a graph.   The curve of that graph will always be that of an “elbow,” with a gently sloping linear curve followed quite abruptly by an exponentially-ascending “the wall,™” which in this case emphatically is not an inspired rock-n-roll album.   Like any good self-powered machine, a request-processing system must be designed with queues and governors.   It must have the means to regulate its own performance and to adjust its own control parameters in real-time in response to changing conditions.   It must of course endeavor to process every request in a predictable manner, but in truly extenuating circumstances it must be capable of selective load-shedding, as well.

    It is a very common objective of a web site that the web site can be used to make work-requests.   But too-many web sites take the approach that they are not only “the user interface,” but “the worker” as well.   To see the folly in this approach, one merely has to look at a real-world example of a request processing system that works well:   a fast-food restaurant (as originally designed by Ray Kroc of McDonald’s fame).   The person who says, “may I take your order, please?” does nothing else.   “The fry guy,” “the burger-meister,” and the fellow who mops the floors don’t take orders from anyone (other than “the big cheese”).   There is a strict separation of roles and responsibilities, a well-designed work flow, and a workflow management system (now computerized, but at one time based on paper tickets and that little rotating carousel onto which the tickets could be stuck).   While I am not advocating that you should ever actually eat at one of those establishments, you can get a lot of good examples from them.

    Last but not least:   “remember CPAN.”   There are plenty of workload-management engines out there, in various stages of sophistication and completion.   As is always the case with Perl, you don’t have to start from scratch on your new project.

Re: Can your site handle this?
by sundialsvc4 (Abbot) on Nov 07, 2011 at 21:41 UTC

    Well, interestingly enough, the answer should always be “yes.”

    You can never predict from moment to moment what workload might be presented to your servers ... whether that load is coming from 128 processes who are beating themselves senseless trying to persuade their host-server to thrash itself to death, or by 128 excited would-be customers who are trying to give you their credit-card numbers.   (Example:   ticketmaster.com during the first ten minutes of sales for the next Rush concert.)

    As with every possible workload that a computer system could be presented with for any reason at all, the receiving system must be the one to parcel out its own capacity, such that it never permits itself to be overloaded no matter what the instantaneous request-pattern may be.   (“The rest of you will have to wait in line outside in order to get your turn to enter the building ... and if some of you drive away, c'est la guerre.”)

    To say that “Plack is the answer,” though, is really not a complete statement.   Plack is only part of a potential answer.   A true high-load system must not only be able to regulate the number of users (IP addresses) who are permitted to “visit a web page,” but it must also separately control how many units-of-work can be generated for subsequent processing by the back-end systems.   It must maintain effective load-control and load-balancing, not only for the “user interface” (the web-pages), but also for the actual pay-load (e.g. ticket selection, sales, and disbursement).   There are multiple overlapping workload-servicing queues in play here, and the entire design of the system must be thoroughly considered.

    It is interesting also to note that the underlying technology of Plack can be useful in many ways when designing a system like that:   it’s a pretty darned good “glue,” not only for connecting an HTTP server to a page-delivery engine, but also (separately, of course ...) for plumbing a page-delivery engine to a back-end server system.   It is a well-thought out and well-tested delivery manager, provided that the payload you are exchanging can be caused to conform to the data-format which Plack uses.   (Easily done.)
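
    For the curious, the “data-format which Plack uses” is just the PSGI convention: an application is a code reference that receives an environment hash and returns a three-element array of status, headers, and body, which is exactly what makes it such convenient glue. A minimal sketch (assuming Plack is installed; plackup serves it on port 5000 by default):

    # app.psgi -- run with: plackup app.psgi
    my $app = sub {
        my $env = shift;                              # the PSGI environment hash
        return [
            200,                                      # status
            [ 'Content-Type' => 'text/plain' ],       # headers
            [ "Hello from PSGI, you asked for $env->{PATH_INFO}\n" ],   # body
        ];
    };

    $app;                                             # the .psgi file returns the app coderef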

    A reply falls below the community's threshold of quality.
