
Re: How fast is fast?

by schwern (Scribe)
on Aug 06, 2011 at 23:37 UTC ( #919006=note )

in reply to How fast is fast?

Let us consider caching. The front page of Google is 300k... but 270k of that is cached: 30k a hit. A search result is about 130k, 80k of which is cached. In addition, on a site that large many of your requests will be AJAX calls returning small bits of JSON and XML. We're talking an order of magnitude below the 200k estimate.
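To put the arithmetic in one place (these are the rough estimates above, not measurements):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Back-of-the-envelope payload arithmetic using the illustrative
# figures above: what actually crosses the wire once caching kicks in.
my %page = (
    'front page'    => { total_kb => 300, cached_kb => 270 },
    'search result' => { total_kb => 130, cached_kb => 80  },
);

for my $name (sort keys %page) {
    my $p       = $page{$name};
    my $wire_kb = $p->{total_kb} - $p->{cached_kb};
    printf "%-13s %3dk total, %3dk cached => %2dk over the wire\n",
        $name, $p->{total_kb}, $p->{cached_kb}, $wire_kb;
}
```

30k and 50k per hit respectively, a long way from naive page-weight math.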

Now consider the Twitter problem. You need to efficiently pipe 140 characters to just the followers of each user in real time and you have millions of users constantly sending messages. The problem seems simple, and the payload is small, but it is an extremely expensive calculation. Social networks at the scale you're discussing are CPU intensive.
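A minimal sketch of why delivery is the expensive part (a hypothetical in-memory fan-out-on-write model, not Twitter's actual architecture): every post costs one timeline append per follower, so the work scales with the follower graph, not with the 140-byte payload.

```perl
use strict;
use warnings;

# Hypothetical fan-out-on-write: each post is O(followers),
# regardless of how tiny the message itself is.
my %followers = (
    alice => [qw(bob carol dave)],
    bob   => [qw(alice)],
);
my %timeline;    # user => arrayref of delivered messages

sub post {
    my ($author, $text) = @_;
    # One append per follower -- this loop is where the CPU goes
    # when a user has millions of followers.
    push @{ $timeline{$_} }, "$author: $text"
        for @{ $followers{$author} || [] };
}

post(alice => 'hello world');    # 3 appends for 11 bytes of payload
post(bob   => 'hi');             # 1 append

print scalar @{ $timeline{bob} }, " message(s) on bob's timeline\n";
```

Multiply that loop by millions of concurrent posters and the "small payload" intuition falls apart.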

This brings us to the problem with your calculations: they're wrong. They're wrong because they are premature optimization. Or in this case, premature sub-optimization. The evil of premature optimization is not so much the optimization, it's thinking you can predict performance. It's thinking you can predict the future. It's thinking you know everything you need to know about how a system will be used and react before it actually happens. In any sufficiently complex system the best performance prediction is this: your prediction is wrong. You simply do not have the data. I don't either. Most of us do not.

A site with a million hits a day doesn't just appear out of thin air. Nobody should sit down and try to design a site that big unless they already have a slightly smaller one doing the same thing. That's the only way to get real experience and data to plug into the equations to find the real bottlenecks. Nobody should start by buying the sort of hardware you're talking about. You should start by knowing you will be wrong, plan accordingly, gather metrics and optimize on that. Be modest in your performance expectations until you have something to profile.
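In Perl terms, "profile, then optimize" can start as simply as measuring two candidate implementations instead of guessing between them (a toy comparison using the core Benchmark module; the specific subs are just placeholders):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Measure instead of predicting: two ways of building the same
# string, timed head to head. The habit matters, not these subs.
my @words = ('x') x 1000;

cmpthese(-1, {
    join_once => sub { my $s = join '', @words },
    concat    => sub { my $s = ''; $s .= $_ for @words },
});
```

Whatever the winner turns out to be on your hardware and workload, that is data — unlike a spreadsheet of guessed constants.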

You can think of it like the Drake Equation. As a thought experiment and discussion piece about the probability of alien life, it's fantastically focusing. As a practical predictor it's meaningless. Most of the numbers we plug in are sheer speculation. There are so many variables with such high variability being multiplied that the end result swings by orders of magnitude depending on which valid-seeming estimates you plug in. Errors multiply. You can get anything from millions down to just 1, and you'll tweak the results to your liking (nothing personal, just human nature). It's seductive to think that meaningless number is proof to take action.
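A toy illustration of how multiplied uncertainties blow up (the factors and ranges here are invented purely for the demonstration): seven factors, each estimated "reasonably" to within a factor of four, give a final answer uncertain by more than four orders of magnitude.

```perl
use strict;
use warnings;

# Seven multiplied factors, each known only to within 4x
# (anywhere from 0.5 to 2). Errors multiply.
my @ranges = ([0.5, 2]) x 7;

my ($low, $high) = (1, 1);
for my $r (@ranges) {
    $low  *= $r->[0];
    $high *= $r->[1];
}
printf "low: %g  high: %g  spread: %gx\n", $low, $high, $high / $low;
# 0.5**7 vs 2**7 -- a 16384x spread from "reasonable" inputs
```

Any point in that spread looks like a defensible prediction, which is exactly the trap.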

Replies are listed 'Best First'.
Re^2: How fast is fast?
by Anonymous Monk on Aug 07, 2011 at 08:04 UTC

      Thanks for the heads up, I will not feed the energy beast.

      I learned some things researching the answer. Judging from his replies, he has learned nothing.

Re^2: How fast is fast?
by Logicus on Aug 07, 2011 at 01:14 UTC
    Nobody should start by buying the sort of hardware you're talking about.

    In a few years' time it will be available at the budget end of hosting, and not everyone thinks $279/mo is a big budget for much of anything, really.

    I mean, have you seen this:

    Get ready for the number of available cores doubling every 18 months and the big players finding a way to keep it scaleable.

      Relying upon hardware performance is never a substitute for using the best algorithm. To do so is folly; but when you know nothing of algorithms, it makes some kind of sense to appeal to hardware.

        We used to draw polygons line by line using fixed point maths to avoid the overhead of using floating point maths, prior to the advent of the maths co-processor or FPU.

        These days we direct powerful GPU hardware to draw them for us. There is no way that the algorithms used today could possibly run properly on machines with no GPU, regardless of how well you optimise them. They are far too complex and involve far too much data manipulation.

        The same hardware which accelerates graphics rendering is also usable for regex's. And suddenly, with that additional piece of hardware in use, regex's are no longer the slow/stupid way of doing things but in fact the smart way, because you have something like 512 processors and several gigs of high-speed RAM on your side, running those regex's at lightning speed behind the scenes.

        The more of that work you can shift away from the CPU, the better!

        P.S. Is it regexes, regex's or regexen? I dunno...

        P.P.S. I really hope Perl6 will be the first language to fully exploit that fact!

