

in reply to How fast is fast?

Let us consider caching. The front page of Google is about 300k, but 270k of that is cached, which leaves 30k a hit. A search result is about 130k, 80k of which is cached, so roughly 50k actually goes over the wire. In addition, on a site that large many of your requests will be AJAX calls returning small bits of JSON and XML. We're talking an order of magnitude below the 200k estimate.
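To make that arithmetic concrete, here is a rough sketch in Python. The page sizes are the ones quoted above; the traffic mix and the 2k AJAX response size are invented purely for illustration, and the names (PAGES, uncached_kb, average_transfer_kb) are hypothetical.

```python
# Page sizes in kilobytes, taken from the figures quoted in the post.
PAGES = {
    "front_page": {"total_kb": 300, "cached_kb": 270},
    "search_result": {"total_kb": 130, "cached_kb": 80},
}


def uncached_kb(page):
    """Kilobytes that still have to be sent once the cache is warm."""
    return page["total_kb"] - page["cached_kb"]


def average_transfer_kb(mix):
    """Weighted average of kilobytes sent per request.

    mix maps a request kind to (weight, kb_per_request).
    """
    total_weight = sum(weight for weight, _ in mix.values())
    return sum(weight * kb for weight, kb in mix.values()) / total_weight


# ASSUMPTION: a made-up traffic mix. Out of every 100 requests, 10 are
# front page loads, 20 are searches and 70 are small AJAX calls (~2k).
mix = {
    "front_page": (10, uncached_kb(PAGES["front_page"])),
    "search_result": (20, uncached_kb(PAGES["search_result"])),
    "ajax": (70, 2),
}

average = average_transfer_kb(mix)
print(f"average transfer per request: {average:.1f}k")
```

With this invented mix the average comes out around 14k per request. Change the mix and the answer moves a lot, which is rather the point.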

Now consider the Twitter problem. You need to efficiently pipe 140 characters to just the followers of each user in real time, and you have millions of users constantly sending messages. The problem seems simple, and the payload is small, but every message has to be delivered once per follower, which makes it an extremely expensive calculation. Social networks at the scale you're discussing are CPU intensive.
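A toy sketch of why the small payload is misleading. This is a naive "fan-out on write" approach in Python, not a description of how Twitter actually works; the Feed class and all of its methods are hypothetical names made up for this example.

```python
from collections import defaultdict, deque


class Feed:
    """Naive fan-out on write: copy each message to every follower."""

    def __init__(self, timeline_limit=100):
        self.followers = defaultdict(set)  # author -> set of follower names
        # Each timeline keeps only the most recent timeline_limit messages.
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_limit))
        self.deliveries = 0  # total number of per-follower writes so far

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        if len(text) > 140:
            raise ValueError("message longer than 140 characters")
        # The cost of one post is proportional to the follower count,
        # not to the size of the message.
        for follower in self.followers[author]:
            self.timelines[follower].append((author, text))
            self.deliveries += 1


feed = Feed()
for i in range(1000):
    feed.follow(f"user{i}", "popular")

feed.post("popular", "hello")
print(feed.deliveries)  # one short message, 1000 timeline writes
```

One post from an account with 1000 followers costs 1000 writes here. Multiply that by millions of users posting constantly and the work has very little to do with the 140 characters.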

This brings us to the problem with your calculations: they're wrong. They're wrong because they are premature optimization. Or in this case, premature sub-optimization. The evil of premature optimization is not so much the optimization, it's thinking you can predict performance. It's thinking you can predict the future. It's thinking you know everything you need to know about how a system will be used and how it will react before it actually happens. In any sufficiently complex system the best performance prediction is this: your prediction is wrong. You simply do not have the data. I don't either. Most of us do not.

A site with a million hits a day doesn't just appear out of thin air. Nobody should sit down and try to design a site that big unless they already have a slightly smaller one doing the same thing. That's the only way to get real experience and data to plug into the equations to find the real bottlenecks. Nobody should start by buying the sort of hardware you're talking about. You should start by knowing you will be wrong, plan accordingly, gather metrics, and optimize based on them. Be modest in your performance expectations until you have something to profile.

You can think of it like the Drake Equation. As a thought experiment and discussion piece about the probability of alien life, it's fantastically focusing. As a practical predictor it's meaningless. Most of the numbers we plug in are sheer speculation. There are so many variables with such high variability being multiplied that the end result swings by orders of magnitude depending on which valid-seeming estimates you plug in. Errors multiply. You can get anything from millions to just 1, and you'll tweak the results to your liking (nothing personal, just human nature). It's seductive to think that meaningless number is proof to take action.
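Here is a small Python sketch of how the errors multiply. Every number in it is an invented, plausible-looking guess for a capacity estimate, not data from any real site, and estimate_range is a hypothetical helper. It needs Python 3.8 or later for math.prod.

```python
import math


def estimate_range(factors):
    """Multiply the low ends and the high ends of each (low, high) guess."""
    low = math.prod(lo for lo, _ in factors)
    high = math.prod(hi for _, hi in factors)
    return low, high


# ASSUMPTION: three guesses, each of which looks reasonable on its own.
factors = [
    (500_000, 2_000_000),  # hits per day
    (20, 200),             # kilobytes per hit after caching
    (2, 10),               # ratio of peak load to average load
]

low, high = estimate_range(factors)
spread = high / low
orders = math.log10(spread)

print(f"low estimate:  {low:,}")
print(f"high estimate: {high:,}")
print(f"spread: {spread:.0f}x, about {orders:.1f} orders of magnitude")
```

Three guesses that are each off by no more than a factor of ten already give a 200x spread between the low and high answers. Add more factors and it only gets worse.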