PerlMonks  

Are you coding websites for massive scaleability?

by jdrago_999 (Hermit)
on Oct 29, 2008 at 19:12 UTC ( [id://720306]=perlquestion )

jdrago_999 has asked for the wisdom of the Perl Monks concerning the following question:

Monks -

Are you coding your websites for massive scalability?

Things like load-balancing, reverse-proxies, database partitioning/clustering/replication/etc, SAN, etc?

If so, what web development framework are you using? Have you *actually* scaled your project out across more than one server?

Have you *actually* written unit tests for the web portion of your application? Did you use WWW::Mechanize or something else?

The reason I ask is that aside from the daily barrage of 101-level questions, I don't see much in the way of these real-world problems coming up here. Flame away if you think that's flamebait.

UPDATE:
Added the FCGI::* options.


Replies are listed 'Best First'.
Re: Are you coding websites for massive scaleability?
by perrin (Chancellor) on Oct 29, 2008 at 19:54 UTC

    I've written large web apps that scaled across clusters of machines using mod_perl handlers, CGI::Application, and HTML::Mason. I've written many unit tests for the web UI with Mechanize. My co-workers have written some with Selenium to test JS code. I've given some talks on these subjects at OSCON over the years. What are you trying to figure out?
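    A minimal sketch of such a Mechanize-driven UI test, using Test::More (the URL, form fields, and expected text here are invented for illustration):

```perl
use strict;
use warnings;
use Test::More tests => 3;
use WWW::Mechanize;

# Hypothetical app under test -- point this at a test instance.
my $mech = WWW::Mechanize->new( autocheck => 0 );

$mech->get('http://localhost:3000/login');
ok( $mech->success, 'login page fetched' );

# The form number and field names are assumptions about the page.
$mech->submit_form(
    form_number => 1,
    fields      => { username => 'testuser', password => 'secret' },
);
ok( $mech->success, 'login form submitted' );
like( $mech->content, qr/Welcome/, 'welcome text appears after login' );
```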

    One thing I can tell you is that in terms of scaling, the web framework you choose is probably your least important decision. What actually matters is I/O, so it's all about databases and caching tools.

Re: Are you coding websites for massive scaleability?
by Old_Gray_Bear (Bishop) on Oct 29, 2008 at 20:01 UTC
    Short answer -- Yes. It doesn't matter which question you asked, the answer is yes.

    I work for a 'small-ish' Web 2.0 company. My particular application is implemented across a cloud of around 1500 servers (this quarter; more next year, I don't know how many yet -- budgeting is going on now). The Company has other properties (projects) that range in size from two servers to over 10,000, scattered across six continents. (We have clients at McMurdo, just no servers there. Although, that's a thought -- a co-location facility in Antarctica wouldn't have a cooling problem, right? Just pull in the outside air....)

    Yes, we do believe in Internet Scale. Just about every web technology from the past fifteen years has been used here, and quite possibly is still in use somewhere in the Production Cloud.

    I can't speak for the rest of the Company, but we have been developing a test-bed for our application that uses a combination of WWW::Mechanize, Selenium, LoadRunner (for the few MS-Windows servers), and a lot of home-grown code to create a testing/regression-test framework for the Application. This project has been going on for the last three years, and we have around 40% coverage. (That's up from <1% three years ago.)

    I suspect the reason you see a dearth of 'real-world problems' on the PM is that by the time a company gets large enough to have to worry about the scaling issue, one of two things happens: either they bring talent on board with the experience to sort out the problem, or they get acquired by the Googles, Yahoos, Ciscos, or Microsofts of the world. Those folks had to solve the Scaling Problem a long time ago (in Internet years), and continue to solve it on a daily basis.

    ----
    I Go Back to Sleep, Now.

    OGB

Re: Are you coding websites for massive scaleability?
by dragonchild (Archbishop) on Oct 30, 2008 at 01:05 UTC
    perrin, as is usual with webapp questions, is right on the money. The web framework you use, if used properly, has almost zero impact on your ability to scale with the number of requests. The limiting factor, in almost all cases, is getting to the right data for the request.

    What the web framework does affect is the ease of development and testing. The right framework can mean up to a 10x difference in development and maintenance speed. It can also make testing much easier: easy testing is something that Catalyst and RoR have made a big priority, unlike HTML::Mason or CGI::Application.

    In terms of testing, the big thing is to make sure that you are at least testing at the request level. So, things like Test::HTML::Content are good things.
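    A request-level check of that kind might look like the following sketch, assuming Test::HTML::Content's tag_ok/link_ok interface (the HTML here is a canned stand-in for a real response body):

```perl
use strict;
use warnings;
use Test::More tests => 2;
use Test::HTML::Content;

# In a real test this HTML would come from your framework's
# test client or from WWW::Mechanize.
my $html = <<'HTML';
<html><body>
<h1>Order complete</h1>
<a href="/orders/42">View your order</a>
</body></html>
HTML

tag_ok(  $html, "h1", {}, "page has a heading" );
link_ok( $html, "/orders/42", "link to the new order is present" );
```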


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      That's a nice tool - I haven't seen it before.
Re: Are you coding websites for massive scaleability?
by mr_mischief (Monsignor) on Oct 29, 2008 at 20:23 UTC
    I have not done much of this recently, because I now run a company that caters to companies and organizations that need small but interactive websites. However, when I was working on sites with over 100k daily users (as part of a team), we used only some of those tools. I'm sure others here have used more of them. I'll assume you're looking at application-level or middleware-level redundancy and not running a large single-OS-image cluster, which is an entirely different kind of beast.

    Load-balancing is one key to scaling, and reverse proxies are another. If you have user accounts, you can scale those accounts out across disparate back-end systems as one form of load balancing. You need to make sure the front-end knows which back-end to use then, and you may need to make sure the user stays on the same front-end proxy for the whole session. Reverse NAT and round-robin are simple forms of load balancing that can be done easily. LRU and least-loaded take a bit more work. Reverse proxies are nice for making sure you're only generating dynamically what you need, and I've used that.
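    Round-robin, the simplest of the balancing schemes above, can be sketched in a few lines (the backend host names are invented):

```perl
use strict;
use warnings;

my @backends = qw( app1.internal app2.internal app3.internal );
my $next     = 0;

# Hand out the next backend in turn, wrapping around the list.
sub pick_backend {
    my $host = $backends[$next];
    $next = ( $next + 1 ) % @backends;
    return $host;
}

print pick_backend(), "\n" for 1 .. 4;   # app1, app2, app3, app1 again
```

    LRU or least-loaded selection replaces the simple counter with bookkeeping about each backend's recent load, which is where the extra work comes in.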

    Database replication is one other non-Perl thing to consider. So are the other services you might need to scale out along with your web site, like SMTP, POP/IMAP, LDAP, or whatever else you use besides HTTP that is important to your web offering or to services offered with it.

    As for the Perl side of things, you didn't mention FCGI, CGI::Application::FastCGI, Catalyst::Engine::FastCGI, or any of the other FastCGI modules on CPAN. FastCGI gives you a long-running process with no per-hit startup cost that is separate from the web server software. Apache and IIS can both talk to FastCGI backends, and the backend can even be on a separate physical server from the web server software.
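    The long-running-process model looks like this with the FCGI module's standard accept loop (a minimal single-worker sketch):

```perl
use strict;
use warnings;
use FCGI;

# Expensive setup (config, DB connections) would happen here,
# once per worker rather than once per hit.
my $request = FCGI::Request();
my $hits    = 0;    # state survives across requests

while ( $request->Accept() >= 0 ) {
    $hits++;
    print "Content-Type: text/plain\r\n\r\n";
    print "Request number $hits handled by PID $$\n";
}
```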

    For more of the Perl side of things, which template system and which DB interface layer (plain DBI, DBIx::Class, Rose::DB::Object, Apache::DBI, or many others) are good questions. Setting up a thin object layer of your own over your DB entries works for some sites, but if you have lots of tables with lots of schema changes, you probably want something more maintenance-friendly. Development practices must scale, too, just like request handling.

    I do indeed write unit tests for most of my web applications, but I'm sorry to say I usually don't hit full coverage. I usually have my code able to be run locally as well as over the web, so most unit tests are relatively easy. LWP::Simple is often enough to unit test things remotely, and sometimes even just a script wrapping lynx will do. More complex things do call for a more complex testing method such as WWW::Mechanize.

    Most of my truly large and scalable web-based applications have actually been done using mod_php, MySQL, OpenLDAP, reverse proxies, and DNS round-robin for load balancing. I have used plain CGI with both PHP and Perl with moderate amounts of horizontal scaling (10 or 20 separate backend servers), both with no reverse proxy and with a group of intelligent reverse proxies that could connect to the proper back-end server. Using mod_perl or FastCGI will let you do more with each backend machine, but you can scale with plain CGI in a pinch if you have the hardware budget and the rack space.

Re: Are you coding websites for massive scaleability?
by JavaFan (Canon) on Oct 29, 2008 at 20:17 UTC
    Are you coding your websites for massive scalability?
    I do, and I've done so. Never alone though, always as part of a team.
    Things like load-balancing, reverse-proxies, database partitioning/clustering/replication/etc, SAN, etc?
    Yes. And except for reverse-proxies, I've done all the things you mention for non-Web related services as well. Wait: if you consider MTAs relaying mail to another internal server as a reverse proxy (which it is, although not for HTTP traffic), I've done reverse proxies as well.
    If so, what web development framework are you using?
    mod_perl, HTML::Mason, Tomcat, Websphere, Silversomething and some HP technology whose name I cannot remember.
    Have you *actually* scaled your project out across more than one server?
    Yes. Scaling web services over more than one physical server is trivial; far more interesting is scaling your database server over more than one.
    Have you *actually* written unit tests for the web portion of your application?
    Yes.
    Did you use WWW::Mechanize or something else?
    Yes, and yes. (X-runner, wget, nagios, lynx -src, LWP, C, IO::Socket::INET, ...)
    The reason I ask is that aside from the daily barrage of 101-level questions, I don't see much in the way of these real-world problems coming up here.
    I'd say most of the 'real-world' problems aren't Perl problems. I certainly wouldn't post non-Perl problems here.
Re: Are you coding websites for massive scaleability?
by karavelov (Monk) on Oct 30, 2008 at 02:19 UTC

    My little bit of experience with Perl webapps, infrastructure and scalability.

    I am using a lightweight front web server (nginx, but lighttpd is in the same category) that serves the static content and forwards dynamic requests to the backend. Here you have two options for the backend: either proxy to another HTTP server (perhaps Apache with mod_perl) or use FastCGI. The FastCGI processes can live on the same server in the beginning, and later they can be migrated to separate servers.
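    That front/back split is roughly the following nginx fragment (the paths, port, and names are illustrative, not from this thread):

```nginx
server {
    listen 80;
    root /var/www/static;

    # Static files are served directly by nginx.
    location / {
        try_files $uri @dynamic;
    }

    # Anything else goes to the FastCGI backend, which can live on
    # this machine today and on a separate server later.
    location @dynamic {
        include fastcgi_params;
        fastcgi_pass 127.0.0.1:9000;   # or unix:/var/run/app.sock
    }
}
```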

    My experience (comparing FastCGI and mod_perl+apache) is that you can get better performance with FastCGI: actually very close in terms of requests per second, but with less deviation and fewer failed requests.

    Another advantage of FastCGI is that it is easier to administer: you can move portions of the site from server to server, relaunch just some parts of the site, and give different privileges to different parts of the site (think of uploads, writable directories, number of processes, memory limits, etc.).

    Now for the Perl side. I am using CGI::Application as the base framework; it is very lightweight and easy to customize. You can have problems with certain C::App plugins that do not play well with FastCGI: one of my problems was that some of them are designed to create an object per request. So I have replaced them with some custom code (which should land on CPAN soon). Now I can create the webapp object (with its heavy initialization) only once and just pass each request to it. This gives a major performance boost (I was unable to do the same with mod_perl+C::App).
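    A rough sketch of that reuse pattern with CGI::Fast (My::WebApp is a hypothetical CGI::Application subclass, and treating query() as a setter is an assumption about the app's interface; stock C::App normally builds a new object per request):

```perl
use strict;
use warnings;
use CGI::Fast;
use My::WebApp;   # hypothetical CGI::Application subclass

# Heavy initialization (templates, config, DB handles) happens once,
# before the accept loop.
my $app = My::WebApp->new();

while ( my $q = CGI::Fast->new ) {
    # The usual pattern is My::WebApp->new( QUERY => $q )->run, paying
    # the construction cost on every hit; here the long-lived object
    # just receives a fresh query.  Safe only if the app keeps no
    # per-request state in itself.
    $app->query($q);
    $app->run();
}
```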

    Now for the DB side. I use plain DBI (with some helper functions). The reason is that most of the SQL queries I write could not be expressed easily, or at all, through some abstraction layer. Think of this: disable merge joins in this session, run the query, revert to the default planner behavior. Why choose this route? Because database engines are coded to manipulate data quite fast; you cannot match them in Perl, so tell the DB to give you only the data you need, ordered the way you need it, in the fewest possible queries.
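    With plain DBI against PostgreSQL, that session-level dance looks roughly like this (the connection details, table, and columns are made up; enable_mergejoin is a real PostgreSQL planner setting):

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Pg:dbname=app', 'appuser', 'secret',
                        { RaiseError => 1 } );

# Disable merge joins for this session only ...
$dbh->do('SET enable_mergejoin = off');

# ... run the query the planner otherwise mishandles ...
my $rows = $dbh->selectall_arrayref(
    'SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id',
    undef, 42,
);

# ... then restore the default planner behavior.
$dbh->do('RESET enable_mergejoin');
```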

    Another paragraph about database scaling. One way is to use some kind of full replication and a common pool for the connections. Another option is database partitioning. If you can partition your data into relatively autonomous domains, you can spread it across a number of different DB servers (replicating only some small portion of the data); this gives you more scalability than plain replication. (Table partitioning is another story.)
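    At its simplest, the domain-partitioning idea reduces to a shard map (the DSNs here are invented):

```perl
use strict;
use warnings;

# Each relatively autonomous domain of data -- here users, keyed by
# id -- lives on one of several DB servers; only small shared tables
# would be replicated to all of them.
my @shards = ( 'dbi:Pg:host=db0', 'dbi:Pg:host=db1', 'dbi:Pg:host=db2' );

sub shard_for_user {
    my ($user_id) = @_;
    return $shards[ $user_id % @shards ];
}

print shard_for_user(7), "\n";   # prints dbi:Pg:host=db1
```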

    And finally. On some of your questions:

    • Yes, I have participated in the development of sites and systems that run on more than one server. Example: IPTV for Bulgaria that peaks at 10K+ concurrent streams, 100K+ unique visitors per day.
    • No, I have not written unit tests for it. For some other projects I have, but not for the web portion.
    • My personal opinion is that it is better to plan early for scalability, because later it is hard or impossible to add. I have seen projects die because of this.

    Best regards

Re: Are you coding websites for massive scaleability?
by kidd (Curate) on Oct 30, 2008 at 04:19 UTC
    I think that you should always try to plan your apps for scalability.

    Some time ago I found myself rushing to rewrite a badly designed application of mine because I never thought it would need to scale. It was a very bad experience; since then I always try to make my applications as scalable as possible.

    I like to use CGI::Application as my framework along with HTML::Template. I find that it makes it very easy to organize the code, and adding things is no problem.

Re: Are you coding websites for massive scaleability?
by leocharre (Priest) on Oct 30, 2008 at 15:13 UTC
    I've done a little bit of this.
    I have a backend application that's standalone, which does OCR on submitted documents. It figures out which machine to delegate to.
    The front end, however, is just a simple front end. It doesn't deal with any of the details. It just takes input, checks it for common sense, and passes it along. CGI::Application.

    I used hacks to test: I run regexes against the return of CGI::Application, the HTML of a 'pageview'.

    Ultimately I have to check it by hand to really know. It's not easy to test HTML output.

Re: Are you coding websites for massive scaleability?
by wol (Hermit) on Oct 30, 2008 at 16:57 UTC
    Are you coding your websites for massive scalability?
    No, because my websites don't need to be massively scalable, and I do need to do other things.

    The web tools I'm responsible for are internal to my company, so there's a maximum of about 1000 users. On a typical day, there might be a few hundred transactions. It might even get into the thousands.

    But the more significant thing is that I have to interface with a bunch of other systems (SCCM, Defect Tracking, Requirements Definition, ad hoc Wikis) which are themselves not massively scalable (or trying to be).

    If we needed to open up our operation to the world at large, all of those systems would need to be re-considered, at which point my stuff would need to be re-done from the ground up anyway.

    Of course, if that ever happens, I'll be rummaging around here to see what my options are - there's plenty of monkish expertise with this kind of operation.

    --
    .sig : File not found.

Re: Are you coding websites for massive scaleability?
by SilasTheMonk (Chaplain) on Nov 01, 2008 at 19:22 UTC
Re: Are you coding websites for massive scaleability?
by CountZero (Bishop) on Oct 29, 2008 at 22:14 UTC
    And what is the point you try to make?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      And what is the point you try to make?
      No point at all.

      No accusatory tone or anything. Just wondering which of the many ways to do it people are actually using, on larger projects that get some actual money thrown at them.

        Just to throw it out there: I'm getting a lot of money thrown at me to write terrible, un-scalable, insecure, accidentally functional, untested and largely untestable code and though it's the most money, it's not the first time. (Dudes, don't start. I tried and tried and got blocked and blocked and the contract is almost up so...)

        Businesses are only as "smart" as they need to be to bring in enough money to function and keep the honchos in villas. It's a C- world. Thank the Lords of Chaos -- or is it Law? -- for a few A+ niches like PM.

        As to the OP: one of the reasons Perl is, so I say, the best general programming language out there is the frickin' ridiculous speed you can prototype almost anything and redo it just as fast if needed. It can be a waste of time and effort to try to plan for a big app before the app is big. Fortunately there are sweet spots now like Catalyst that make it a moot point. But the idea is, don't put more value into a thing than it's worth. You wouldn't fix up a used VW with a jet engine when all you need is to drive to work. I still write the odd standalone CGI for things that will never see more than a few hundred loads a day. And if they suddenly do, it's a 1-3 hour project to rip the guts out and put them into Catalyst or CGI::Application or FCGI or whatever. I'd rather use those 3 hours up front to read or hike or ...

Re: Are you coding websites for massive scaleability?
by Anonymous Monk on Oct 31, 2008 at 21:01 UTC
    ... speaking of tests, don't forget the Test::WWW (or similar) modules. (I would research more, but I'm on a "high quality" internet connection...) For stress testing, though, consider more commercial options... Good luck && Happy Halloween!
Re: Are you coding websites for massive scaleability?
by Anonymous Monk on Oct 30, 2008 at 02:47 UTC
    massive scalability
    Is that like a massive black hole? :D

Node Type: perlquestion [id://720306]
Approved by ikegami
Front-paged by grinder