PerlMonks  

Re: Re: Total speculation?

by BrowserUk (Patriarch)
on Oct 02, 2003 at 10:54 UTC ( [id://295884] )


in reply to Re: Total speculation?
in thread Total speculation?

Personally, I don't think that there is that much wrong with PM.

There are some things I would like to see fixed, and there's no doubt that any codebase that has been subjected to 'live maintenance' over an extended period is due for some remedial refactoring. There are undoubtedly some good ideas (tye has mentioned a few in the past) that could be implemented to improve the performance etc.

I don't think I would throw the baby out with the bathwater. Too many times, essential fixes to the original codebase get overlooked in "rewrites from scratch" and then have to be re-invented when the new system goes operational. You rapidly end up with a new, but heavily patched, system that exhibits all the same flaws as the original.

It comes back to the thoughts I expressed a while ago of always starting from a working base. Essential to the philosophy is the ability to have ideas and try them out without compromising the working system. A test system is pretty much essential.

It briefly crossed my mind that maybe something like thttpd -- a lightweight, throttleable server -- might be used to provide this, but I seriously doubt that it has a mod_perl environment available :(


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

Replies are listed 'Best First'.
Re: Re: Re: Total speculation?
by hardburn (Abbot) on Oct 02, 2003 at 18:47 UTC

    Perhaps an Apache virtual host would do it. I think a bare-bones throttling system could be set up with a filter. It could certainly be done with an Apache 2.0 PreConnectionFilter, though I doubt anybody is willing to upgrade PM to Apache 2 just yet.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

      FWIW, I did discover that there is a version of THTTPD available with a built-in perl5 interpreter. Whether that would be enough to run mod_perl I have no idea, but if it is, it might be worth considering. At 200k, it's a single-threaded, single-process, select model server with every conceivable throttling option built-in, and more than up to the task of acting as a test server.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      If I understand your problem, I can solve it! Of course, the same can be said for you.

        mod_perl is geared heavily toward Apache, exposing its API to Perl. A server without that API cannot run mod_perl, by definition.

        Makeshifts last the longest.

        I'll start googling, but perhaps you could post the link? This is interesting in its own right.
Re: Re: Re: Total speculation?
by perrin (Chancellor) on Oct 03, 2003 at 21:49 UTC
    Personally, I don't think that there is that much wrong with PM.

    You're talking about features. Feature-wise, there is not much wrong except the performance. However, contributing code to PM is much harder than for most open source projects, and the site suffers because of it.

    If PM had all of its code in CVS, and a clear installation procedure, and a test suite (with fake data) to tell you if you broke something, there would be a lot more people helping out with the code. I'm not sure there's any way to get there without a complete rewrite, as Liz suggested, and I share her doubts that anyone will manage to find time to do that. A migration to a later version of Everything might be easier, but I'm not certain of that.

      I think that we are mostly in agreement.

      I agree that a complete re-write is not on the cards, for the reason you gave of time, but also because I think that starting from scratch would be throwing away a lot of good code which is just wasteful. Given this isn't going to happen, what are the alternatives?

      I'm not sure that migration to Everything 2 would be helpful either. Whilst it would make it easier for individuals or groups to set up replications of PM, there would still be a whole lot of customisation in the core, and sensitive data that would be impossible to share openly and very difficult to mock up.

      Even if this were done, it would still mean that contributions from outside would need to be submitted to PM with a promise of "I've tested it thoroughly and it's fine!", which isn't going to work. The gods would still need to inspect for backdoors and malicious failures and test to their satisfaction. The same bottleneck would exist.

      The only way I could see of alleviating the bottlenecks in the testing and approval mechanism -- beyond recruiting 100 new gods (Maybe from the Indian subcontinent, they seem to have more than their fair share:) -- is to make it possible for PMdevers to test their own code in a realistic, but non-critical environment, and in a way that allows the gods to verify the veracity of their testing (by inspecting the logs of the test system to check for errors, the number of times the modification has been exercised etc.). The only way I can see of doing that is for the test environment to be in the same box and sharing the same (live) data.

      Obviously, ad-hoc changes to the live system aren't desirable, so logic led me to suggest a test server with limited bandwidth/cpu accessing the same data except for updates.

      From my very limited, external viewpoint, this is the only possibility that addresses the problems. The alternative is to stick with the status quo, which, while an option and currently the only game in town, is the reason for the disquiet in the first place.

      The idea is far from unique. Having test systems that referenced live databases read-only and wrote updates to a different database or a separate table within the database was once common practice when disc storage was too expensive to replicate whole databases willy-nilly. This is just an extension of that idea attempting to work around the specific PM peculiarities. It costs nothing, except a little of my time to write it up, and a little of your time to read.

      I don't have a good enough view to know if it is feasible or practical, but I thought it worth mentioning anyway.
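
The read-live, write-elsewhere arrangement described above could be sketched roughly as follows. This is a minimal illustration in Python with SQLite, not anything PM actually runs: the table names `live_nodes` and `scratch_nodes`, the `OverlayStore` class, and the `make_demo_connection` helper are all hypothetical.

```python
import sqlite3


class OverlayStore:
    """Reads fall through to the live table; writes only touch a scratch table."""

    def __init__(self, conn):
        self.conn = conn
        # Hypothetical scratch table that receives every update from the test system.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS scratch_nodes "
            "(id INTEGER PRIMARY KEY, body TEXT)"
        )

    def read(self, node_id):
        # Prefer the test system's own version of a node, if one exists.
        row = self.conn.execute(
            "SELECT body FROM scratch_nodes WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            # Otherwise fall back to the live data, which is never modified here.
            row = self.conn.execute(
                "SELECT body FROM live_nodes WHERE id = ?", (node_id,)
            ).fetchone()
        return row[0] if row else None

    def write(self, node_id, body):
        # Updates never reach live_nodes.
        self.conn.execute(
            "INSERT OR REPLACE INTO scratch_nodes (id, body) VALUES (?, ?)",
            (node_id, body),
        )


def make_demo_connection():
    """Build an in-memory stand-in for the live database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE live_nodes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO live_nodes VALUES (1, 'live body')")
    return conn
```

A test server built this way sees realistic data, while anything it changes stays in the scratch table, where it can be inspected or discarded.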


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      If I understand your problem, I can solve it! Of course, the same can be said for you.

        starting from scratch would be throwing away a lot of good code which is just wasteful

        I don't buy the whole Joel Spolsky theory that you should never throw away working code. Sometimes the existing code is written with assumptions that no longer apply, and removing those assumptions piece-by-piece -- or even discovering what those assumptions were! -- is just too difficult. It's a moot point though, given the amount of work it would likely require. I'd be happy to see PerlMonks drop lots of features that I consider bloat, but every feature seems to have some individual who swears they can't live without it.

        I'm not sure that migration to Everything 2 would be helpful either. Whilst it would make it easier for individuals or groups to set up replications of PM, there would still be a whole lot of customisation in the core, and sensitive data that would be impossible to share openly and very difficult to mock up.

        Everything2 has moved the code out of the database and into CVS (at least that's what chromatic told me in another thread), and that's a major structural improvement. Hopefully it provides a better approach to customization as well. As for the difficulty of creating test data, everyone always has an excuse for not doing this, but it's important. Creating test data would mean that developers who don't have access to the live PM database could actually test their code!
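
As a rough sketch of what a test against fake data might look like, assuming a much simpler schema than PM really has: the `users` and `nodes` tables, the `build_fake_db` helper, and the `best_nodes` function under test are all invented for illustration.

```python
import random
import sqlite3


def build_fake_db(n_users=5, n_nodes=20, seed=0):
    """Create an in-memory database filled with reproducible fake data."""
    rng = random.Random(seed)  # fixed seed, so every developer gets the same data
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, author_id INTEGER, "
        "title TEXT, reputation INTEGER)"
    )
    for uid in range(1, n_users + 1):
        conn.execute("INSERT INTO users VALUES (?, ?)", (uid, f"fake_monk_{uid}"))
    for nid in range(1, n_nodes + 1):
        conn.execute(
            "INSERT INTO nodes VALUES (?, ?, ?, ?)",
            (nid, rng.randint(1, n_users), f"Fake node {nid}", rng.randint(-5, 50)),
        )
    return conn


def best_nodes(conn, limit=3):
    """Hypothetical code under test: the highest-reputation nodes."""
    return conn.execute(
        "SELECT id, reputation FROM nodes "
        "ORDER BY reputation DESC, id ASC LIMIT ?",
        (limit,),
    ).fetchall()
```

Because no real user data is involved, a test such as "the result of `best_nodes` is sorted by reputation" can run on any laptop without access to the live database.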

        The gods would still need to inspect for backdoors and malicious failures and test to their satisfaction

        If everyone could download and run the code easily, then everyone could help with this too.

        The only way I could see of alleviating the bottlenecks in the testing and approval mechanism (...) is to make it possible for PMdevers to test their own code in a realistic, but non-critical environment

        With access to the code, an easy install, and a test suite, anyone's laptop could be a realistic but non-critical environment.

        Your idea might work as a stopgap measure, but I don't think it addresses the real problem, which is the difficulty of contributing substantial well-tested code.

      One thing I find interesting is the idea that moving code out of the database would improve things. I don't think it would. In fact, my tendency is to go totally the other way. Consider that in order to deploy changes to PM that aren't in the DB, we need to ssh into at least two boxes, upload the Perl modules, and then force a server restart. Whereas we can make on-the-fly changes to in-DB code and have it deployed automatically and seamlessly.

      I really don't think having the code in CVS or equivalent would be particularly helpful, nor do I think it would increase the number of contributors. On the contrary, in fact. One aspect of the design of PM (and Everything) is that nodes are both subroutines and objects. CVS'ing the code would severely impact the objectness of the code. Consider something like patch display page. Out of context from the monastery, that code means pretty much nothing. In context, it's a window into the soul of the system. Putting it in CVS would lose the important part. It would be kinda like taking a window with a fine view, sticking it in a warehouse, and then wondering why it didn't look as good.

      A bunch of us in pmdev have even discussed moving the entire Everything code base into the DB and then bootstrapping from that. We won't ever do it, of course, but the fact that we even think it's a good idea suggests that there is something to this point.

      As a last aspect, PM itself is a stone's throw away from CVS anyway. We can currently cross-diff patches either on a single site or between the two. We can view a node's patch history, etc. We can and will expand these features as well. pmdev is very much alive and functional these days, and I don't think it would be if it was that horrible to work with.


      ---
      demerphq

        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi


        First, I want to say thanks for your contributions to the site. Your patches are much appreciated.

        There are a few reasons I think that putting the code in the database is a bad idea, and I speak from experience on this, having started my serious programming career with a system that stored all code in a database. Principally, there are lots of great tools for working with files (CVS, grep, diff, emacs, rsync, etc.) and all of them have to be reinvented when the code is in the database instead. At the very least, you need something to import/export the code from the database. Writing test scripts becomes much more difficult. Getting a working copy of the current code becomes a chore.
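
To give a feel for the export half of that import/export step, here is a minimal sketch in Python. It assumes a hypothetical `code_nodes` table with `id`, `title`, and `code` columns; the real Everything schema is not shown in this thread and is certainly different.

```python
import sqlite3
from pathlib import Path


def export_code_nodes(conn, out_dir):
    """Write each stored code node to its own file so file-based tools can see it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute("SELECT id, title, code FROM code_nodes ORDER BY id")
    for node_id, title, code in rows:
        # Replace anything that is not a letter or digit to get a safe file name.
        safe = "".join(c if c.isalnum() else "_" for c in title)
        path = out / f"{node_id}_{safe}.pl"
        path.write_text(code, encoding="utf-8")
        written.append(path)
    return written
```

Once the nodes exist as files, grep, diff, and a version control checkout work on them directly. The import direction, and keeping the two copies in sync, is the harder part and is not attempted here.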

        You mention the difficulty of updating. This is not a hard problem to solve. Simply using Apache::Reload will avoid having to restart the server. However, you'll shred your copy-on-write shared memory this way, which must already be happening with the current system. That hurts the scalability of the site, since it lowers the number of processes you can run without going into swap.
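
The general idea of reloading changed modules without a restart can be illustrated with a small Python analogue. This is not how Apache::Reload is implemented; the `ReloadOnChange` class is a hypothetical stand-in that compares file modification times and reloads whatever has changed.

```python
import importlib
import os


class ReloadOnChange:
    """Reload watched modules whose source file has changed on disk."""

    def __init__(self):
        # module name -> (module object, modification time when last loaded)
        self._seen = {}

    def watch(self, module):
        self._seen[module.__name__] = (module, os.path.getmtime(module.__file__))

    def check(self):
        """Call once per request; returns the names of modules that were reloaded."""
        reloaded = []
        for name, (module, old_mtime) in list(self._seen.items()):
            mtime = os.path.getmtime(module.__file__)
            if mtime != old_mtime:
                # Re-executes the module's source in the running process.
                module = importlib.reload(module)
                self._seen[name] = (module, mtime)
                reloaded.append(name)
        return reloaded
```

The trade-off mentioned above applies to any scheme like this in a pre-forking server: each child that reloads a module writes to memory it previously shared with the parent, so that memory is no longer shared.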

        Keeping code in CVS (or Subversion or whatever) is the expected standard, and there's a good reason for it. It allows you to do things like branches, which are not possible with a simple revision system. It also allows people who are familiar with other open source projects to get started quickly. Creating RCS-like functionality in PM itself is not a substitute for the full power of source control. I also don't really buy into the idea of this being more object-oriented. Looking at the node you linked to, I see no POD, no easy way to write a test script, no easy way to run perltidy on it, no way to perldoc it if there was POD, etc.

        The bottom line is that putting all the code in the database is non-standard and unnecessary. It makes things much harder for new people who are familiar with other open source projects and are interested in getting involved with PM. It obviously has not prevented work from going on, but I consider it a clear negative, and I suspect many others would agree with me.
