PerlMonks  

Re: Re: Re: redesign everything engine?

by chromatic (Archbishop)
on Jan 28, 2003 at 21:09 UTC


in reply to Re: Re: redesign everything engine?
in thread redesign everything engine?

I profiled the current CVS in single-user mode on my laptop this weekend. I'm not really concerned about any one specific site, just the framework and general behavior. If I can speed that up, I'll have met my goal.

I'm not terribly concerned about the XML, though using XML::DOM is a performance killer. That's mostly during the installation, though, so it's a low priority along the performance axis. The only place it's really used internally in the live system is in the workspacing code, and I don't think there's any of that on Perl Monks at the moment.

Caching is complicated by the fact that the current CVS has subrefs in it. That's why I'm betting my managed-forking approach will have better performance in certain circumstances.

The performance killers, as I see them:

  • pages are optimized for writing rather than reading -- links are parsed and page templates are processed on every hit. Code caching in the 1.0 series ameliorates this somewhat
  • nodelets are cached for the whole site or not at all -- they could be cached per user for a speed improvement
  • nodes have a custom inheritance scheme to deal with nodemethods, which was between 10 and 20% of the profiled time in my tests -- this could be reduced further
  • inefficient database queries, fetching hashrefs when bound scalars and explicit column names are 20-50% faster -- I'm working on this
  • inefficient code -- we're better programmers now
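To make the nodelet bullet above concrete, here is a minimal sketch of caching rendered nodelets per user instead of per site. It is written in Python for illustration, not taken from Everything's Perl code, and every name in it (`NodeletCache`, `render_personal_nodelet`, the `ttl` parameter) is hypothetical.

```python
import time


class NodeletCache:
    """Cache rendered nodelets either site-wide or per user (illustrative only)."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        # (nodelet name, user id or None) -> (expiry time, rendered html)
        self.store = {}

    def get(self, name, user_id, render, per_user=True):
        # Site-wide nodelets share one entry; per-user nodelets get one per user.
        key = (name, user_id if per_user else None)
        now = self.clock()
        hit = self.store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        html = render(user_id)
        self.store[key] = (now + self.ttl, html)
        return html


calls = []


def render_personal_nodelet(user_id):
    # Stand-in for an expensive render; records each real render.
    calls.append(user_id)
    return f"<div>nodelet for user {user_id}</div>"


cache = NodeletCache(ttl=60.0)
cache.get("Personal Nodelet", 1, render_personal_nodelet)
cache.get("Personal Nodelet", 1, render_personal_nodelet)  # served from cache
cache.get("Personal Nodelet", 2, render_personal_nodelet)  # other user, rendered
print(calls)
```

The only design point is the cache key: adding the user id to it lets a personalized nodelet be cached at all, at the cost of one entry per active user.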

I'm working on all of these, but that work is at the tip of CVS. My goal is to make migrating to Everything 1.2 an attractive option for Perl Monks.
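The database bullet above contrasts fetching every row as a hash with naming the columns and binding them to scalars (in Perl's DBI that is roughly `fetchrow_hashref` versus `bind_columns` plus `fetch`). The following is only an analogous sketch in Python with an in-memory SQLite table; the `node` table and its columns are assumptions, and no timing figures are implied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (node_id INTEGER PRIMARY KEY, title TEXT, doctext TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, "home", "x" * 1000), (2, "about", "y" * 1000)],
)


def titles_via_dict_rows(conn):
    # Fetch every column and build a dict per row, even though
    # only one field is used afterwards.
    cur = conn.execute("SELECT * FROM node ORDER BY node_id")
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row))["title"] for row in cur]


def titles_via_named_columns(conn):
    # Ask only for the needed columns and unpack them positionally.
    cur = conn.execute("SELECT node_id, title FROM node ORDER BY node_id")
    return [title for node_id, title in cur]


print(titles_via_dict_rows(conn))
print(titles_via_named_columns(conn))
```

Both functions return the same titles. The second avoids transferring the large `doctext` column and avoids building a dictionary for each row.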

Replies are listed 'Best First'.
Re: Re: Re: Re: redesign everything engine?
by perrin (Chancellor) on Jan 29, 2003 at 03:58 UTC
    By grabbing the latest Everything from CVS, you're kind of highlighting the problem here: there is no current CVS of PerlMonks, because the code is kept in the database. There is no convenient way to get the latest code, let alone branch it for a major revision. It also makes the task of incorporating updates from Everything that much harder. This is why I think storing code in the database is not a good idea at this point. I'm sure there were reasons for it at the time, but it is counter-productive now.

    When I referred to XML, what I was really thinking of was the way nodes are stored and the resulting update problems (some of them are described here). I don't think this would be such a problem with a more normalized database schema and a codebase that allowed for finer-grained locking during updates.

    About the cache: subrefs are okay as long as Storable can handle them. Objects that can't be serialized can't be cached between processes at this point. At the moment, Perl threads are not very good at sharing objects, so mod_perl 2 may not solve this issue any time soon. I'm not sure what your managed-forking idea is, but I don't see why it wouldn't have to deal with exactly the same issues mod_perl does.

    I don't want to sound like I'm just whining about the code. I am grateful for the existence of this site and your part in creating the code that made it happen. I do think that some of the design ideas have not scaled well though, and that it will be hard to fix it completely without fundamental changes.

      I do appreciate your comments, perrin, and you're the first person I'll ask about an inter-process cache.

      All of the code for the base install of the system is stored in CVS, though -- including the core nodes. It would be nice to do this with Perl Monks as well. (There'd probably be three or four specific nodeballs.) I'm planning to revise the XML format slightly so it's even easier to see changes between node revisions.

      An inter-process cache with its own locking mechanism could help, but there are other ways to avoid it. I'm inclined to propose a rule that all updates are committed to the database at the end of a request.
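One way to read the "commit at the end of a request" rule is as a small unit of work: updates are collected in memory during the request and written in a single transaction when it finishes. The sketch below is Python with SQLite, not Everything code; `RequestUpdates`, `update_title`, `finish`, and the `node` table are all hypothetical names.

```python
import sqlite3


class RequestUpdates:
    """Collect node updates during a request; write them once at the end."""

    def __init__(self, conn):
        self.conn = conn
        # node_id -> latest title; later writes replace earlier ones
        self.pending = {}

    def update_title(self, node_id, title):
        self.pending[node_id] = title

    def finish(self):
        # One transaction for the whole request: commit on success,
        # roll back if any statement raises.
        with self.conn:
            self.conn.executemany(
                "UPDATE node SET title = ? WHERE node_id = ?",
                [(title, node_id) for node_id, title in self.pending.items()],
            )
        self.pending.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO node VALUES (1, 'old title')")
conn.commit()

req = RequestUpdates(conn)
req.update_title(1, "draft")
req.update_title(1, "final")
before = conn.execute("SELECT title FROM node WHERE node_id = 1").fetchone()[0]
req.finish()
after = conn.execute("SELECT title FROM node WHERE node_id = 1").fetchone()[0]
print(before, "->", after)
```

Until `finish` runs, other readers of the database still see the old row, which is what removes the need for a lock that spans the whole request.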

      Any suggestions to improve the normalization of the database are welcome. For speed reasons, I'm tempted to move the doctype doctext field to a separate table. I'm definitely going to fix the hacky settings by making a one-to-many table for individual settings. That's another post-1.0 change.
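A one-to-many settings table of the kind mentioned above might look roughly like this. The table and column names (`setting`, `name`, `value`) and the helper functions are assumptions for illustration, and the schema is exercised with SQLite from Python rather than with MySQL from Perl.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE node (
        node_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    -- One row per individual setting, many rows per settings node.
    CREATE TABLE setting (
        node_id INTEGER NOT NULL REFERENCES node(node_id),
        name    TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (node_id, name)
    );
    """
)
conn.execute("INSERT INTO node VALUES (1, 'system settings')")
conn.executemany(
    "INSERT INTO setting (node_id, name, value) VALUES (1, ?, ?)",
    [("site_name", "Example"), ("max_nodelets", "10")],
)


def get_setting(conn, node_id, name):
    row = conn.execute(
        "SELECT value FROM setting WHERE node_id = ? AND name = ?",
        (node_id, name),
    ).fetchone()
    return row[0] if row else None


def set_setting(conn, node_id, name, value):
    # Touches exactly one row; the other settings are left alone.
    conn.execute(
        "INSERT OR REPLACE INTO setting (node_id, name, value) VALUES (?, ?, ?)",
        (node_id, name, value),
    )


set_setting(conn, 1, "max_nodelets", "12")
print(get_setting(conn, 1, "site_name"), get_setting(conn, 1, "max_nodelets"))
```

With one row per setting, changing a single value is a single-row write, and two requests that change different settings no longer overwrite each other.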

      The problem with caching subrefs is that you'll still pay the eval() penalty. I'd prefer to cache any calculated field, though, as we do many times more reads than writes. That seems like a web-side enhancement, but if we have an interprocess cache, we can avoid many database hits, which will help.
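The eval() penalty can be pictured as follows: a cache shared between processes can only hold source text, so each process still has to compile it, but it can at least compile each distinct piece of source once and keep the result in memory. This is a Python analogy using `compile`/`exec`, not Perl's `eval`, and `get_code` and the `handler` naming convention are hypothetical.

```python
import hashlib

_compiled = {}      # digest of source text -> compiled function
compile_count = 0   # how many times the compile step was actually paid


def get_code(source, name="handler"):
    """Return the function defined by `source`, compiling it at most once."""
    global compile_count
    key = hashlib.sha1(source.encode("utf-8")).hexdigest()
    func = _compiled.get(key)
    if func is None:
        namespace = {}
        exec(compile(source, "<node code>", "exec"), namespace)
        func = namespace[name]
        _compiled[key] = func
        compile_count += 1
    return func


source = "def handler(title):\n    return title.upper()\n"
first = get_code(source)
second = get_code(source)   # same source text, no recompilation
print(first("hello"), compile_count)
```

Keying on a digest of the source means an edited node compiles again automatically, while unchanged code is reused for the life of the process.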

      My managed-forking updates the parent process whenever the cache changes, so the cache is always in the parent. This includes code. All forked children share that memory. I've not found a way to do that with Apache.
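The post does not spell out how the managed-forking server works, so the following is only a guess at the general shape, in Python on a Unix-like system: the parent owns the cache, each request runs in a forked child that inherits the cache copy-on-write, and the child reports anything new back over a pipe so the parent can update itself before the next fork. All names (`serve`, `handle_request`, `expensive_lookup`) are hypothetical, and forking once per request is a simplification.

```python
import json
import os

cache = {}  # lives in the parent; each forked child inherits a copy-on-write view


def expensive_lookup(key):
    # Stand-in for a database hit or a compile step.
    return key.upper()


def handle_request(key, report_fd):
    # Runs in the child: use the inherited cache, report the result to the parent.
    if key in cache:
        result = {"key": key, "value": cache[key], "hit": True}
    else:
        result = {"key": key, "value": expensive_lookup(key), "hit": False}
    with os.fdopen(report_fd, "w") as out:
        json.dump(result, out)


def serve(key):
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: never return into the parent's code path.
        os.close(read_fd)
        try:
            handle_request(key, write_fd)
        finally:
            os._exit(0)
    # Parent process: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as inp:
        result = json.load(inp)
    os.waitpid(pid, 0)
    # The update happens in the parent, so every later child inherits it.
    cache[result["key"]] = result["value"]
    return result


first = serve("node")
second = serve("node")
print(first["hit"], second["hit"])
```

The first request misses and the second hits, because the second child was forked after the parent had absorbed the first child's result.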

      Finally, I agree about fundamental changes. That's my plan. I'm just changing the existing code, not starting over.

Re: Re: Re: Re: redesign everything engine?
by Jaap (Curate) on Jan 28, 2003 at 22:53 UTC
    I would love to help out on the code (as might more people on Perl Monks), but I don't want to download and install Apache, mod_perl, MySQL and everything.

    Would it be a good idea if you, chromatic, posted pieces of Everything code for us monks to review? I am sure we can come up with some improvements to that code, which you could then decide to implement or not.

    Careful readers might understand by now that I would do anything to make this site faster, except really delve into the Everything engine ;-)

      Most of the performance problems are architectural. The act of finding appropriate snippets to post means finding bottlenecks. I'm not sure that'll help, because once I find a bottleneck, I can usually fix it. The code's reasonably well-factored now.

      Besides that, we're working from a substantially newer version of the code than the one that runs Perl Monks. nate added a stricter nodeversion caching system, I added code caching, and there are other changes that Perl Monks doesn't have.

      On the other hand, I almost have a DBD::SQLite backend ready to go, so, if you munge the install process just a bit, you can install the core system without Apache, mod_perl, or MySQL.
