Google like scalability using perl?

I have had enough with CouchDB. Really. I just couldn't wait for queries to run for the first time any longer. And I really wanted to write my views in perl. Because I was mostly manipulating hashes which came from perl anyway.

Then I started dreaming. Disks are slow. Only right way to read them is sequentially, so it rules out disk as storage if you want to have fast access. So, what is fastest way to run perl snippets? In memory.

But, I would run out of memory eventually. So I would have to support sharding across bunch of machines... That would also enable me to scale linearly just by adding a new machine or two...

I just had a bunch of web kiosks running Debian live around, so I decided to try does this idea sound sane in a week. And sure enough, I had working version after a week.

Then I decided to rip it apart and rewrite into simpler one which is modeled around management of individual nodes and much simpler network protocol. And I discovered that parsing file on master node isn't bad because it cuts down on required dependencies on deployment. And with a little bit of caching it even provides speedup on startup.

In this stage, it's quite usable toy. I would really appreciate feedback about it. Is it interesting for other people to try it? Or did I just learned a few lessons about scalability that everybody else already knows?

And lastly how would you release such a toy? CPAN doesn't really seems like a right place, so is Ohloh enough? It's not in git, so github isn't really an option.

Update: (much) more details about Sack implementation and reasons behind them in comments thanks to everybody who asked and forced me to clarify :-)

2share!2flame...

Comment on Google like scalability using perl?

Replies are listed 'Best First'.
Re: Google like scalability using perl? by merlyn (Sage) on Oct 09, 2009 at 00:12 UTC
Are you reinventing MapReduce? If so, how is yours better, other than being yours? :) -- Randal L. Schwartz, Perl hacker The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.	[reply]
Re^2: Google like scalability using perl? by dpavlin (Friar) on Oct 09, 2009 at 00:57 UTC
Of course I am! That's Google reference in title. But with perl and in-memory :-) TIMTOWTDI I guess... 2share!2flame...	[reply]
Re: Google like scalability using perl? by BrowserUk (Patriarch) on Oct 09, 2009 at 00:47 UTC
Three questions: Could you outline an (the) example of the data and processing that caused you to seek out your solution? Do you have any feel for how your solution scalas? Do you have any feel for how your solution might perform? For example, what kind of SPC-1 IOPs rating do you think it might achieve? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP PCW It is as I've been saying!(Audio until 20090817)	[reply]
Re^2: Google like scalability using perl? by dpavlin (Friar) on Oct 09, 2009 at 01:12 UTC
My goal is to process ~130000 bibliographic records (with tree-like structure) with interactive editing of perl code which is applied to each record with interactive response (under three seconds for edit/run/see result cycle). Right now, execution of code on each database record isn't a bottleneck (code is simple, and even if it isn't I would just boot a few more machines and distribute smaller shards of data to each of them). However, when nodes start to produce more than ~20000 results merge on master becomes a problem. This is where merge to central Redis from nodes comes into play, removing need for merge step all-together at expense of network traffic. My fingers are itching to try that out, but this will have to wait at least tomorrow morning... 2share!2flame...	[reply]
Re^3: Google like scalability using perl? by BrowserUk (Patriarch) on Oct 11, 2009 at 07:51 UTC
You say that Disks are slow. But network IO is slower! Your solution is to keep all the data in memory by distributing it across multiple machines, but for 130,000 bibliographies, a (these days quite modest) 2GB machine could handle 16KB per record. Even with structural overheads that is a fairly expansive bibliography. By moving to even the most basic 64-bit OS on commodity hardware, that can limit can be at least quadrupled. Whilst you gain through paralellism of the multiple boxes, you loose through networking (IO and coordination). You say "This is where merge to central Redis from nodes comes into play, removing need for merge step all-together at expense of network traffic.", but one way or another, the results from multiple machines has to end up being gathered back together. And that means it has to transition the network and be written to disk. These days, even commodity boxes come with 2 or 4 cores, with 6 or 8 becoming the norm at the next inventory upgrade cycle. If you applied the same 'remote processes' solution you are using with networked boxes within a fully-RAMed, multi-core box, the 'networking' overhead becomes little more than serialisation through shared memory buffers (plus some protocol overhead that you'd have anyway). If you use threads instead of processes, each core can process over a segment of shared memory, avoiding all the networking overhead entirely. And by returning references to the shared memory rather than data, to the merging thread, all network IO is avoided completely. And for scalability; with 64-bit OSs capable of handling at least 128GB of virtual memory, you'd have plenty of room for growth. Whilst random access to virtual memory greater than physical memory will 'thrash the cache', there are still huge gains to be had by avoiding the networking protocal overhead. Please don't take this as a critisism of your sharing your current solution. That's all good. I just can't help think ing that for problems of this scale, and those an order of magnitude or two larger, it is a solution with a quite limited lifespan. For a while now I've been looking at a solution of using virtualised memory (memory mapped files) to provide generalised, thread-shared, random read-write access to large datasets (up to ~128GB), and your problem is the first real-world task that has come up here. If you could share your dataset, or at least a reasonably detailed schema sufficient to allow the creation of a realistic testset to be generated, it would be extremely interesting (to me) to see how the two approaches to the problem compare. In terms of both complexity and performance. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP PCW It is as I've been saying!(Audio until 20090817)	[reply]
Re^4: Google like scalability using perl? by dpavlin (Friar) on Oct 14, 2009 at 15:29 UTC
Re: Google like scalability using perl? by ctilmes (Vicar) on Oct 09, 2009 at 00:16 UTC
Sounds like memcached. There are a variety of CPAN modules for interfacing with it.	[reply]
Re^2: Google like scalability using perl? by dpavlin (Friar) on Oct 09, 2009 at 01:03 UTC
Redis is another in-memory store. However i really wanted to keep data in perl memory for fastest possible execution of view code across data. I'm thinking about making merge to Redis directly from nodes to skip heavy merge on master if I have a lot of fields generated on nodes. This is the case with biggest query I currently have, but I still don't have code for that (Redis code currently in is left-over from first iteration which did merge to Redis on master). 2share!2flame...	[reply]
Re: Google like scalability using perl? by ctilmes (Vicar) on Oct 09, 2009 at 00:23 UTC
And lastly how would you release such a toy? CPAN doesn't really seems like a right place Why doesn't CPAN seem like the right place? Right or wrong, plenty of 'toys' get released there...	[reply]
Re: Google like scalability using perl? by toma (Vicar) on Oct 11, 2009 at 18:21 UTC
Another way to do this is to run Apache and mod_perl on different machines, and split the large hash between them. The hash and code stays in memory with mod_perl. You might also try memcached, as suggested above. The combination of memcached and mod_perl performs better than I expected. For doing a large batch job by farming work out to lots of nodes, the trick is to use something like Amazon's Simple Queue Service, which keeps the work from overlapping. I don't know of a perl module that implements this, so if your code has this feature it would make a great CPAN module by itself. It should work perfectly the first time! - toma	[reply]
Re^2: Google like scalability using perl? by dpavlin (Friar) on Oct 14, 2009 at 22:26 UTC
This is great solution if you can afford this kind of deployment. I'm really happy user of mod_perl, but this time I didn't really had machines for deployment at all :-) All I had where 12 machines dedicated to be web kiosks. This is why I'm trying to avoid disk activity. Besides short bursts of CPU activity, which can be controlled by shard size, I won't affect normal usage of machine which are dual core anyway, so I can use only one core if that becomes problem. In fact, i went so far to require only core perl modules so I can depend only on perl which is standard on Debian installs anyway since packet manager uses it) and ssh (for which I use dropbear). I also noticed that automatic deployment of new version and restart is somewhat of challenge if you do it by hand, so Sack can push code update to nodes (using cpio over ssh since I don't really have scp or rsync) and re-exec itself. Which was nice, but re-exec required me to re-feed data onto each node on restart. This is also nice way to get recovery for nodes or some kind of load migration (if one set of machines becomes busy I should be able to move shards to other machines or increase shard size of idle machines). None of that exists yet, unfortunately. I have looked into messaging solutions, but my preference is to have queue locally (nodes are part of intranet network) and although they do have Internet connectivity, I would love them not to leave intranet. I would generally prefere something like RestMS (so, blame CouchDB REST influence on me) or even CouchDB with RabbitMQ before trying to develop for some external service. 2share!2flame...	[reply]


P is for Practical
	PerlMonks