Some light PerlMonks reading by the campfire

OK, now that I've got your attention... what I'm wondering is the best way to capture a large chunk of PM for reading "offline", where there is no net connectivity (think WAY offline, as in no power for hundreds of miles), for potentially months at a time.

I'm going to start small, just to see if it's even feasible, and then expand it. I've done some similar projects in this area over the last few years, and they have been quite successful.

I also looked around the Monastery and found these somewhat relevant nodes:

Public export of Perl Monks database by zby
Node XML to HTML by jeffa
Best way to traverse all Perlmonks nodes? by kvale
Reading PerlMonks offline by zjunior

There are quite a few useful replies in there, some of them referencing ThePen (which is down as I type this). Some talk about spidering the site, others about converting from XML to HTML, and others about simply pulling a database dump and reusing that.

Ideally, the best approach would be to dump the node tables and replies to some form of XML, the way the Wikimedia projects do. They have a tool called mwdumper (written in Java) that takes the XML export and pumps it back into MySQL. (I did this for the latest Wikipedia database just this weekend; it was over 4.5 million separate rows and took 20 hours to import, whew!)
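
At that scale the import has to stream the XML rather than load the whole file into memory. Here is a minimal Python sketch of the idea, under two loud assumptions: the `<node id=... parent=... title=...>` layout is a hypothetical dump format (not an actual PerlMonks or MediaWiki schema), and SQLite stands in for MySQL so the example needs no server. `import_nodes` is an illustrative name, not an existing tool.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import IO, Union

# ASSUMPTION: hypothetical flat dump layout, not a real PerlMonks schema:
#   <nodes>
#     <node id="2" parent="1" title="Re: ...">reply text</node>
#   </nodes>
# SQLite stands in for MySQL so the sketch is self-contained.

def import_nodes(source: Union[str, IO[bytes]], db: sqlite3.Connection,
                 batch_size: int = 1000) -> int:
    """Stream <node> elements into a `node` table; return rows inserted."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, title TEXT, body TEXT)"
    )
    insert = "INSERT INTO node (id, parent, title, body) VALUES (?, ?, ?, ?)"
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    batch, count = [], 0
    for event, elem in events:
        if event != "end" or elem.tag != "node":
            continue
        parent = elem.get("parent")
        batch.append((int(elem.get("id")),
                      int(parent) if parent else None,
                      elem.get("title", ""),
                      elem.text or ""))
        root.clear()  # drop finished elements so memory stays bounded
        if len(batch) >= batch_size:
            db.executemany(insert, batch)
            count += len(batch)
            batch.clear()
    if batch:
        db.executemany(insert, batch)
        count += len(batch)
    db.commit()  # one commit for the whole load
    return count
```

Batching the inserts and committing once is a common way to speed up bulk loads; committing after every row is typically much slower.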

But it doesn't have to be that complex... even just the XML dumps, with some sort of linking to each of the replies, would be perfect.
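
As a rough illustration of "XML dumps with linking to the replies", the Python sketch below turns a dump into one static HTML page per node, each linking up to its parent and down to its direct replies. It assumes the same hypothetical flat layout (`<node id=... parent=... title=...>`), which is not a real PerlMonks export format, and treats node text as plain text, whereas real node bodies would need their own markup handling. `render_pages` is a hypothetical helper name.

```python
import html
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict

# ASSUMPTION: hypothetical flat layout, replies point at their parent by id:
#   <nodes><node id="2" parent="1" title="Re: ...">text</node></nodes>

def render_pages(xml_text: str) -> Dict[str, str]:
    """Return {filename: html}, one static page per node."""
    nodes = {}
    replies = defaultdict(list)  # parent id -> list of reply ids, in dump order
    for elem in ET.fromstring(xml_text).iter("node"):
        node_id = elem.get("id")
        nodes[node_id] = elem
        parent = elem.get("parent")
        if parent:
            replies[parent].append(node_id)

    pages = {}
    for node_id, elem in nodes.items():
        title = html.escape(elem.get("title", ""))
        # Body is escaped and shown as plain text in this sketch.
        parts = [f"<html><head><title>{title}</title></head><body>",
                 f"<h1>{title}</h1>",
                 f"<div>{html.escape(elem.text or '')}</div>"]
        parent = elem.get("parent")
        if parent in nodes:  # link up to the parent, if it is in the dump
            parts.append(f'<p><a href="{parent}.html">Parent</a></p>')
        if replies[node_id]:  # link down to each direct reply
            parts.append("<ul>")
            for reply_id in replies[node_id]:
                reply_title = html.escape(nodes[reply_id].get("title", ""))
                parts.append(f'<li><a href="{reply_id}.html">{reply_title}</a></li>')
            parts.append("</ul>")
        parts.append("</body></html>")
        pages[f"{node_id}.html"] = "\n".join(parts)
    return pages
```

Writing each entry of the returned dict to a file named by its key gives a directory of plain HTML that any browser can open without a network connection.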

Now, I could also spider ThePen during off-hours (when it comes back online) and store the plain HTML that way, but that introduces load, latency, bandwidth issues and so on. I'd rather avoid putting that strain on someone else's server, because I know what it's like when someone does it to my public servers.
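
If spidering does turn out to be necessary, throttling the requests is the usual way to keep the load down. A minimal Python sketch, assuming a caller-supplied `fetch` function (hypothetical; for example, a thin wrapper around `urllib.request.urlopen`) and an arbitrary five-second gap between requests. The clock and sleep functions are injectable so the pacing can be checked without touching the network.

```python
import time
from typing import Callable, Dict, Iterable

def polite_fetch_all(urls: Iterable[str],
                     fetch: Callable[[str], str],
                     min_interval: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Fetch URLs one at a time, starting requests at least min_interval apart.

    `fetch` is a hypothetical caller-supplied function; nothing here knows
    about any particular site's URL scheme.
    """
    results = {}
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last request.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results[url] = fetch(url)
    return results
```

Because requests run strictly one after another, the server never sees more than one connection at a time from this crawler; raising `min_interval` trades total crawl time for lower load.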

Has there been any movement yet on implementing "nodeballs" in PM? The Everything Engine powering PerlMonks supports them, so I guess it's just a matter of reaching consensus, holding a vote, and enabling it?

What say ye?


Re: Some light PerlMonks reading by the campfire by Corion (Patriarch) on Feb 18, 2007 at 17:53 UTC
I think the best and most usable solution for you would be to get a MySQL dump of the node table. This sidesteps all the XML trickery and the other problems that arise when converting to and from XML. If you are really keen on getting XML, g0n maintains an XML mirror of most newer nodes. PerlMonks is "based" on the Everything Engine, but the two have since diverged in different directions and are basically incompatible, so I'm not sure nodeballs could easily be enabled here.
Re: Some light PerlMonks reading by the campfire by bart (Canon) on Feb 18, 2007 at 18:14 UTC
Just two references you seem to have overlooked:

prlmnks.org, "Perlmonks with some bits missing"
katterbox, a Java client that, last I looked (which isn't too recently), contained an offline browser. This project has been discontinued, but it might still work.
Re: Some light PerlMonks reading by the campfire by planetscape (Chancellor) on Feb 19, 2007 at 08:26 UTC
Don't forget:

What XML generators are currently available on PerlMonks?
PerlMonks::Mechanized (beta)

... especially for smaller-scale experiments.

HTH,
planetscape
Re: Some light PerlMonks reading by the campfire by wolfger (Deacon) on Feb 22, 2007 at 13:36 UTC
"no power for hundreds of miles, for potentially months at a time"

What the heck is a Perl monger doing in a condition like that??!?

--
Random Synapse Firing
Re^2: Some light PerlMonks reading by the campfire by hacker (Priest) on Feb 25, 2007 at 18:28 UTC
"What the heck is a Perl monger doing in a condition like that??!?"

Teaching other neophytes the benefits of Perl and Linux, of course... can't become a Master Jedi without becoming a young Padawan first, now can we?