PerlMonks
Offline wikipedia using Perl
by grondilu (Friar) on Mar 08, 2012 at 13:49 UTC ( #958466=CUFP )
I'm finally happy with the code I wrote to browse Wikipedia offline. The trickiest part was keeping the database small, so I store the articles in blocks of 256. Each block is frozen using Storable and then compressed with Bzip2. Done this way, the resulting database is only about 15% larger than the original xml.bz2 dump. I also use XML::Parser to parse Wikipedia's database dump. The most difficult part was converting the XML dump (see http://download.wikimedia.org) into a usable database.
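To illustrate the block idea, here is a minimal sketch of the packing step only. It is not the code from the post. It assumes the articles have already been extracted from the dump as [title, text] pairs (the XML::Parser step is left out), it keeps the blocks in memory rather than in a database file, and it uses IO::Compress::Bzip2 as one possible Bzip2 binding. The names pack_articles, fetch_article and BLOCK_SIZE are made up for the example.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Number of articles per block, as described in the post.
use constant BLOCK_SIZE => 256;

# Hypothetical helper: takes a list of [title, text] pairs and returns
# (\@blocks, \%index), where each block is a bzip2-compressed Storable
# image of a { title => text } hash and $index{title} is the block number.
sub pack_articles {
    my @articles = @_;
    my (@blocks, %index);
    while (my @chunk = splice @articles, 0, BLOCK_SIZE) {
        my %block;
        $block{ $_->[0] } = $_->[1] for @chunk;

        # Freeze the whole block, then compress the frozen bytes.
        my $frozen = nfreeze(\%block);
        my $compressed;
        bzip2(\$frozen => \$compressed)
            or die "bzip2 failed: $Bzip2Error";

        $index{$_} = scalar @blocks for keys %block;
        push @blocks, $compressed;
    }
    return (\@blocks, \%index);
}

# Hypothetical helper: decompress and thaw only the block that holds $title.
sub fetch_article {
    my ($blocks, $index, $title) = @_;
    my $n = $index->{$title};
    return unless defined $n;

    my $frozen;
    bunzip2(\$blocks->[$n] => \$frozen)
        or die "bunzip2 failed: $Bunzip2Error";
    return thaw($frozen)->{$title};
}
```

Compressing many articles together gives bzip2 more redundancy to work with than compressing each article alone, while a lookup still only has to decompress one block instead of the whole dump.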
I think it works pretty well, even if the rendering of the Text::Mediawiki module is a bit ugly for some pages. I still need to take care of the references, for instance. Still, it does the job, and it's much faster than on-line browsing. I posted everything (including the CGI script) on my Wikipedia userpage, as it also concerns Wikipedia users: http://fr.wikipedia.org/wiki/Utilisateur:Grondilu/Offline_Wikipedia_Perl

EDIT: I also set up a github repo: https://github.com/grondilu/offline-wikipedia-perl