PerlMonks |
Re: Running SuperSearch off a fast full-text index
by creamygoodness (Curate) on Jun 10, 2007 at 16:33 UTC ( [id://620322] )
I've long wanted to do exactly what you've proposed, but just haven't found the cycles before now. I would be excited to collaborate with you on it. As for hosting, for the time being I can run the app at rectangular.com... and maybe we could set up a repository at code.google.com? ;)

In addition to the indexer and search applications, we'll need a spidering app that pulls down a local copy of each PerlMonks node. tye has granted permission to spider the site, and suggested the PerlMonks XML node view for getting at the content (see What XML generators are currently available on PerlMonks? for info). Here's an XML rendering of your original post as an example.

In the initial pull, we'd iterate over each node numerically, probably saving individual XML files to the file system, 1000 nodes per directory. Some nodes will present problems — reaped nodes, for instance — but the responses will always contain sufficient information to dispatch sensibly.

Keeping the locally mirrored data up to date presents some problems, especially with regard to updated text and node rep fluctuations. These problems will be trivial to solve should the service move onto perlmonks.org directly; some of them are solvable even when running remotely, as the total volume of data is not very large. In any case, freshness issues will not have a major impact on the user experience, and people will have no trouble making sensible comparisons between the old and the new.

Once we have a corpus, the indexing and search apps will present familiar challenges for us both. It will be fun to tinker with the ranking algorithms, and I expect that the extremely demanding user base will provide us with lots of high-quality feedback. :)

What say? Sound like a plan?

Cheers,
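For the "iterate numerically, 1000 nodes per directory" part, here's a minimal sketch of the two helper pieces involved. The names (`bucket_path`, `node_url`) and the corpus layout are just illustrative assumptions; `displaytype=xml` is the node-view query tye pointed at.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: map a node id to a bucketed file path,
# 1000 nodes per directory (node 620322 lands in dir "620").
sub bucket_path {
    my ( $root, $node_id ) = @_;
    my $bucket = int( $node_id / 1000 );
    return File::Spec->catfile( $root, $bucket, "$node_id.xml" );
}

# URL for the XML rendering of a node (the displaytype=xml view).
sub node_url {
    my ($node_id) = @_;
    return "http://www.perlmonks.org/?node_id=$node_id;displaytype=xml";
}
```

The spider itself would then just loop over node ids, fetch `node_url($id)` (with LWP::UserAgent or similar, throttled per tye's conditions), and write the response to `bucket_path($root, $id)`, creating bucket directories as needed.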
In Section: Perl Monks Discussion