PerlMonks
Speed searching HTML docs
by the_slycer (Chaplain)
on Aug 15, 2002 at 19:59 UTC ( [id://190510]=perlmeditation )
I've been pondering changing an application that I created a year or so ago to make it somewhat more robust. The application is a tool that searches through a whole bunch (say 700 or so) of HTML files. It prints a listing of (and links to) the matching files, sorted by how many times the keyword is matched.

So far so good, this sounds easy, right? The problem lies in the fact that the documents are constantly updated: say 6 or 7 files get changed every day, by multiple users. The searches need to be as "real-time" as possible.

The way that I've solved this in the past was by building 2 applications. The first checks the files for updates (every 5 minutes), parses the files, and stores a hash mapping keywords to filenames in a Storable file. The second is just a CGI interface that loads the stored file and finds the "answers" to the search blazingly fast.

There are a couple of reasons that I don't like this approach. The main one is that every once in a while, the application checking for updates dies. Sometimes we don't notice, and people end up retrieving out-of-date information. The second reason is that the tool monitoring the files is run from a command prompt (yes, this is all on Windows), which requires someone to stay logged in on the server. The third reason that I want to rewrite this is that I finished it when I was much newer to Perl than I am now. There is some really ugly code in it, and I may be moving to a new job (actually just losing this one), so I want to leave my successor readable code.

So, I'm polling for suggestions: given the above scenario, what would you suggest is the best way to accomplish my goals? Those goals, to clarify: pseudo-real-time search, very fast, and stable.

The ideas I've been kicking around:
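For anyone who wants something concrete to poke at, here is a minimal sketch of the existing two-program design described above. It is not one of the new ideas, and it is not the original code. The function names `build_index` and `search`, the shape of the hash (keyword to filename to hit count), and the crude regex tag stripping are all assumptions made for illustration. The five-minute update check is left out.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Indexer side (hypothetical name): count every word in every file and
# save a hash of keyword => { filename => count } with Storable.
sub build_index {
    my ($index_file, @files) = @_;
    my %index;
    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            warn "skipping $file: $!\n";
            next;
        }
        my $html = do { local $/; <$fh> };   # slurp the whole file
        close $fh;
        $html = '' unless defined $html;
        $html =~ s/<[^>]*>/ /g;              # crude tag stripping, not a real HTML parser
        $index{ lc $1 }{$file}++ while $html =~ /(\w+)/g;
    }
    store(\%index, $index_file);
    return \%index;
}

# Search side (hypothetical name), i.e. what the CGI would do: load the
# stored hash and return filenames ordered by hit count, highest first.
sub search {
    my ($index_file, $keyword) = @_;
    my $index = retrieve($index_file);
    my $hits  = $index->{ lc $keyword } || {};
    return sort { $hits->{$b} <=> $hits->{$a} || $a cmp $b } keys %$hits;
}
```

Calling `build_index('index.sto', glob('docs/*.html'))` from the updater and `search('index.sto', $keyword)` from the CGI gives the split described above (the paths here are placeholders). The search is only as fresh as the last successful `build_index` run, which is exactly the weakness I'm complaining about.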