PerlMonks  

Make your 404 pages smarter with metaphone matching

by marto (Cardinal)
on Sep 03, 2007 at 14:11 UTC (id://636733, perlnews)

Another Perl article has been published at IBM developerWorks. Make your 404 pages smarter with metaphone matching:

"Create your own 404 error-message handler to provide useful links and redirects for the contents of your site. Use metaphone matching and a simple weighted score file to make typographical, spelling, and bad-link redirect suggestions. Customize the suggestions based solely on your Web site's content and preferred redirection locations. Catch multiple errors in incoming URL requests and process them for corrections in directory, script, and HTML page names."
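The summary names two ingredients: a phonetic key computed for each page name, and a weight file that ranks the candidates. The Python sketch below illustrates that pipeline under stated assumptions. It substitutes American Soundex, an older and much simpler phonetic code, for metaphone, because a faithful Metaphone implementation would not fit in a short example. The functions `soundex`, `page_key` and `suggest`, and the page-to-weight dictionary standing in for the score file, are hypothetical and are not taken from the article.

```python
import re
from typing import Dict, List

# American Soundex digit groups. This sketch uses Soundex as a simpler
# stand-in for the metaphone key the article describes.
_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6"))
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of word ('' if no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:        # skip uncoded letters and repeated codes
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":               # letters other than h and w end a run
            prev = code
    return (result + "000")[:4]          # pad short codes with zeros


def page_key(path: str) -> str:
    """Phonetic key of the last path segment with its extension removed."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    name = re.sub(r"\.[A-Za-z0-9]+$", "", name)
    return soundex(name)


def suggest(requested: str, weights: Dict[str, int], limit: int = 3) -> List[str]:
    """Return up to `limit` known pages whose key matches the requested path.

    Hypothetical: `weights` (page -> score) stands in for the article's
    weighted score file; higher scores are suggested first.
    """
    key = page_key(requested)
    if not key:
        return []
    matches = [page for page in weights if page_key(page) == key]
    matches.sort(key=lambda page: (-weights[page], page))  # ties: alphabetical
    return matches[:limit]
```

For example, with `/about/contact.html` (weight 2) and `/about/contacts.html` (weight 4) in the table, a request for `/about/contakt.htm` reduces to the same key, C532, and both pages come back with the heavier one first. Soundex keeps the first letter verbatim, so `/about/kontakt.html` would not match at all; that kind of miss is one reason to prefer a sound-based code such as metaphone.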

As usual with IBM developerWorks articles, there is a feedback form at the bottom.

Martin

Re: Make your 404 pages smarter with metaphone matching
by merlyn (Sage) on Sep 03, 2007 at 14:15 UTC
    I'll treat this like a Slashdot pointer and make a comment before reading the article, but I hope the author was careful not to allow any "helper" techniques to reveal hidden URLs or other files. mod_speling (yes, that's the way it's spelled) was notorious for that, happily handing out pointers to guessed-at URLs. Oops.
      As far as I can see, the script used in the article filters all files by extension. Everything with .html gets indexed, everything else isn't.
        Everything with .html gets indexed, everything else isn't.
        And ... what?

        That doesn't address my concern at all. If I have a private URL that ends in ".html", it'll still likely get indexed. Then someone guesses a URL similar to that, and boom, they're in.

        A good solution would also have an additional regex or blacklist of things that should never be offered as a suggestion.
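A blacklist like the one suggested here can be applied when the index is built, so that a blacklisted page never becomes a candidate no matter how closely a request resembles it. The Python sketch below combines the .html-only extension filter described in the earlier reply with a list of never-suggest regexes. The patterns and the function names `suggestable` and `build_index` are hypothetical.

```python
import re
from typing import Iterable, List

# Hypothetical never-suggest patterns; a real site would list its own
# private areas here.
BLACKLIST = [
    re.compile(r"^/private/"),   # an assumed private directory
    re.compile(r"/drafts?/"),    # unpublished drafts anywhere in the tree
    re.compile(r"/\."),          # dot-files and dot-directories
]


def suggestable(path: str) -> bool:
    """True if path is an .html page that matches no blacklist pattern."""
    if not path.endswith(".html"):   # index .html pages only
        return False
    return not any(rx.search(path) for rx in BLACKLIST)


def build_index(paths: Iterable[str]) -> List[str]:
    """Sorted list of the pages that may be offered as 404 suggestions."""
    return sorted(p for p in paths if suggestable(p))
```

A blacklist only covers what it lists: a private .html page that matches none of the patterns is still offered, which is the failure mode described above, so an allow-list of public directories is the stricter alternative.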
