PerlMonks  

Re: Longest Matching URL

by Corion (Patriarch)
on Mar 20, 2002 at 17:34 UTC


in reply to Longest Matching URL

If you have control over the database, I'd first of all restructure it to get rid of the REGEXP $uri part in your SQL statement. Either introduce a new column called host in tURIMapping or, to stay completely within the relational mindset, introduce a new table that stores all your hosts, plus a new column in tURIMapping that holds a reference to the matching host. The second idea is cleaner in the sense of pure relational databases and normalisation, but the first alternative is much easier to implement.

Searching for the "best" (== longest) match is easy if you sort the list by the length of the entries in descending order. Many databases can do this directly with something like ORDER BY LENGTH(uri) DESC (the name of the length function varies between databases). If yours cannot, introduce a second column, length, which stores the length of each URL, and change your select statement to SELECT ftpID,uri,path FROM tURIMapping WHERE host=? ORDER BY length DESC.

That way, the database does most of the work for you, and you only need to walk the results until you find the first entry that (partly) matches your search string. Because the database has already ordered the results by length, that first hit is guaranteed to be the longest possible match.
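The walk over the sorted results can be sketched in plain Perl. This is only an illustration: the rows are made up, the sort merely stands in for the ORDER BY in the query, the helper name longest_match is hypothetical, and "(partly) matches" is assumed here to mean that the stored uri is a prefix of the requested one.

```perl
use strict;
use warnings;

# Hypothetical rows, standing in for what
#   SELECT ftpID,uri,path FROM tURIMapping WHERE host=? ORDER BY length DESC
# would return. The sort below only mimics the ORDER BY for this demo.
my @rows = sort { length($b->{uri}) <=> length($a->{uri}) } (
    { ftpID => 1, uri => '/',           path => '/pub' },
    { ftpID => 2, uri => '/docs/perl/', path => '/pub/perl-docs' },
    { ftpID => 3, uri => '/docs/',      path => '/pub/docs' },
);

# Return the first row whose uri is a prefix of the request (assumption:
# prefix matching). Because the rows arrive longest first, the first hit
# is also the longest match.
sub longest_match {
    my ($request, @candidates) = @_;
    for my $row (@candidates) {
        return $row if index($request, $row->{uri}) == 0;
    }
    return;    # no mapping applies
}

my $hit = longest_match('/docs/perl/perlre.html', @rows);
print $hit ? "$hit->{ftpID} $hit->{path}\n" : "no match\n";
```

With real data you would feed the rows fetched from the database straight into the loop and drop the sort, since the database has already done the ordering.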

Note that if you decide to split the hostname off from the rest of the URI, you will have to slightly modify both the values you put into the query and the way you construct your return values.
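As a rough sketch of that split, assuming absolute URLs of the form scheme://host[:port]/path: the helper split_uri is hypothetical, and the regex is deliberately simplistic (it drops the port and does not handle user:password@ parts), so a proper URI parser is the better choice for real input.

```perl
use strict;
use warnings;

# Simplified split of an absolute URL into host and remainder.
# Assumes scheme://host[:port]/path; the port is silently dropped and
# userinfo (user:pass@host) is not handled.
sub split_uri {
    my ($uri) = @_;
    my ($host, $rest) = $uri =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)(?::\d+)?(.*)$}i
        or return;                 # not an absolute URL we understand
    $rest = '/' if $rest eq '';    # bare host means the root path
    return (lc $host, $rest);      # host names are case-insensitive
}

my ($host, $rest) = split_uri('http://www.example.com/docs/perl/perlre.html');
print "$host $rest\n";
```

The host part would then go into the WHERE host=? placeholder, and the remainder is what gets compared against the uri column, which under this assumption holds only the part after the host.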

For the URI parsing part, I recommend taking a look at the URI::URL module (or its successor, URI), which nicely splits up a lot of obscure URIs.

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
