You also might want to consider a different approach. It's really hard to define what a "valid" URL is. Maybe you only need http://, or maybe http:// and ftp://, and so on. Then there's the problem of non-standard URLs that someone is surely using, or will start to use. For instance, if Microsoft released a product that used a URL like bill://, you might have to support it even though it's in no standard.

Rather than trying to validate the entire URL as one regex, break it into parts, then test them. For instance, test the bit you think is a host name by running gethostbyname(), and test the part that names the protocol by running getservbyname(). This takes some of the strain off your regex. The best part is that you don't have to update your script to keep up with changes in the world: if a new bill:// protocol comes out (and you keep your /etc/services file up to date), your script won't miss a beat. Even more likely is a new top-level domain.

Of course, these lookups will impact performance, so you need to ask yourself how fast this needs to be and how thoroughly it needs to check the URL. If letting a bad URL through is just a little annoying, it might be easiest to cull out the really egregious offenders and let the slippery ones pass. If, on the other hand, you really suffer when a bad URL makes it past this test, it might be worth the clock cycles.

In reply to Re: regex to match URLs
by pileofrogs
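The split-and-lookup approach described above might look like this minimal Perl sketch. The sub names and the (deliberately permissive) regex are illustrative, not from the original reply; the real checking is delegated to the system lookups.

```perl
use strict;
use warnings;

# Loosely split a URL into scheme and host.  The regex is intentionally
# forgiving -- the lookups below do the real validation work.
sub parse_url {
    my ($url) = @_;
    my ($scheme, $host) = $url =~ m{^([A-Za-z][A-Za-z0-9+.-]*)://([^/:?#]+)};
    return unless defined $host;
    return (lc $scheme, $host);
}

sub check_url {
    my ($url) = @_;
    my ($scheme, $host) = parse_url($url) or return 0;

    # getservbyname() reads /etc/services, so a freshly added
    # bill:// entry there "just works" with no code change.
    return 0 unless defined getservbyname($scheme, 'tcp');

    # gethostbyname() hits the resolver -- this is where the
    # performance cost mentioned above comes in.
    return 0 unless defined gethostbyname($host);

    return 1;
}

my ($scheme, $host) = parse_url('http://www.perlmonks.org/index.pl');
print "scheme=$scheme host=$host\n";   # scheme=http host=www.perlmonks.org
```

Note that check_url() needs a working resolver and an /etc/services file, so it is much slower than a pure regex; parse_url() alone can serve as the cheap first pass that culls the egregious offenders.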