http://qs321.pair.com?node_id=138553

Gerard has asked for the wisdom of the Perl Monks concerning the following question:

There must be a better way, but I have always been a klutz when it comes to regular expressions and the like, so I decided to do this using split. I get the right results, but it just doesn't seem right...
I have a URL and I want to extract the filename off the end. Easy, right? I thought so, but I am having another one of those days when my caffeine levels have sunk dangerously low and my sleep deprivation is finally catching up with me and screaming at me to go to bed... Anyway:
$subject = "http://www.spied.co.nz/thinmailer.cgi";
@ar = split(/\//, $subject);
$filename = $ar[$#ar];    # last element; $ar[...] rather than the one-element slice @ar[...]
print $filename;
Any suggestions?
Much appreciated, Gerard (the caffeine addict).

Replies are listed 'Best First'.
Re: A better way? Extracting filename from url
by gav^ (Curate) on Jan 14, 2002 at 19:38 UTC
    What's wrong with using the nice URI module?
    use URI;

    while (<DATA>) {
        my $file = (URI->new($_)->path_segments)[-1];
        print $file, "\n";
    }
    __DATA__
    http://www.spied.co.nz/thinmailer.cgi
    http://www.site.com/dir/file.html?param1=val1&param2=val2

    gav^

      This sounds like the best way to me. Don't forget to uri_unescape() (or whatever it's called; I don't quite recall) the filename part to deal with stuff like "file%20name.htm".

      Thanks,
      James Mastros,
      Just Another Perl Scribe
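
The unescaping step mentioned above can be sketched with core Perl only. URI::Escape provides uri_unescape() for this; the substitution below is a stand-in for it, and the helper name unescape_segment is made up for illustration. As far as I recall, path_segments in list context already returns unescaped segments, so it is worth checking before decoding a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes in a single path segment.
# This is a core-only stand-in for URI::Escape's uri_unescape().
sub unescape_segment {
    my ($segment) = @_;
    $segment =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # e.g. %20 becomes a space
    return $segment;
}

print unescape_segment('file%20name.htm'), "\n";
```

This only handles well-formed two-digit hex escapes; anything else is left untouched.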

Re: A better way? Extracting filename from url
by Parham (Friar) on Jan 14, 2002 at 16:28 UTC
    I use these with local hard drive filenames, but they should work pretty much the same for URLs.

    method 1:
    $file = substr $url, rindex($url, '/') + 1; # $url holds the full URL

    method 2:
    $url =~ s/^.*\///; # strips everything up to the last slash, leaving the filename in $url

    method 3:
    ($file) = $url =~ m!([^/]+)$!; # another regex way
      It works, but I guess you've also got to consider the possibility that the URL will contain query parameters, like this: http://www.site.com/dir/file.html?param1=val1&param2=val2

      So a second substitution (I'm sure you can do it with a lookahead assertion or similar, but that seems a bit overkill) is probably in order:

      $file =~ s/\?.*//;

      Cheers,
      -- moodster
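
A minimal sketch of how the two steps might be combined, assuming the URL has no fragment part. The helper name file_from_url is hypothetical. It strips the query string before looking for the last slash, because a query value can itself contain a slash and would otherwise be mistaken for the end of the path.

```perl
use strict;
use warnings;

# Hypothetical helper: method 1 plus the query-string cleanup.
sub file_from_url {
    my ($url) = @_;
    $url =~ s/\?.*//s;                            # strip the query string first
    return substr $url, rindex($url, '/') + 1;    # then keep what follows the last '/'
}

for my $url (
    'http://www.spied.co.nz/thinmailer.cgi',
    'http://www.site.com/dir/file.html?param1=val1&param2=val2',
    'http://www.site.com/dir/file.html?next=/other/page',
) {
    print file_from_url($url), "\n";
}
```

With the third URL, taking the last segment first and stripping the query afterwards would give "page" instead of "file.html".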

        Moodster has a good point. Thanks all for your comments. To be honest, I had not even considered the possibility of query parameters, which is very silly of me. But hey, it was late. Anyway, I can now look at this again a bit later on.
        I am constantly amazed and impressed with the good nature and high value of comments that come out of this site.

        Regards,
        Gerard
        The caffeine addict (now sufficiently supplied).
Re: A better way? Extracting filename from url
by arhuman (Vicar) on Jan 14, 2002 at 16:25 UTC
    Did you try basename (from File::Basename)?

    It's not the usual way (regexes or URI::... modules) but it seems to work.
    (It needs some testing, though...)
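
A small sketch of the basename idea, assuming Unix-style path rules. basename() comes from the core File::Basename module and treats the URL as an ordinary path, so it knows nothing about query strings.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# basename() returns whatever follows the last path separator.
# Any query string stays attached to the result.
print basename('http://www.spied.co.nz/thinmailer.cgi'), "\n";
print basename('http://www.site.com/dir/file.html?param1=val1'), "\n";
```

The second call shows why a query string would still have to be removed separately.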


    "Only Bad Coders Code Badly In Perl" (OBC2BIP)
Re: A better way? Extracting filename from url
by flocto (Pilgrim) on Jan 14, 2002 at 18:28 UTC
    An easy way to do this is this simple regexp:
    my ($file) = $url =~ /\/([^\/]*?)(?:\?|$)/;
    This will work with host:port, directories and parameters given in the url. Hope this works for you..
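
A quick way to check that pattern against a few inputs; the wrapper name last_part and the test URLs with a port or without a path are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex from the post above.
sub last_part {
    my ($url) = @_;
    my ($file) = $url =~ /\/([^\/]*?)(?:\?|$)/;
    return $file;
}

for my $url (
    'http://www.spied.co.nz:8080/thinmailer.cgi',
    'http://www.site.com/dir/file.html?param1=val1&param2=val2',
    'http://www.spied.co.nz',
) {
    my $file = last_part($url);
    print "$url => ", (defined $file ? $file : '(no match)'), "\n";
}
```

For the last input, which has no path at all, the capture ends up being the host name rather than an empty filename, so that case may need separate handling.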

    -octo-
    --
    GED/CC d-- s:- a--- C++(+++) UL+++ P++++$ L++>++++ E--- W+++@ N o? K? w-- O- M-(+) V? !PS !PE !Y PGP+(++) t-- 5 X+ R+(+++) tv+(++) b++@ DI+() D+ G++ e->+++ h!++ r+(++) y+
Re: A better way? Extracting filename from url
by Caillte (Friar) on Jan 14, 2002 at 18:03 UTC

    One way would be :

    $subject = "http://www.spied.co.nz/thinmailer.cgi";
    $subject =~ s/.*\/([^\/]*)$/$1/;
    print $subject;

    What this does is keep the last run of characters that does not contain a forward slash, from the last slash to the end of the line.

    This page is intentionally left justified.

Re: A better way? Extracting filename from url
by ropey (Hermit) on Jan 14, 2002 at 19:10 UTC
    my $url = 'http://www.test.com/test/fragr/asas.htm';
    my ($file) = $url =~ m/.*\/(\D*?\.\D*?$)/;

    That should do it.
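
One thing to watch with that pattern: \D matches any non-digit, so a filename containing a digit will not match. The sketch below compares it with a variant that allows anything except a slash. The variant, the function names, and the page2.htm URL are assumptions for illustration, not the poster's code.

```perl
use strict;
use warnings;

# The pattern from the post: \D rejects digits anywhere in the capture.
sub match_nondigit {
    my ($url) = @_;
    my ($file) = $url =~ m/.*\/(\D*?\.\D*?$)/;
    return $file;
}

# Assumed variant: accept any character except '/' around the dot.
sub match_nonslash {
    my ($url) = @_;
    my ($file) = $url =~ m{.*/([^/]*\.[^/]*)$};
    return $file;
}

my $url = 'http://www.test.com/test/fragr/page2.htm';
for my $try (\&match_nondigit, \&match_nonslash) {
    my $file = $try->($url);
    print defined($file) ? $file : '(no match)', "\n";
}
```

Neither version deals with query strings, so those would still need stripping first.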
Re: A better way? Extracting filename from url
by archen (Pilgrim) on Jan 14, 2002 at 19:38 UTC
    I've made a few programs which deal with filenames of URLs and I take the same approach as you, although I use a slightly different method. To me it's more important that you use what you're comfortable with since you're the one that will probably be reading the code later.

    $subject = "http://www.spied.co.nz/thinmailer.cgi";
    $filename = (split(/\//, $subject))[-1];
    print $filename;