http://qs321.pair.com?node_id=729482


in reply to Regex To Remove File Extension

There are a lot of ways to skin this cat:

s/\..*+$//; s/\-[^\.]*$//;

Both of these use $ to anchor the regex to the end of the string. The first uses .*+, the non-greedy version of .*; the second uses a character class of all characters except '.' so that only the last suffix is matched.

Another possibility is to use the Perl module File::Basename, and this is probably the best way: you don't need to worry about getting it right, because someone else already did.

UPDATE: kennethk is right, the first version doesn't work. The regex engine always tries matches from left to right, even when the pattern is anchored at the end of the string.

UPDATE2: Seems not to be my day. Three errors in two lines is quite depressing.

Replies are listed 'Best First'.
Re^2: Regex To Remove File Extension
by kennethk (Abbot) on Dec 10, 2008 at 19:17 UTC
    Both of these fail. The non-greedy version is .*?, not .*+, so s/\..*+$// fails to compile (on Perl before 5.10; from 5.10 on, .*+ parses as a possessive quantifier). Even with .*? it still doesn't work, because the match starts at the first period rather than the last. Your second expression has a typo (- in place of .), so it should read s/\.[^\.]*$//, as per dreadpiratepeter's post.
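To make the difference concrete, here is a minimal sketch comparing the non-greedy variant with the character-class variant. The sub names and the sample file name are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Strip from the first period onward: this is what the .*? variant does,
# because the engine starts matching at the leftmost '.' it finds.
sub strip_from_first_dot {
    my ($name) = @_;
    $name =~ s/\..*?$//;
    return $name;
}

# Strip only the last suffix: [^.]* cannot cross a period, so the match
# can only succeed starting at the final '.'.
sub strip_last_suffix {
    my ($name) = @_;
    $name =~ s/\.[^.]*$//;
    return $name;
}

# Hypothetical file name with two periods, chosen to expose the difference.
my $name = 'archive.tar.gz';
print strip_from_first_dot($name), "\n";   # archive
print strip_last_suffix($name), "\n";      # archive.tar
```

For a name with a single period the two behave the same, which is probably why the problem is easy to miss.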
Re^2: Regex To Remove File Extension
by Narveson (Chaplain) on Dec 10, 2008 at 20:14 UTC

    File::Basename says

    $basename = basename($fullname,@suffixlist);

    If @suffixes are given each element is a pattern (either a string or a qr//) matched against the end of the $filename. The matching portion is removed and becomes the $suffix.

    So File::Basename doesn't solve the original problem; it requires the solution before it can be used.

      Of course, if you have a manageable list of extensions, you could populate the suffix list and use File::Basename very easily. The original post says "extension might not always be txt," but that doesn't indicate the scope of potential extensions. Maybe it's just txt, html, htm, pl and cgi (an arbitrary group, chosen only as an example), in which case I'd lean towards the File::Basename solution rather than creating a regex unique to this script.
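A rough sketch of the limited-list approach, assuming the five extensions mentioned above. The sub name and the sample file names are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# The extensions named above, turned into patterns. fileparse matches each
# one against the end of the name; the escaped leading dot keeps the '.'
# out of the returned basename.
my @suffixes = map { qr/\.\Q$_\E/ } qw(txt html htm pl cgi);

# Hypothetical helper: return the file name without a known extension.
sub strip_known_suffix {
    my ($file) = @_;
    my ($base, $dir, $suffix) = fileparse($file, @suffixes);
    return $base;
}

# Hypothetical sample names; data.csv is not in the list and is left alone.
for my $file (qw(notes.txt index.html script.pl data.csv)) {
    print "$file -> ", strip_known_suffix($file), "\n";
}
```

Names whose extension is not in the list come back unchanged, so the list has to be kept in step with whatever files the script actually sees.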

      Or maybe the AnonyMonk means to be able to remove any extension, in which case File::Basename isn't the best solution. Obviously the AnonyMonk will need to choose the best approach, but I wouldn't discount File::Basename for a limited number of extensions.
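For what it's worth, since each suffix is treated as a pattern, a single generic pattern appears to cover the any-extension case as well. A minimal sketch, with a hypothetical sub name and made-up file names:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: split a name into base and last extension, whatever
# that extension is. The pattern is the same character-class idea as
# s/\.[^.]*$//, handed to fileparse as its only suffix pattern.
sub split_extension {
    my ($file) = @_;
    my ($base, $dir, $suffix) = fileparse($file, qr/\.[^.]*/);
    return ($base, $suffix);
}

# Hypothetical sample names.
for my $file (qw(report.txt archive.tar.gz README)) {
    my ($base, $suffix) = split_extension($file);
    print "$file -> base '$base', suffix '$suffix'\n";
}
```

One caveat with this pattern: a name that starts with a period and has no other period, such as .bashrc, is treated as all suffix, leaving an empty base. Whether that matters depends on the files involved.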