regular expression on filenames with absolute path

by ramthen (Sexton)
on Nov 05, 2005 at 09:43 UTC ( [id://505958]=perlquestion )

ramthen has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I seek your advice on the following situation.

I need to extract the filename and the path of the directory in which that file is located. This has to be done from the root level of a particular directory.

In the script that I am writing, I used this regex to extract the dirname and filename.

$abspath =~ /\/\w+(.txt|.doc|.pdf|.vsd|.xls|.xlt|.dot|.pot|.ppt|.mpp)$/;

With this regex, the following are matched:

Framework/Templates_Doc.Conventions/CSSP_TPL_Process_Procedure_Description.xls

Framework/Templates_Doc.Conventions, , /CSSP_TPL_Training_Material.pot

But the following are not matched:

Framework/Templates_Doc.Conventions/CSSP_TPL_Excel_Meta-Template.xlt

Engineering/SBI/PD_PRC/Overall_SWReleaseFlow_v0.1.vsd

Could you please share your wisdom on how to fix this regex?

Thanks in advance,

Mars

2005-11-10 Retitled by planetscape, as per Monastery guidelines
Original title: 'regular expression on filenames with aboslute path'

Replies are listed 'Best First'.
Re: regular expression on filenames with absolute path
by Aristotle (Chancellor) on Nov 05, 2005 at 11:54 UTC

    Your current issue is that \w only matches letters, numbers and the underscore, but not dashes. You can say “match everything \w matches as well as these other characters” by putting it inside a character class: [\w-].
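    A minimal sketch of that difference, using one of the file names from the question (the variable names are illustrative, not from the original post):

```perl
use strict;
use warnings;

# File name from the question that the original pattern failed to match.
my $name = 'CSSP_TPL_Excel_Meta-Template.xlt';

# \w+ alone cannot cross the dash, so the whole stem is never matched.
my $plain = $name =~ /\A\w+\.xlt\z/    ? 'match' : 'no match';

# [\w-]+ accepts everything \w does, plus the dash.
my $class = $name =~ /\A[\w-]+\.xlt\z/ ? 'match' : 'no match';

print "\\w+    : $plain\n";
print "[\\w-]+ : $class\n";
```

    Placing the dash last (or first) inside the brackets keeps it from being read as a range operator.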

    Your next issue will be that you forgot to escape your periods, so your regex as it stands will match be_a_hippy/smoke_pot. You want to write (\.txt|\.doc|\.pdf|\.vsd|\.xls|\.xlt|\.dot|\.pot|\.ppt|\.mpp).

    Well, actually, you don’t want to write that, you want to write \.(txt|doc|pdf|vsd|xls|xlt|dot|pot|ppt|mpp).

    And (apart from the fact that this also constitutes a micro-optimisation) I’d want to group some of those using character classes, because they belong to the same application: \.(txt|do[ct]|pdf|vsd|xl[st]|p[op]t|mpp).
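    The effect of the unescaped periods can be checked with a short sketch. The two extra sample paths (docs/smoke.pot and docs/plan.dot) are made up for illustration, and both patterns are anchored with \z here for a like-for-like comparison:

```perl
use strict;
use warnings;

# Unescaped: each "." matches any character, so "_pot" passes for ".pot".
my $loose  = qr{/\w+(.txt|.doc|.pdf|.vsd|.xls|.xlt|.dot|.pot|.ppt|.mpp)\z};

# Escaped once, outside the group: only a literal dot is accepted.
my $strict = qr{/[\w-]+\.(txt|do[ct]|pdf|vsd|xl[st]|p[op]t|mpp)\z};

for my $path ('be_a_hippy/smoke_pot', 'docs/smoke.pot', 'docs/plan.dot') {
    printf "%-22s loose=%-3s strict=%s\n",
        $path,
        ($path =~ $loose  ? 'yes' : 'no'),
        ($path =~ $strict ? 'yes' : 'no');
}
```

    Only the loose pattern accepts be_a_hippy/smoke_pot; the grouped form still accepts both .pot and .dot.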

    Which brings us to

    $abspath =~ m{ / [\w-]+ \.(txt|do[ct]|pdf|vsd|xl[st]|p[op]t|mpp) \z }msx;

    I used \z, because that means end-of-string, whereas $ means end-of-string plus a variety of subtly different things. I also took the liberty of using the m// syntax instead of an unadorned // so that I could use delimiters other than the /; that way the / inside the regex needn’t be escaped. I took the additional liberty of adding /x so that whitespace inside the regex becomes insignificant; then you can use spacing, linebreaks and the like to structure the regex. Lastly, on TheDamian’s suggestion from Perl Best Practices, I always also add /m and /s to my patterns out of habit.
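    As a rough sketch of how this pattern behaves on the two paths the question reported as unmatched, and of the \z versus $ distinction (the report.txt string is a made-up example): the first path matches, while the second appears not to, because its stem contains a dot (v0.1) that [\w-] does not cover.

```perl
use strict;
use warnings;

# The pattern from the post; /x allows it to be spread out and commented.
my $pattern = qr{
    /                                          # last path separator
    [\w-]+                                     # stem: word characters or dashes
    \. (txt|do[ct]|pdf|vsd|xl[st]|p[op]t|mpp)  # literal dot, then the extension
    \z                                         # true end of string
}msx;

my %matched;
for my $path (
    'Framework/Templates_Doc.Conventions/CSSP_TPL_Excel_Meta-Template.xlt',
    'Engineering/SBI/PD_PRC/Overall_SWReleaseFlow_v0.1.vsd',
) {
    $matched{$path} = $path =~ $pattern ? 1 : 0;
    print "$matched{$path}  $path\n";
}

# Without /m, $ also matches just before a trailing newline; \z never does.
my $line       = "report.txt\n";
my $dollar_hit = $line =~ /\.txt$/  ? 1 : 0;
my $z_hit      = $line =~ /\.txt\z/ ? 1 : 0;
print "\$ matched: $dollar_hit, \\z matched: $z_hit\n";
```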

    Makeshifts last the longest.

Re: regular expression on filenames with absolute path
by Samy_rio (Vicar) on Nov 05, 2005 at 10:17 UTC

    Hi ramthen, try this:

    use strict;
    use File::Basename;

    my @f_path = qw(
        Framework/Templates_Doc.Conventions/CSSP_TPL_Excel_Meta-Template.xlt
        Engineering/SBI/PD_PRC/Overall_SWReleaseFlow_v0.1.vsd
    );

    for (@f_path) {
        my ($dir)   = dirname($_);    # everything before the last separator
        my ($fname) = basename($_);   # everything after it
        print "\n\nOriginal : $_";
        print "\nDirectory : $dir";
        print "\nFilename : $fname\n";
    }

    __END__
    Original : Framework/Templates_Doc.Conventions/CSSP_TPL_Excel_Meta-Template.xlt
    Directory : Framework/Templates_Doc.Conventions
    Filename : CSSP_TPL_Excel_Meta-Template.xlt

    Original : Engineering/SBI/PD_PRC/Overall_SWReleaseFlow_v0.1.vsd
    Directory : Engineering/SBI/PD_PRC
    Filename : Overall_SWReleaseFlow_v0.1.vsd

    Update: to use a regular expression instead, replace the for loop above with the following:

    for (@f_path) {
        # $1 is the directory, $2 is the part after the last / or \
        my ($dir, $fname) = $_ =~ /^(.*?)[\\\/]([^\\\/]+)$/;
        print "\n\nOriginal : $_";
        print "\nDirectory : $dir";
        print "\nFilename : $fname\n";
    }

    Regards,
    Velusamy R.

Re: regular expression on filenames with absolute path
by sh1tn (Priest) on Nov 05, 2005 at 11:38 UTC
    You should use the core module File::Basename and not regular expressions.
    Don't reinvent the wheel.
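    One way this could look, as a sketch rather than a definitive recipe: File::Basename's fileparse accepts suffix patterns, so the extension check and the directory/filename split can happen in one call. The $ext pattern reuses the extension list from this thread, and the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Accepted extensions, reusing the alternation discussed in this thread.
my $ext = qr/\.(?:txt|do[ct]|pdf|vsd|xl[st]|p[op]t|mpp)/;

my $path = 'Engineering/SBI/PD_PRC/Overall_SWReleaseFlow_v0.1.vsd';

# fileparse returns (name without suffix, directory part, matched suffix).
my ($stem, $dir, $suffix) = fileparse($path, $ext);

print "Directory : $dir\n";
print "Stem      : $stem\n";
print "Suffix    : $suffix\n";
```

    If none of the suffix patterns match, $suffix comes back as an empty string, which can serve as the "not an accepted file type" test.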


      Hello Perl-ers,

      I am back. I was looking at CPAN for modules to do recursive copying and hit upon one module.

      Is there a way to do this recursive copying using any of the standard modules?

      Happy X-mas wishes to all of you,

      Thanks in advance,
      Mars

Re: regular expression on filenames with absolute path
by ioannis (Abbot) on Nov 05, 2005 at 10:32 UTC
    /\/[-.\w]+(.txt|.doc|.pdf|.vsd|.xls|.xlt|.dot|.pot|.ppt|.mpp)$/
    Now they will match. Remember, the character class \w means a-zA-Z0-9_, which includes neither . nor -.

    This assumes that the general logic is correct. Here, we only got the strings to match, in the spirit of the original regex.
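    A quick check of the widened character class against all four sample paths from the question. Note one deliberate change in this sketch that is not in the post above: the periods in front of the extensions are escaped and hoisted out of the group.

```perl
use strict;
use warnings;

# [-.\w]+ allows dashes and dots in the file name stem.
my $re = qr{/[-.\w]+\.(?:txt|doc|pdf|vsd|xls|xlt|dot|pot|ppt|mpp)\z};

my @paths = (
    'Framework/Templates_Doc.Conventions/CSSP_TPL_Process_Procedure_Description.xls',
    'Framework/Templates_Doc.Conventions, , /CSSP_TPL_Training_Material.pot',
    'Framework/Templates_Doc.Conventions/CSSP_TPL_Excel_Meta-Template.xlt',
    'Engineering/SBI/PD_PRC/Overall_SWReleaseFlow_v0.1.vsd',
);

# Collect whatever still fails to match.
my @unmatched = grep { $_ !~ $re } @paths;

print @unmatched
    ? "Unmatched:\n" . join("\n", @unmatched) . "\n"
    : "All sample paths matched\n";
```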

Re: regular expression on filenames with absolute path
by TedPride (Priest) on Nov 05, 2005 at 10:49 UTC
    use strict;
    use warnings;

    my ($base, $file);
    while (<DATA>) {
        chomp;
        ($base, $file) = m/(.*\/)([^\/]*)/;
        print "Orig : $_\n" . "Base : $base\n" . "File : $file\n\n";
    }

    __DATA__
    Framework/Templates_Doc.Conventions/CSSP_TPL_Process_Procedure_Description.xls
    Framework/Templates_Doc.Conventions, , /CSSP_TPL_Training_Material.pot
    Framework/Templates_Doc.Conventions/CSSP_TPL_Excel_Meta-Template.xlt
    Engineering/SBI/PD_PRC/Overall_SWReleaseFlow_v0.1.vsd
