This regular expression has me stumped

tsk1979 has asked for the wisdom of the Perl Monks concerning the following question:

Re: This regular expression has me stumped by tachyon-II (Chaplain) on May 01, 2008 at 08:42 UTC
Not every problem is best solved with a big fat regex.

    while (my $line = <FILE>) {
        my @files = map  { m!/([\w\.\-]+)\W*$!; $1 }
                    grep { m!/! }
                    split ' ', $line;
        # blah
    }

The logic goes: split on whitespace, use grep to ignore all tokens that don't contain a file path separator /, then use map to get the last bit after the / up to the end of the token or an optional run of \W. The character class [\w\.\-] should match most filenames. Normally I would use [^/] but this is problematic in this case. Should work on your data as described.
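A minimal, self-contained sketch of the split/grep/map pipeline described above. The helper name `filenames_in_line` and the sample log line are assumptions made for illustration; the map block returns `$1` only when the match succeeds, so a token that fails to match cannot pick up a stale capture.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the last path component
# of every whitespace-separated token in $line that contains a slash.
sub filenames_in_line {
    my ($line) = @_;
    return map  { m!/([\w.\-]+)\W*$! ? $1 : () }   # keep the capture only on a match
           grep { m!/! }                           # tokens with a path separator
           split ' ', $line;                       # split on runs of whitespace
}

# Made-up sample line with trailing punctuation after each path.
my @names = filenames_in_line('copied /tmp/a/report.txt, to /var/log/app.log: done');
print "$_\n" for @names;   # prints report.txt, then app.log
```

The trailing `\W*` is what lets a path be followed by a comma or colon without that character ending up in the captured name.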
Re^2: This regular expression has me stumped by tsk1979 (Scribe) on May 01, 2008 at 09:08 UTC
I was hung up on regexp because I want this via a command line `perl -nei.bak.....` I checked my log files: /blah/blah/blah/filename can be followed by whitespace, a "@", a "," or a ":". I have searched for Perl non-greedy matching and I suspect `/.*?[@:,\s+]/` will actually match the whole `/blah/blah/blah/filename.ext`. The problem here is, how to retain the filename...?
Re^3: This regular expression has me stumped by tachyon-II (Chaplain) on May 01, 2008 at 09:52 UTC
You almost never want `.*`; a negated character class is generally better. For example `m!/([^/]+)$!` will grab the last bit of the filepath reliably, but the regex posted above in the map should DWIM. You could certainly code the example above as a one-liner but it seems a waste of time to me. You can make a reusable 4-line script in less time than it will take fiddling. You can put options like -p -F -n on the shebang. As a one-liner it would be like: `perl -F -ane 'print map{"$_\n"} map{ } grep { } @F' <file>` where the map and grep blocks are as above.
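As a rough illustration of the negated character class idea, the sketch below wraps it in a hypothetical helper, `last_component` (the name is an assumption). It also shows why `[^/]` alone is awkward for this log data: anything that is not a slash, including trailing punctuation, ends up in the capture.

```perl
use strict;
use warnings;

# Hypothetical helper: everything after the last slash, or undef if the
# string has no slash followed by at least one character.
sub last_component {
    my ($path) = @_;
    return $path =~ m!/([^/]+)$! ? $1 : undef;
}

print last_component('/blah/blah/blah/filename.ext'),  "\n";   # filename.ext
print last_component('/blah/blah/blah/filename.ext,'), "\n";   # filename.ext, (comma kept)
```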
Re^3: This regular expression has me stumped by goibhniu (Hermit) on May 01, 2008 at 15:24 UTC
Picking up with the theme you were following, I got this to work. I haven't thought a lot about corner cases, performance or reusability, so GrandFather's and tachyon-II's solutions are probably better.

*update:* apparently I'm just confused on this matter && added comment on second s/// with no effect: I didn't like doing the substitution twice just to get the end-of-line anchor to work. Perhaps some wiser monks can explain that to me.

*update:* That was before I added `chomp`, so never mind . . .

    #!/usr/bin/perl -W
    $\="\n";
    use strict;
    use warnings;
    while (<DATA>) {
        chomp;
        print $_;
        s/\/(?:[^\@:,\s+]\/)(.?)[\@:,\s+]/\/new\/path\/$1/g;
        #s/\/(?:[^\@:,\s+]\/)(.?)[\@:,\s+]$/\/new\/path\/$1/g;
        print $_;
        print '';
    }
    # produces:
    # C:\chas_sandbox>
    # 683879resp.pl
    # file /user/name/some/path/to/filename@@ dumped: replaced /user/name/blah/blah/filename
    # file /new/path/filename@@ dumped: replaced /new/path/filename
    #
    # @@@@user/some/file/filename.sdc: dumped
    # @@@@user/new/path/filename.sdc: dumped
    __DATA__
    file /user/name/some/path/to/filename@@ dumped: replaced /user/name/blah/blah/filename
    @@@@user/some/file/filename.sdc: dumped

#my sig used to say 'I humbly seek wisdom. '. Now it says: use strict; use warnings; I humbly seek wisdom.
A no-op in this map block: was Re^2: This regular expression has me stumped by Narveson (Chaplain) on May 01, 2008 at 22:59 UTC
No need for the `; $1` in `map { m!/([\w\.\-]+)\W*$!; $1 }`, since a match in list context returns the captured substrings and the block of a `map` is in list context.
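A small sketch of that point, using made-up tokens and a hypothetical helper name `captured_names`: in list context a successful match returns its captures and a failed match returns the empty list, so non-matching tokens simply drop out of the result.

```perl
use strict;
use warnings;

# Hypothetical helper: the match is the last expression in the map block,
# so its list-context return value (the captures) is what map collects.
sub captured_names {
    return map { m!/([\w.\-]+)\W*$! } @_;
}

my @names = captured_names('/a/b/one.txt,', 'plain', '/c/two.log');
print "@names\n";   # one.txt two.log
```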
Re: A no-op in this map block: was Re^2: This regular expression has me stumped by tachyon-II (Chaplain) on May 02, 2008 at 09:17 UTC
Good point. It was a rather off-the-cuff, untested solution....
Re: This regular expression has me stumped by GrandFather (Saint) on May 01, 2008 at 09:31 UTC
For this task a little looking around helps, as does knowing what not to find. Oh, and taking care of loose ends helps too. Consider:

    use strict;
    use warnings;

    my @tests = (
        "First: /home/user/blah/filename and /home/user/blah/filename2 end",
        "/home/user/blah/filename,/home/user/blah/filename2",
        "/home/user/blah/filename; /home/user/blah/filename2",
        "/home/user/blah/filename\@10:30 /home/user/blah/filename2",
    );

    for my $str (@tests) {
        $str =~ s!(?:^|/)[^\s,@;]*(?<=/)([^\s,@;]+?)(?=[\s,@;]|$)!$1!g;
        print "$str\n";
    }

Prints:

    First: filename and filename2 end
    filename,filename2
    filename; filename2
    filename@10:30 filename2

Perl is environmentally friendly - it saves trees
Re^2: This regular expression has me stumped by tsk1979 (Scribe) on May 01, 2008 at 10:18 UTC
Hmm, your solution looks like it's working! Great. Now the big problem: I cannot make head or tail of the regexp :( Could you explain a little bit what exactly happened up there? It made a whooshing sound and flew right by :)
Re^3: This regular expression has me stumped by GrandFather (Saint) on May 01, 2008 at 12:10 UTC
:-D Ok, let's take it a little at a time:

- `s!` you know, although it's possible you didn't know you can use pretty much any character for the expression delimiters.
- `(?:^|/)` matches (without capturing) either the start of the string or a /.
- `[^\s,@;]*` matches as many characters that aren't in the set of terminal characters as can be found.
- `(?<=/)` looks back and asserts the last character matched was /.
- `([^\s,@;]+?)` matches and captures as few non-terminal characters as it can and still find a match. That's the filename that you want.
- `(?=[\s,@;]|$)` looks ahead and asserts that the next character is a terminal character or the end of the string.
- `!$1!g` you are probably completely familiar with: replace all the matched stuff with the captured string and do it for every match that can be found.

So with a little head scratching the introductory line of my initial reply might make more sense, along with the regex. For further study consult perlretut, perlre and perlreref.

Perl is environmentally friendly - it saves trees
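The same walkthrough can be written into the pattern itself with the `/x` modifier, which allows whitespace and comments inside a regex. This is only a sketch: the wrapper name `strip_dirs` is an assumption, and the `@` is escaped inside the character classes purely as a precaution against interpolation.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution explained above.
sub strip_dirs {
    my ($str) = @_;
    $str =~ s{
        (?: ^ | / )          # start of string or a slash
        [^\s,\@;]*           # greedy run of non-terminator characters
        (?<= / )             # which must end on a slash
        ( [^\s,\@;]+? )      # the filename, captured
        (?= [\s,\@;] | $ )   # next comes a terminator or end of string
    }{$1}gx;
    return $str;
}

print strip_dirs('First: /home/user/blah/filename and /home/user/blah/filename2 end'), "\n";
# First: filename and filename2 end
```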
Re^3: This regular expression has me stumped by toolic (Bishop) on May 01, 2008 at 13:23 UTC
To supplement GrandFather's excellent explanation, here is the output generated by YAPE::Regex::Explain.

    use warnings;
    use strict;
    use YAPE::Regex::Explain;

    my $re = 's!(?:^|/)[^\s,@;]*(?<=/)([^\s,@;]+?)(?=[\s,@;]|$)!$1!g';
    my $parser = YAPE::Regex::Explain->new($re);
    print $parser->explain;
Re^2: This regular expression has me stumped by tsk1979 (Scribe) on May 02, 2008 at 05:57 UTC
I found a corner case.... :) How about ../filename or ../some/path/filename or ../../some/path/filename
Re^3: This regular expression has me stumped by tsk1979 (Scribe) on May 02, 2008 at 06:00 UTC
Another one: /some/silly/path/here/../../another/silly/path/filename
Okay, I know why is it failing by tsk1979 (Scribe) on May 02, 2008 at 06:16 UTC
    this can work right ?
    /fjsdklf/fjsldkfs/fsjdklf-fs-0-fsf/../fjskfjs/../../../fsfkslf/filename
    ../../../../filename
    ../hello
    dofghello/two/forut/../filename2
    Will this work ../../../../jfsdfjskdlfjs/../fjsklf/fjksfjskflsd/filename
    I will do replacement for ../filename
    this can work right /fjsdklf/fjsldkfs/fsjdklf-fs-0-fsf/../fjskfjs/../../../fsfkslf/filename
    I will think of even/more/silly/../../harder/cases/../analysis/filename and do it ../twice as well as put/some/path/and/make/it/thrice

We always assume that the whole path starts with /. But the path can be some/path/to/filename also! In that case this will definitely fail. I am scratching my head as to what kind of check to put in for that. Helllp!! :)
Re: Okay, I know why is it failing by tsk1979 (Scribe) on May 02, 2008 at 06:42 UTC
Solved!

    use strict;
    use warnings;

    my $file;
    foreach $file (@ARGV) {
        open (INFILE,"<$file") or die "Cannot open Input file\n";
        while (<INFILE>) {
            s!(?:^|\w/|\.\./)[^\s,@;:]*(?<=/)([^\s,@;:]+?)(?=[\s,@;:]|$)!$1!g;
            # s!\.\.!!g;
            print "$_";
        }
        close INFILE;
    }
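One thing to watch for with the `\w/` alternative: the word character is part of the match, so it is replaced along with the directories. As a hedged alternative, not taken from the thread, the sketch below treats any run of non-terminator characters that contains a slash as a path and keeps only what follows its last slash. The name `strip_any_path` and the sample line are assumptions, and the terminator set (whitespace, `,`, `@`, `;`, `:`) is copied from the substitution above.

```perl
use strict;
use warnings;

# Hypothetical alternative: absolute, relative and ../ paths are handled
# the same way because no particular leading character is required.
sub strip_any_path {
    my ($str) = @_;
    $str =~ s{
        [^\s,\@;:]* /        # everything up to and including the last slash
        ( [^\s,\@;:/]+ )     # the final component, captured
    }{$1}gx;
    return $str;
}

print strip_any_path('see ../../x/file.txt, and some/dir/other@12 /abs/p/q: ok'), "\n";
# see file.txt, and other@12 q: ok
```

The trade-off is that ordinary prose such as "and/or" is also treated as a path, so this only suits log lines where a slash reliably signals a file path.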
Re^2: Okay, I know why is it failing by tachyon-II (Chaplain) on May 02, 2008 at 09:34 UTC
Re^3: Okay, I know why is it failing by tsk1979 (Scribe) on May 03, 2008 at 09:17 UTC
Some notes below your chosen depth have not been shown here

