Re: This regular expression has me stumped
by tachyon-II (Chaplain) on May 01, 2008 at 08:42 UTC
|
while (my $line = <FILE>) {
    my @files = map  { m!/([\w\.\-]+)\W*$!; $1 }
                grep { m!/! } split ' ', $line;
    # blah
}
The logic goes: split on whitespace, use grep to ignore all tokens that don't contain a file path separator /, then use map to get the last bit after the final / up to the end of the token, allowing for optional trailing \W*. The character class [\w\.\-] should match most filenames. Normally I would use [^/] but this is problematic in this case. Should work on your data as described. | [reply] [d/l] |
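For anyone who wants to try the split/grep/map pipeline in isolation, here is a minimal sketch. The sample line is hypothetical (the real log format is only described in this thread), and the map block returns an empty list when the match fails, which is a small change from the code as posted: that version would hand back a stale $1.

```perl
use strict;
use warnings;

# Hypothetical sample line; not taken from the original poster's logs.
my $line = "copied /home/user/blah/file-1.txt, and /tmp/data.log: done";

# split on whitespace, keep only tokens containing a slash,
# then capture the last path component and drop trailing punctuation.
my @files = map  { m!/([\w.\-]+)\W*$! ? $1 : () }
            grep { m!/! } split ' ', $line;

print "$_\n" for @files;
```

With this input the two tokens that contain a / are reduced to file-1.txt and data.log; the trailing comma and colon are swallowed by \W*.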
|
I was hung up on regexp because I want this via a command line
perl -nei.bak.....
I checked my log files
/blah/blah/blah/filename can be followed by whitespace, a "@", a "," or a ":"
I have searched for perl non greedy and I suspect
/.*?[@:,\s+]/
will actually match the whole /blah/blah/blah/filename.ext
the problem here is, how to retain the filename...?
| [reply] [d/l] [select] |
|
You almost never want .* A negated character class is generally better. For example, m!/([^/]+)$! will grab the last bit of the file path reliably, but the regex posted above in the map should DWIM.
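A quick way to see the difference, as a sketch on a made-up path: both the greedy and the non-greedy dot versions start matching at the first /, so they capture everything after it, while the negated class can only match after the last one.

```perl
use strict;
use warnings;

my $path = "/blah/blah/blah/filename.ext";   # hypothetical path

my ($greedy) = $path =~ m!/(.*)$!;      # starts at the first slash
my ($lazy)   = $path =~ m!/(.*?)$!;     # non-greedy, but still anchored at the first slash
my ($last)   = $path =~ m!/([^/]+)$!;   # cannot cross a slash, so only the last part fits

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "last:   $last\n";
```

Non-greedy only changes how far a match extends, not where it starts, which is why .*? does not help here.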
You could certainly code the example above as a one liner but it seems a waste of time to me. You can make a reusable 4 line script in less time than it will take fiddling. You can put options like -p -F -n on the shebang. As a one liner it would be like:
perl -ane 'print map{"$_\n"} map{ } grep { } @F' <file>
where the map and grep blocks are as above. | [reply] [d/l] |
|
Picking up with the theme you were following, I got this to work. I haven't thought a lot about corner cases, performance or reusability, so GrandFather's and tachyon-II's solutions are probably better.
update: apparently I'm just confused on this matter && added comment on second s/// with no effect: I didn't like doing the substitution twice just to get the end-of-line anchor to work. Perhaps some wiser monks can explain that to me. update: That was before I added chomp, so never mind . . .
#!/usr/bin/perl -W
$\="\n";
use strict;
use warnings;
while (<DATA>) {
    chomp;
    print $_;
    s/\/(?:[^\@:,\s+]*\/)(.*?)[\@:,\s+]*/\/new\/path\/$1/g;
    #s/\/(?:[^\@:,\s+]*\/)(.*?)[\@:,\s+]*$/\/new\/path\/$1/g;
    print $_;
    print '';
}
# produces:
# C:\chas_sandbox>
# 683879resp.pl
# file /user/name/some/path/to/filename@@ dumped: replaced /user/name/blah/blah/filename
# file /new/path/filename@@ dumped: replaced /new/path/filename
#
# @@@@user/some/file/filename.sdc: dumped
# @@@@user/new/path/filename.sdc: dumped
__DATA__
file /user/name/some/path/to/filename@@ dumped: replaced /user/name/blah/blah/filename
@@@@user/some/file/filename.sdc: dumped
#my sig used to say 'I humbly seek wisdom. '. Now it says:
use strict;
use warnings;
I humbly seek wisdom.
| [reply] [d/l] [select] |
Re: This regular expression has me stumped
by GrandFather (Saint) on May 01, 2008 at 09:31 UTC
|
For this task a little looking around helps, as does knowing what not to find. Oh, and taking care of loose ends helps too. Consider:
use strict;
use warnings;
my @tests = (
    "First: /home/user/blah/filename and /home/user/blah/filename2 end",
    "/home/user/blah/filename,/home/user/blah/filename2",
    "/home/user/blah/filename; /home/user/blah/filename2",
    "/home/user/blah/filename\@10:30 /home/user/blah/filename2",
);
for my $str (@tests) {
    $str =~ s!(?:^|/)[^\s,@;]*(?<=/)([^\s,@;]+?)(?=[\s,@;]|$)!$1!g;
    print "$str\n";
}
Prints:
First: filename and filename2 end
filename,filename2
filename; filename2
filename@10:30 filename2
Perl is environmentally friendly - it saves trees
| [reply] [d/l] [select] |
|
Hmm, your solution looks like it's working!
Great.
Now the big problem.
I cannot make head or tail of the regexp :(
Could you explain a little bit about what exactly happened up there?
It made a whooshing sound and flew right by :)
| [reply] |
|
:-D
Ok, let's take it a little at a time:
s! you know, although it's possible you didn't know you can use pretty much any character for the expression delimiters.
(?:^|/) matches (without capturing) either the start of the string or a /.
[^\s,@;]* matches as many characters that aren't in the set of terminal characters as can be found.
(?<=/) looks back and asserts the last character matched was /.
([^\s,@;]+?) matches and captures as few non-terminal characters as it can and still find a match. That's the filename that you want.
(?=[\s,@;]|$) looks ahead and asserts that the next character is a terminal character or the end of the string.
!$1!g you are probably completely familiar with - replace all the matched stuff with the captured string and do it for every match that can be found.
So with a little head scratching, the introductory line of my initial reply might make a little more sense, along with the regex. For further study consult perlretut, perlre and perlreref.
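The same breakdown can be written into the regex itself with the /x modifier, which lets whitespace and comments sit between the pieces. This is only a sketch of the substitution above reformatted; the test string is the second one from the earlier list, and the brace delimiters are a choice made here so that / needs no escaping.

```perl
use strict;
use warnings;

my $str = "/home/user/blah/filename,/home/user/blah/filename2";

$str =~ s{
    (?:^|/)          # start of string or a slash (not captured)
    [^\s,@;]*        # greedy run of non-terminator characters
    (?<=/)           # the run must end just after a slash
    ([^\s,@;]+?)     # capture as few non-terminators as possible
    (?=[\s,@;]|$)    # next comes a terminator or the end of the string
}{$1}gx;

print "$str\n";   # filename,filename2
```

Keep dollar and at signs out of the comments: the pattern is interpolated before it is compiled, so a variable-looking comment would be expanded.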
Perl is environmentally friendly - it saves trees
| [reply] [d/l] [select] |
|
use warnings;
use strict;
use YAPE::Regex::Explain;
# YAPE::Regex::Explain expects a bare pattern, not a whole s!!! statement
my $re = '(?:^|/)[^\s,@;]*(?<=/)([^\s,@;]+?)(?=[\s,@;]|$)';
my $parser = YAPE::Regex::Explain->new($re);
print $parser->explain;
| [reply] [d/l] [select] |
|
I found a corner case.... :)
how about ../filename or ../some/path/filename or ../../some/path/filename
| [reply] |
|
Another one
/some/silly/path/here/../../another/silly/path/filename
| [reply] |
|
this can work right ? /fjsdklf/fjsldkfs/fsjdklf-fs-0-fsf/../fjskfjs/../../../fsfkslf/filename
../../../../filename ../hello dofghello/two/forut/../filename2
Will this work ../../../../jfsdfjskdlfjs/../fjsklf/fjksfjskflsd/filename
I will do replacement for ../filename
this can work right /fjsdklf/fjsldkfs/fsjdklf-fs-0-fsf/../fjskfjs/../../../fsfkslf/filename
I will think of even/more/silly/../../harder/cases/../analysis/filename and do it ../twice as well as put/some/path/and/make/it/thrice
We always assume that the whole path starts with /
But the path can be some/path/to/filename also!
In that case this will definitely fail.
I am scratching my head as to what kind of check to put in for that. Helllp!!
:) | [reply] [d/l] |
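One possible way around the leading-slash assumption, offered as a sketch rather than a tested answer for the real logs: stop caring how the path starts and treat any run of non-terminator characters that contains a / as a path, keeping only what follows its last slash. The terminator set (whitespace , @ ; :) is taken from this thread, and the sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical line mixing ../ paths, relative paths and absolute paths.
my $line = "see ../../some/path/filename and some/path/to/file2: ok /a/b/../c/name3";

# Greedy run of non-terminators up to the last slash of the token,
# then capture the final component (which may not contain a slash).
$line =~ s{[^\s,\@;:]*/([^\s,\@;:/]+)(?=[\s,\@;:]|$)}{$1}g;

print "$line\n";   # see filename and file2: ok name3
```

Two caveats: a path whose last component is .. would be reduced to .., and any token with a slash in it is treated as a path, whether or not it is one.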
|
use strict;
use warnings;
foreach my $file (@ARGV) {
    # Three-argument open with a lexical handle; report the file and the reason on failure.
    open my $in, '<', $file or die "Cannot open input file '$file': $!\n";
    while (<$in>) {
        s!(?:^|\w*/|\.\./)[^\s,@;:]*(?<=/)([^\s,@;:]+?)(?=[\s,@;:]|$)!$1!g;
        # s!\.\.!!g;
        print;
    }
    close $in;
}
| [reply] [d/l] |
|