pattern matching

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Peace be unto you, O enlightened ones

I have a novice question concerning the extraction of information from a large text file.

I want to look up a line that starts with '0010 0010', and extract the text on the same line that comes after the second double forward slash and before the end of the line, and assign this to a string.

For example, if a line contains:

0010 0010 (text) // (text) // JOHN DOE

I would like to end up with a string that has the value "JOHN DOE"

This command seems to perform the extraction correctly from the command line:

perl -lne 'print if s/0010 0010.*\/\/.*\/\///'

but I can't figure out a way of doing this from within a script, which is what I need.
I have tried various pattern matching things such as $name=~s/^(pattern)$//mgi, but nothing seems to work.
The solution is probably pretty obvious to anyone with a little experience, but I have none.

Thanks in advance

Re: pattern matching by Corion (Patriarch) on May 04, 2004 at 10:59 UTC
A nifty trick when your one-liners get out of hand is to run them through B::Deparse:

```
perl -MO=Deparse -lne 'print if s/0010 0010.*\/\/.*\/\///'
```

which gives you an approximation of what Perl runs:

```
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    print $_ if s[0010 0010.*//.*//][];
}
```

This is some ugly code, so I'd rewrite it to be more readable:

```perl
#!/usr/bin/perl -w
use strict;

while (defined(my $line = <ARGV>)) {
    chomp $line;
    # Bind the substitution to $line; a bare s[][] would act on $_.
    # -l also printed a newline after each line, so add it back here.
    print "$line\n" if $line =~ s[0010 0010.*//.*//][];
}
```

or, if you're interested in post-processing the lines in your script:

```perl
#!/usr/bin/perl -w
use strict;

# Keep only the lines where the substitution succeeded (each keeps its "\n").
my @interesting_lines = map { s[0010 0010.*//.*//][] ? $_ : () } (<ARGV>);
print "Name: $_" for @interesting_lines;
```

Update: Corrected last snippet to work like the previous snippets.
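The snippets above print the stripped lines, but the original question asks for the name to end up in a string. Here is a minimal sketch, assuming the wanted value is the text after the second `//` with surrounding blanks trimmed, that uses a capture group instead of a substitution. `extract_name` is a hypothetical helper name, and the in-memory sample stands in for the real data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the second '//' on a line
# that starts with '0010 0010', or undef if the line does not match.
sub extract_name {
    my ($line) = @_;
    # Non-greedy .*? stops at the first '//' and then at the second one;
    # \s* before and after the capture trims surrounding blanks.
    return $line =~ m{^0010 0010.*?//.*?//\s*(.*?)\s*$} ? $1 : undef;
}

# Made-up sample input; a real script would open the data file instead.
my $sample = "0008 0020 (text) // (text) // OTHER\n"
           . "0010 0010 (text) // (text) // JOHN DOE\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";

my $name;
while (my $line = <$fh>) {
    chomp $line;
    $name = extract_name($line);
    last if defined $name;    # keep only the first matching line
}
close $fh;

print "Name: $name\n" if defined $name;
```

Stopping at the first match with `last` is an assumption about the file (one `0010 0010` line of interest); drop it and push into an array if there can be several.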
Re: Re: pattern matching by perlcgi (Hermit) on May 04, 2004 at 13:59 UTC
Be careful using `.*`, as it is one greedy mother. The first `.*` will swallow everything up to the second-last // on the line, so if your source text ever has three or more sets of // on a line, your regex will strip everything through the last // rather than through the second one. Better to use non-greedy, minimal matching with `.*?`. This is probably so obvious to an expert like Corion that he didn't bother mentioning it. I mention it 'cos the OP states that the regex *seems* to work.	
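To make the greedy versus non-greedy difference concrete, here is a small sketch that applies the substitution from the original one-liner and a non-greedy variant to the same made-up line containing three `//` separators.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up line with three '//' separators.
my $line = "0010 0010 a // b // JOHN // DOE";

# Greedy: the first .* runs as far right as it can, so the pattern
# ends up consuming everything through the LAST '//'.
(my $greedy = $line) =~ s{0010 0010.*//.*//}{};

# Non-greedy: .*? stops as early as possible, so only the first two
# '//' separators are consumed.
(my $lazy = $line) =~ s{0010 0010.*?//.*?//}{};

print "greedy:     '$greedy'\n";    # ' DOE'
print "non-greedy: '$lazy'\n";      # ' JOHN // DOE'
```

With only two `//` on the line both versions give the same answer, which is why the greedy one-liner appeared to work on the sample data.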
Re: pattern matching by perlinux (Deacon) on May 04, 2004 at 11:36 UTC
Try this simple script for your line:

```perl
#!/usr/bin/perl -w
use strict;
my $line = "0010 0010 text // text // JOHN DOE";
chomp $line;
my $name;    # declared outside the if block so it is still in scope afterwards
if ($line =~ /^(0010 ){2}(.*\/\/ ){2}(.*)$/) {
    $name = $3;
}
```

UPDATE: Also:

```perl
my $name = "0010 0010 text // text // JOHN DOE";
chomp($name);
$name =~ s/^(0010 ){2}(.*\/\/ ){2}(.*)$/$3/i;
print $name;
```
Re: Re: pattern matching by Anomynous Monk (Scribe) on May 04, 2004 at 18:18 UTC
For the curious, perl reoptimizes (0010 ){2} back into "0010 0010 " when looking for possible places to match (even without the ^).	[reply]
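An alternative to counting the `//` pairs inside the regex is `split` with a limit. This approach is not from the thread; it is a sketch assuming the fields are always separated by literal `//`. Splitting on `//` with a limit of 3 returns at most three fields, and the third holds everything after the second `//`, including any later slashes. `name_from_line` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the line on '//' into at most 3 fields.
# The third field is everything after the second '//', so any later
# '//' stay inside it instead of shifting the result.
sub name_from_line {
    my ($line) = @_;
    return unless $line =~ /^0010 0010/;
    my (undef, undef, $rest) = split m{//}, $line, 3;
    return unless defined $rest;    # fewer than two '//' on the line
    $rest =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
    return $rest;
}

my $name = name_from_line("0010 0010 text // text // JOHN DOE");
print "$name\n";    # JOHN DOE
```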
Re: pattern matching by delirium (Chaplain) on May 04, 2004 at 12:33 UTC
I'm making a wild assumption about your data, namely that if a line starts with 0010 0010, it will always be in the format you are describing, and that it is safe not to search explicitly for both sets of //.

```perl
while (<>) {
    if (/^0010 0010/) {
        print $1 if m!([^/]+$)!;
    }
}
```
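Because this loop neither chomps nor trims, `$1` includes the blank after the last `/` and the line's trailing newline. Here is a small sketch of the same "last run of non-slash characters" idea, assuming the value should be kept in a variable and trimmed; `last_field` is a hypothetical name. The second, made-up input shows the cost of the wild assumption: a `/` inside the name itself cuts the result short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper following the approach above: on a '0010 0010'
# line, take the run of non-slash characters at the end of the line.
sub last_field {
    my ($line) = @_;
    return unless $line =~ /^0010 0010/;
    return unless $line =~ m{([^/]+)$};
    (my $field = $1) =~ s/^\s+|\s+$//g;    # drop the leading blank and newline
    return $field;
}

print last_field("0010 0010 text // text // JOHN DOE\n"), "\n";       # JOHN DOE
# Caveat: a '/' inside the name itself truncates the result.
print last_field("0010 0010 text // text // SMITH/JONES\n"), "\n";    # JONES
```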