PerlMonks  

pattern matching

by Anonymous Monk
on May 04, 2004 at 10:47 UTC ( [id://350284]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Peace be unto you, O enlightened ones

I have a novice question concerning the extraction of information from a large text file.

I want to look up a line that starts with '0010 0010', and extract the text on the same line that comes after the second double forward slash and before the end of the line, and assign this to a string.

For example, if a line contains:

0010 0010 (text) // (text) // JOHN DOE

I would like to end up with a string that has the value "JOHN DOE"

This command seems to perform the extraction correctly from the command line:

perl -lne 'print if s/0010 0010.*\/\/.*\/\///'

but I can't figure out a way of doing this from within a script, which is what I need.
I have tried various pattern matching things such as $name=~s/^(pattern)$//mgi, but nothing seems to work.
The solution is probably pretty obvious to anyone with a little experience, but I have none.

Thanks in advance

S

Replies are listed 'Best First'.
Re: pattern matching
by Corion (Patriarch) on May 04, 2004 at 10:59 UTC

    A nifty trick when your one-liners get out of hand is to use B::Deparse on it:

    perl -MO=Deparse -lne 'print if s/0010 0010.*\/\/.*\/\///'

    which gives you an approximation of what Perl runs:

    LINE: while (defined($_ = <ARGV>)) {
        chomp $_;
        print $_ if s[0010 0010.*//.*//][];
    }

    This is some ugly code, so I'd rewrite that to be more sightly:

    #!/usr/bin/perl -w
    use strict;

    while (defined(my $line = <ARGV>)) {
        chomp $line;
        # Bind the substitution to $line; a bare s[][] would act on $_
        print "$line\n" if $line =~ s[0010 0010.*//.*//][];
    }

    or, if you're interested in post-processing the lines in your script:

    #!/usr/bin/perl -w
    use strict;

    # Keep only the lines where the substitution succeeded
    my @interesting_lines = map { s[0010 0010.*//.*//][] ? $_ : () } (<ARGV>);
    print "Name: $_" for @interesting_lines;

    Update: Corrected last snippet to work like the previous snippets

      Be careful using .* as it is one greedy mother. The first .* will swallow everything up to the second-to-last // on the line.
      So if your source text ever happens to have three or more sets of // on a line, your regex will strip too much. Better to use non-greedy, minimal matching with .*?
      This is probably so obvious to an expert like Corion, he didn't bother mentioning it. I mention it 'cos the OP states that the regex seems to work.
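      To make the difference concrete, here is a minimal sketch comparing the two forms. The sample line with three sets of // is made up for illustration, and the trailing \s* is an addition to drop the space before the name.

```perl
use strict;
use warnings;

# Hypothetical sample line with three sets of //, not from the OP's data.
my $line = '0010 0010 a // b // c // JOHN DOE';

# Greedy: the first .* runs as far as it can, so the match ends at the LAST //.
(my $greedy = $line) =~ s{^0010 0010.*//.*//\s*}{};

# Non-greedy: each .*? stops at the first // it can, so the match ends
# at the SECOND //.
(my $minimal = $line) =~ s{^0010 0010.*?//.*?//\s*}{};

print "greedy:  '$greedy'\n";
print "minimal: '$minimal'\n";
```

      Which result is wanted depends on the data: the greedy form keeps only what follows the last //, while the non-greedy form keeps everything after the second one.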
Re: pattern matching
by perlinux (Deacon) on May 04, 2004 at 11:36 UTC
    Try this simple script for your line:
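    #!/usr/bin/perl -w
    use strict;

    my $line = "0010 0010 text // text // JOHN DOE";
    chomp $line;
    my $name;   # declared outside the if so it is still visible afterwards
    if ($line =~ /^(0010 ){2}(.*\/\/ ){2}(.*)$/) {
        $name = $3;   # third capture group: everything after the second //
    }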
    UPDATE:
    Also:
    my $name = "0010 0010 text // text // JOHN DOE";
    chomp($name);
    $name =~ s/^(0010 ){2}(.*\/\/ ){2}(.*)$/$3/i;
    print $name;
      For the curious, perl reoptimizes (0010 ){2} back into "0010 0010 " when looking for possible places to match (even without the ^).
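    A possible variation on the same pattern, offered as a sketch rather than a correction: with non-capturing groups (?:...) the name becomes the only capture, and a match in list context hands it straight to a variable. The m{} delimiters and the non-greedy .*? are choices made here, not part of the reply above.

```perl
use strict;
use warnings;

my $line = "0010 0010 text // text // JOHN DOE";

# (?:...) groups without capturing, so the name is the only capture.
# A match in list context returns the captures; $name stays undef if
# the line does not match.
my ($name) = $line =~ m{^(?:0010 ){2}(?:.*?// ){2}(.*)$};

print "Name: $name\n" if defined $name;
```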
Re: pattern matching
by delirium (Chaplain) on May 04, 2004 at 12:33 UTC
    I'm making a wild assumption about your data, namely that if a line starts with 0010 0010, it will always be in the format you are describing, and that it is safe to not explicitly search for both // sets.

    while (<>) {
        if (/^0010 0010/) {
            print $1 if m!([^/]+$)!;
        }
    }
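    Since the original question asks for the value in a string inside a script, the same idea might be sketched as follows. The in-memory sample data, the $fh and $name variable names, and the whitespace trimming are all assumptions added for illustration; a real script would open the actual file instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the OP's large text file.
my $data = <<'END';
0008 0020 (text) // (text) // 20040504
0010 0010 (text) // (text) // JOHN DOE
END

# Open the string as an in-memory file; replace \$data with a file name
# to read from disk.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $name;
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /^0010 0010/;
    # Same assumption as above: the name is whatever follows the last slash.
    if ($line =~ m{([^/]+)$}) {
        ($name = $1) =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        last;                             # stop at the first matching line
    }
}
close $fh;

print defined $name ? "Name: $name\n" : "No matching line found\n";
```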
