PerlMonks  

perl regex for repeating words

by analys (Initiate)
on Mar 11, 2018 at 16:03 UTC [id://1210664]

analys has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to extract certain records from an input file: the b and f lines, where a b line should only be taken if it belongs to an f line. But I'm not sure what the correct regex is for the code below. Sorry about that; I have edited the output description.

My code:

open (INPUT, "<", $ARGV[0]) || die "\nError: File Read \"$ARGV[0]\"\nError: $!\n\n";
while ( $line = <INPUT> ) {
    if ( $line =~ /^(b|f)/) {
        print $line;
    }
}
close INPUT;

Input file (extract the b and f records):

b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system

b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021

Expected output:

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021

The b line must be the one that belongs to an f line. From each b/f pair I want to get y, arg and def from the b line, and the name from the f line (chk.fin.reg.m_cwr / ref.grf.pin.clk_trg).

2018-03-12 Athanasius added code and paragraph tags
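One way to get exactly the fields described above, sketched here for illustration only (it is not code from the thread): Perl's paragraph mode, enabled by setting $/ to the empty string, makes each read return one blank-line-separated record. The sketch assumes the records are separated by blank lines, as Marshall's replies below also assume. extract_pairs is a hypothetical name, and the sample is read from an in-memory string so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads blank-line-separated records from $fh and
# returns one hash ref per record that contains an "f <name> ..." line.
sub extract_pairs {
    my ($fh) = @_;
    my @pairs;
    local $/ = '';    # paragraph mode: each read returns one whole record
    while ( my $record = <$fh> ) {
        # Skip records without an f line (the "unrelated" b lines).
        my ($name) = $record =~ /^f\s+(\S+)/m or next;
        # Assumes every kept record also has a line starting with "b=".
        my ($b_line) = $record =~ /^(b=.*)$/m or next;
        my %kv = $b_line =~ /(\w+)=(\S+)/g;
        push @pairs, { name => $name, y => $kv{y}, arg => $kv{arg}, def => $kv{def} };
    }
    return @pairs;
}

my $sample = <<'END';
b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system

b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
for my $p ( extract_pairs($fh) ) {
    print "From b: y = $p->{y} arg = $p->{arg} def = $p->{def}\n";
    print "From f: $p->{name}\n";
}
close $fh;
```

For the sample this prints "From b: y = 0x23 arg = 0x78 def = 0x2" / "From f: chk.fin.reg.m_cwr", then the same two lines for the b=40 record. To read a real file, open it with open my $fh, '<', $ARGV[0] or die ... and pass that handle instead. A side benefit of paragraph mode is that two or more consecutive blank lines are treated as a single separator.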

Re: perl regex for repeating words
by Perlbotics (Archbishop) on Mar 11, 2018 at 18:36 UTC

    I was trying to extract the following words from input file.
    But what words? Just guessing:

use strict;
use warnings;

while ( my $line = <DATA> ) {
    #-- would print lines starting with either 'f' or 'b':
    # if ( $line =~ /^(b|f)/) {
    #     print $line;
    # }

    #-- not elegant, like the spec...
    if ( $line =~ /^(b)=(\S+)/ or $line =~ /^(f)\s+(\S+)\s+(\S+)$/ ) {
        print "$1 = '$2' (", $3 // '-' , ")\n";
    }
}

__DATA__
Input file : (to extract for b and f)
b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system

b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021

    Result:

    b = '23' (-)
    b = '71' (-)
    b = '54' (-)
    f = 'chk.fin.reg.m_cwr' (0x213)
    b = '54' (-)
    b = '40' (-)
    f = 'ref.grf.pin.clk_trg' (0x0021)

      I need to get the whole b and f lines, and ignore the b lines that are not related to an f line.

      Expected output:

      b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
      f chk.fin.reg.m_cwr 0x213

      b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
      f ref.grf.pin.clk_trg 0x0021

      From the b line I need y, arg and def; from the f line I need the name (chk.fin.reg.m_cwr, ref.grf.pin.clk_trg).

      Ex:
      From b: y = 0x23 arg = 0x78 def = 0x2
      From f: chk.fin.reg.m_cwr
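If the records are not reliably separated by blank lines, one minimal sketch (not from the thread) is to remember the most recent b line and emit it only when an f line arrives. It assumes an f line always comes after the b line it belongs to, with no other b line in between, which holds for the sample data; pair_b_with_f is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: remembers the most recent "b=" line and returns
# [b_line, f_line] pairs when an "f" line shows up. A b line that is
# replaced by a newer b line before any f line appears is dropped.
sub pair_b_with_f {
    my ($fh) = @_;
    my ( $last_b, @pairs );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^b=/ ) {
            $last_b = $line;
        }
        elsif ( $line =~ /^f\s/ and defined $last_b ) {
            push @pairs, [ $last_b, $line ];
            undef $last_b;    # each b line is paired with at most one f line
        }
    }
    return @pairs;
}

# Sample data without blank lines between records.
my $sample = <<'END';
b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system
b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001
b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213
b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system
b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
for my $pair ( pair_b_with_f($fh) ) {
    print "$pair->[0]\n$pair->[1]\n\n";
}
close $fh;
```

Blank lines are simply ignored by both tests in the loop, so the same helper also works on the blank-line-separated layout.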
        You could simply modify my previous code to also return $record from get_record() and print $record if $f ne ''.
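A sketch of that modification, adapted from Marshall's get_record() code (his first reply, further down the page). Three changes are assumptions made for this sketch: the record is read from a filehandle argument instead of DATA, $b is renamed $b_val because $b is the package variable that sort uses, and records_with_f is a hypothetical helper that collects the records to print.

```perl
use strict;
use warnings;

# Based on Marshall's get_record(): reads one blank-line-terminated record
# from $fh and now also returns the raw $record. Returns ('', undef, '')
# at end of input.
sub get_record {
    my ($fh) = @_;
    my $line;
    my $record = '';
    while ( defined( $line = <$fh> ) and $line !~ /^\s*$/ ) {
        $record .= $line;
    }
    my ($b_val) = $record =~ /b=(\w+)/;
    my ($f)     = $record =~ /f (.*)/;
    $f //= '';
    return ( $record, $b_val, $f );
}

# Hypothetical helper: collects only the records that have an f line.
sub records_with_f {
    my ($fh) = @_;
    my @keep;
    while (1) {
        my ( $record, $b_val, $f ) = get_record($fh);
        last if $record eq '';    # an empty record ends the loop
        push @keep, $record if $f ne '';
    }
    return @keep;
}

my $sample = <<'END';
b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system

b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for records_with_f($in);
close $in;
```

Like the original, this stops at the first empty record, so two blank lines in a row would end the loop early.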

        Another common technique for parsing a record like this is to return a hash (or hash ref) in case you want to access the individual parameters. The format X=3 is easy to parse; below I show how, with a more extensive code modification. There are of course other ways to do this; I just followed the lines of my original post.
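As a small, isolated illustration of the X=3 parsing (a sketch, separate from Marshall's full program): a global match in list context returns alternating keys and values, which can be assigned straight into a hash, and a hash slice then picks out y, arg and def in a fixed order, since Perl's hash iteration order is not predictable. Using (\S+) rather than (\w+) for the value is an assumption made here so that dotted values stay whole; parse_b_line is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: a global match in list context returns
# (key1, value1, key2, value2, ...), which assigns straight into a hash.
# (\S+) keeps dotted values like reg.k2.io.chk whole; (\w+) would stop at
# the first dot.
sub parse_b_line {
    my ($line) = @_;
    my %kv = $line =~ /(\w+)=(\S+)/g;
    return \%kv;
}

my $kv = parse_b_line('b=54 y=0x23 arg=0x78 def=0x2 val=0x65b');

# Hash order is not predictable, so name the wanted keys explicitly.
my ( $y, $arg, $def ) = @{$kv}{qw(y arg def)};
print "y = $y arg = $arg def = $def\n";    # y = 0x23 arg = 0x78 def = 0x2
```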

        The idea of these Monk posts is to give you some ideas and get you "unstuck". Some effort on the part of the OP is required to understand the technique(s) and adapt them to your specific application. Hope this helps.

        Update:
        I thought better of the ending condition for the main loop and changed it to end on a blank record. Also, as a general point, I typically try to isolate the record parsing in a subroutine. I just made a new code release of a project I've been working on for the past year. During that time, both the input format of one key file and the way its data is presented have changed 4 times! I yell at these guys when they do that, but alas, I must adapt. I derive the same info from all 4 formats, but these are not just different formats; they also need different algorithms. "Data" is not "information". A mess.

#!/usr/bin/perl
use warnings;
use strict;

my ($record,%hash);
while ( ($record, %hash)=get_record() and $record ne '' )
{
    next unless $hash{f} ne '';
    print "$record\n";
    print "Extra parsed info:\n";
    foreach my $key (keys %hash)
    {
        print "  $key \tvalue=$hash{$key}\n";
    }
    print "\n";
}

sub get_record
{
    my $line;
    my $record='';
    while (defined($line=<DATA>) and $line !~ /^\s*$/)
    {
        $record .= $line
    }
    my %hash = $record =~ /(\w+)=(\w+)/g;
    ($hash{f}) = $record =~ /f (.*)/;
    $hash{f} //="";
    return ($record, %hash);
}

=Prints:
b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

Extra parsed info:
  def   value=0x2
  val   value=0x65b
  f     value=chk.fin.reg.m_cwr 0x213
  b     value=54
  y     value=0x23
  arg   value=0x78

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021

Extra parsed info:
  arg   value=0x34
  y     value=0x90
  f     value=ref.grf.pin.clk_trg 0x0021
  b     value=40
  def   value=0x5
  val   value=0x2197

=cut

__DATA__
b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system

b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021
Re: perl regex for repeating words
by Marshall (Canon) on Mar 11, 2018 at 17:50 UTC
    Please use <code></code> tags around your code. I am not sure what goes on each line.
    There are a number of ways to approach a situation like this. Before suggesting one, I'd like a better idea of exactly what you have.
    Basically the easier you make it for the Monks to help you, the more likely you are to get a relevant answer.

    edit: Maybe the data is like this?

    b=23 y=0x11 arg=0x70 def=0x1 val=0x234 checking system b=71 y=0x35 arg=0x87 def=0x3 val=0x76d h=reg.k2.io.chk 0x2001 b=54 y=0x23 arg=0x78 def=0x2 val=0x65b f chk.fin.reg.m_cwr 0x213 b=54 y=0x23 arg=0x78 def=0x2 val=0x65b checking system b=40 y=0x90 arg=0x34 def=0x5 val=0x2197 f ref.grf.pin.clk_trg 0x0021
    I have no idea. Please show what goes on what line. This will make a difference.
    Without that information, we are just guessing.

    ###### update
    Ok, assuming the formatting by Athanasius++ is correct.
    This is a simple, albeit very wordy, algorithm: concatenate the lines of each multi-line record into a single string and run a regex on that.

#!/usr/bin/perl
use warnings;
use strict;

my ($b,$f);
while ( ($b, $f)=get_record(), defined ($b) )
{
    print "b=\'$b\' f=\'$f\'\n";
}

sub get_record
{
    my $line;
    my $record='';
    while (defined($line=<DATA>) and $line !~ /^\s*$/)
    {
        $record .= $line
    }
    my ($b)= $record =~ /b\=(\w+)/;
    my ($f)= $record =~ /f (.*)/;
    $f//="";
    return ($b,$f);
}

=Prints:
b='23' f=''
b='71' f=''
b='54' f='chk.fin.reg.m_cwr 0x213'
b='54' f=''
b='40' f='ref.grf.pin.clk_trg 0x0021'
=cut

__DATA__
b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system

b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021
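Since the thread title asks for a regex, here is one more sketch (not from the thread): if the whole file fits comfortably in memory, a single multi-line match over the entire text can pair each b line with an f line that immediately follows it. It assumes the f line comes directly after its b line, which is true for the sample; b_f_pairs is a hypothetical name. The /m modifier lets ^ and $ match at every line boundary, and /g in scalar context walks through successive matches.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [b_line, f_line] pairs where an f line
# immediately follows a b line. Without /s, .* never crosses a newline,
# so each capture stays within one line.
sub b_f_pairs {
    my ($text) = @_;
    my @pairs;
    while ( $text =~ /^(b=.*)\n(f .*)$/mg ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

my $sample = <<'END';
b=23 y=0x11 arg=0x70 def=0x1 val=0x234
checking system

b=71 y=0x35 arg=0x87 def=0x3 val=0x76d
h=reg.k2.io.chk 0x2001

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
f chk.fin.reg.m_cwr 0x213

b=54 y=0x23 arg=0x78 def=0x2 val=0x65b
checking system

b=40 y=0x90 arg=0x34 def=0x5 val=0x2197
f ref.grf.pin.clk_trg 0x0021
END

for my $pair ( b_f_pairs($sample) ) {
    print "$pair->[0]\n$pair->[1]\n\n";
}
```

To apply it to a file, slurp the file into one string first, e.g. my $text = do { local $/; <$fh> };, and pass $text.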
Re: perl regex for repeating words
by AnomalousMonk (Archbishop) on Mar 11, 2018 at 20:15 UTC

    In addition to not clearly showing your code and data, you do not show the output you want to get from the given data, you do not show the output you actually get, and you don't say how what you get differs from what you want. Please see Short, Self-Contained, Correct Example. Please also see How do I change/delete my post?


    Give a man a fish:  <%-{-{-{-<

Node Type: perlquestion [id://1210664]
Approved by marto