PerlMonks  

Print multiple lines based on condition

by syedasadali95 (Acolyte)
on Mar 12, 2020 at 07:06 UTC

syedasadali95 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to split one huge file into two files using Perl. My source file looks something like this:

######## file ########

1. command:read address:0xA

2. command:write address:0xB

3. writedata:0x12

4. writedata:0x34

5. writedata:0x56

6. writedata:0x78

7. command:read address:0xC

8. command:write address:0xD

9. writedata:0x9A

10. writedata:0xBC

11. writedata:0xDE

12. writedata:0xF0

######## file ########

Taking the above file as input, I want to split the content into two files based on bit 0 of the address (address[0] = 0 or 1). If address[0] = 1, I want to print the line to one.txt; otherwise I want to print it to zero.txt, regardless of whether the command is a read or a write. Read commands are no problem because each is a single line, but every write command is followed by 4 lines of write data. How can I print those 4 write data lines, together with the write command line they follow, to the matching output file?

######## code starts ########

while (my $line1 = <IN_FILE>) {
    if( ($line1 =~ /address/) ) {
        $line1 =~ /address:([a-z0-9-]+)\s+/;
        my $address = hex $1;
        $grep = substr( (sprintf "%b", $address), -1, 1);
        if($grep) {
            printf OUT_FILE1 ("$line1");
        }
        else {
            printf OUT_FILE2 ("$line1");
        }
    }
}

######## code ends ########

I am struggling to print the 4 write data lines because my while loop iterates once per line. If I add an inner loop that prints the $line1 variable 4 times, it simply prints the same line 4 times.

Replies are listed 'Best First'.
Re: Print multiple lines based on condition (updated)
by haukex (Archbishop) on Mar 12, 2020 at 09:13 UTC
    every write command is followed by next 4 lines of write data

    If you're certain that it's always exactly four lines, and if you are certain the code in your while loop won't grow any larger than this, then I think reading from the file in two places inside the same piece of code may be an acceptable solution, as in my @lines; push @lines, scalar <IN_FILE> for 1..4; (BTW, note "open" Best Practices). However, if any of the aforementioned conditions aren't true, then I would very strongly recommend a state machine instead.
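A minimal sketch of the read-ahead idea described above, written against the sample format in the original question. It assumes that exactly four data lines follow every write command and that the input has no blank lines. The sub name `split_by_addr_bit` and the lexical filehandle arguments are illustrative choices, not something taken from the reply.

```perl
use strict;
use warnings;

# Hypothetical helper: route each command line (and, for writes, the four
# lines that follow it) to $fh_one or $fh_zero by the low bit of the address.
sub split_by_addr_bit {
    my ($in, $fh_zero, $fh_one) = @_;

    while (my $line = <$in>) {
        # Lines without a command/address pair are skipped in this sketch.
        next unless $line =~ /command:(read|write)\s+address:(0x[0-9A-Fa-f]+)/;
        my ($cmd, $addr) = ($1, hex $2);

        my @lines = ($line);
        if ($cmd eq 'write') {
            # Second read from the same handle: pull the data lines now.
            for (1 .. 4) {
                my $data = <$in>;
                last unless defined $data;    # input ended early
                push @lines, $data;
            }
        }

        my $out = ($addr & 1) ? $fh_one : $fh_zero;
        print {$out} @lines;
    }
    return;
}
```

The caller opens the three handles (for example with three-argument `open` and `or die`) and passes them in. If the number of data lines can vary, this approach breaks down, which is the case for the state machine mentioned above.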

    Update: This also means that the input data you showed isn't actually representative of your actual input data. Please provide some representative sample input, in <code> tags. See also How do I post a question effectively?, I know what I mean. Why don't you?, and Short, Self-Contained, Correct Example.

Re: Print multiple lines based on condition
by AnomalousMonk (Archbishop) on Mar 12, 2020 at 09:30 UTC

    I don't really understand your requirements, but here's something that might serve to begin to narrow the range of possibilities. Note that I have used your example input file as is, including blank lines.

    Source write_to_2_files_1.pl:

    use strict;
    use warnings;

    use constant USAGE => <<"EOT";
    usage:  perl  $0  file_in  file_zero  file_one
    where:
        file_in    input file name
        file_zero  output file name - address[0] bit == 0
        file_one   output file name - address[0] bit == 1
    EOT

    die USAGE if @ARGV != 3;

    my ($file_in, $file_zero, $file_one) = @ARGV;

    open my $fh_in, '<', $file_in   or die "opening '$file_in': $!";
    open my $fh_0,  '>', $file_zero or die "opening '$file_zero': $!";
    open my $fh_1,  '>', $file_one  or die "opening '$file_one': $!";

    my $fh_current;

    LINE:
    while (my $line = <$fh_in>) {
        next LINE unless $line =~ m{ \S }xms;  # ignore blank lines

        my $got_command_addr = my ($rw_hex_addr) = $line =~ m{
            \A \d+ [.] \s+ command: (?: read | write) \s+
            address:0x ([[:xdigit:]]+) \s* \Z
        }xms;

        if ($got_command_addr) {
            $fh_current = 0x1 & hex $rw_hex_addr ? $fh_1 : $fh_0;
        }

        die "no command read/write address seen" unless $fh_current;
        print $fh_current $line;
    }

    close $fh_in or die "closing '$file_in': $!";
    close $fh_0  or die "closing '$file_zero': $!";
    close $fh_1  or die "closing '$file_one': $!";

    exit;

    # subroutines ######################################################

    # none for now

    Invocation:
    c:\@Work\Perl\monks\syedasadali95>perl write_to_2_files_1.pl file_in.txt zero.txt one.txt
    Output zero.txt:
    1. command:read address:0xA
    7. command:read address:0xC
    Output one.txt:
    2. command:write address:0xB
    3. writedata:0x12
    4. writedata:0x34
    5. writedata:0x56
    6. writedata:0x78
    8. command:write address:0xD
    9. writedata:0x9A
    10. writedata:0xBC
    11. writedata:0xDE
    12. writedata:0xF0

    Update: Please note that it might have been helpful if you had provided expected output files for the OPed example input file. Also, please post input/output files and data, command lines and error messages as well as code within  <code> ... </code> tags. (Update: Please also see the update of this post by haukex which already touched on these points and included several very informative links.)


    Give a man a fish:  <%-{-{-{-<

      Sorry for not being clear with my question. My mistake. Here is the sample of input data I am operating on:

      ######################### Input file #############################

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca01 addr:0x10000a qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca0d addr:0x20000b qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca19 addr:0x30000c qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acac1 addr:0xc10000a qospri:0 len:0xf noalloc:0

      chn:odat tag:0x3b9acac1 dat:0x3f80 be:0xffffffff

      chn:odat tag:0x3b9acac1 dat:0x3f81 be:0xffffffff

      chn:odat tag:0x3b9acac1 dat:0x3f82 be:0xffffffff

      chn:odat tag:0x3b9acac1 dat:0x3f83 be:0xffffffff

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca25 addr:0x40000d qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca31 addr:0x50000e qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acacd addr:0xc20000b qospri:0 len:0xf noalloc:0

      chn:odat tag:0x3b9acacd dat:0x4f83 be:0xffffffff

      chn:odat tag:0x3b9acacd dat:0x9f85 be:0xffffffff

      chn:odat tag:0x3b9acacd dat:0x7f88 be:0xffffffff

      chn:odat tag:0x3b9acacd dat:0x5f87 be:0xffffffff

      Note: The actual input file doesn't contain any blank lines, and none are required in the output files.

      ########################### Output file1 ##########################

      Required output one.txt

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca0d addr:0x20000b qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca25 addr:0x40000d qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acacd addr:0xc20000b qospri:0 len:0xf noalloc:0

      chn:odat tag:0x3b9acacd dat:0x4f83 be:0xffffffff

      chn:odat tag:0x3b9acacd dat:0x9f85 be:0xffffffff

      chn:odat tag:0x3b9acacd dat:0x7f88 be:0xffffffff

      chn:odat tag:0x3b9acacd dat:0x5f87 be:0xffffffff

      ############################ Output file2 ##########################

      Required output zero.txt

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca01 addr:0x10000a qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca19 addr:0x30000c qospri:0 len:0xf noalloc:0

      chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acac1 addr:0xc10000a qospri:0 len:0xf noalloc:0

      chn:odat tag:0x3b9acac1 dat:0x3f80 be:0xffffffff

      chn:odat tag:0x3b9acac1 dat:0x3f81 be:0xffffffff

      chn:odat tag:0x3b9acac1 dat:0x3f82 be:0xffffffff

      chn:odat tag:0x3b9acac1 dat:0x3f83 be:0xffffffff

      chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca31 addr:0x50000e qospri:0 len:0xf noalloc:0

      ######## Code starts ########

      while (my $line1 = <IN_FILE>) {
          if( ($line1 =~ /addr/) ) {
              $line1 =~ /addr:([a-z0-9-]+)\s+/;
              my $address = hex $1;
              if($address & 1 ) {
                  printf OUT_FILE1 ("$line1");
              }
              else {
                  printf OUT_FILE2 ("$line1");
              }
          }
      }

      ######## Code ends ########

      I was thinking of using the tag field to print the 4 write data lines just after the write command line, since the tag is the same on a write command and its write data lines and is unique to each write transaction. RDBLKL = read, WRSIZEDFULL = write. Please ignore the blank lines in the input and output files I provided. I am not able to go to the next line without closing a paragraph in the writeup. Thanks in advance for the help.

        I am not able to go to the next line without closing a paragraph in the writeup.

        You are already using <code> tags for your code - use them also for your data. This is the second tip in Writeup Formatting Tips.

        Try this. Note that:

        • I'm using your example input and output files with blank lines already removed, but I think the code should work with blank lines in an input file (untested);
        • I'm comparing the generated output files to your example output files with the Windows fc (file compare) utility and there are no differences;
        • I'm using autodie to reduce the ... or die ... noise, but this module only became part of core with Perl version 5.10.1, so you may have to put the noise back in;
        • This script is invoked in the same way as the previous one;
        • Tested under Perl version 5.8.9.
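For illustration only, and not the script from this reply: a minimal sketch of the tag-based idea raised earlier in this sub-thread. It assumes every line carries a `tag:` field, that only request lines carry `addr:`, and that a request always appears before its data lines. The sub name `route_by_tag` and the hash `%bit_for_tag` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: send each request line, and every data line carrying
# the same tag, to the output chosen by the request's low address bit.
sub route_by_tag {
    my ($in, $fh_zero, $fh_one) = @_;
    my %bit_for_tag;    # tag => 0 or 1, learned from lines that have addr:

    while (my $line = <$in>) {
        next unless $line =~ /\S/;    # skip blank lines

        my ($tag) = $line =~ /\btag:(0x[0-9a-f]+)/i
            or die "no tag field in line $.: $line";

        # Request lines carry the address; remember its low bit for this tag.
        if ($line =~ /\baddr:(0x[0-9a-f]+)/i) {
            $bit_for_tag{$tag} = hex($1) & 1;
        }

        die "data line for unknown tag $tag at line $.\n"
            unless exists $bit_for_tag{$tag};

        print { $bit_for_tag{$tag} ? $fh_one : $fh_zero } $line;
    }
    return;
}
```

Because the lookup is keyed on the tag rather than on line position, this sketch does not depend on there being exactly four data lines per write, and it would tolerate data lines from different transactions being interleaved.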


        Give a man a fish:  <%-{-{-{-<

Re: Print multiple lines based on condition
by kcott (Archbishop) on Mar 13, 2020 at 04:54 UTC

    G'day syedasadali95,

    Based on your description here, and your updated data, here's pm_11114156_record_collate.pl:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my ($file_in, $file_out_0, $file_out_1) = qw{
        pm_11114156_record_collate_in.txt
        pm_11114156_record_collate_out_0.txt
        pm_11114156_record_collate_out_1.txt
    };

    my (@out_fhs, $out_fh);
    my $re = qr{\saddr:(0x[0-9a-f]+)\s};

    open my $in_fh, '<', $file_in;
    open $out_fhs[0], '>', $file_out_0;
    open $out_fhs[1], '>', $file_out_1;

    while (<$in_fh>) {
        if (/$re/) {
            $out_fh = $out_fhs[hex $1 & 1];
        }

        print $out_fh $_;
    }

    And here's a sample run:

    ken@titan ~/tmp $ cat pm_11114156_record_collate_out_0.txt
    cat: pm_11114156_record_collate_out_0.txt: No such file or directory
    ken@titan ~/tmp $ cat pm_11114156_record_collate_out_1.txt
    cat: pm_11114156_record_collate_out_1.txt: No such file or directory
    ken@titan ~/tmp $ cat pm_11114156_record_collate_in.txt
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca01 addr:0x10000a qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca0d addr:0x20000b qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca19 addr:0x30000c qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acac1 addr:0xc10000a qospri:0 len:0xf noalloc:0
    chn:odat tag:0x3b9acac1 dat:0x3f80 be:0xffffffff
    chn:odat tag:0x3b9acac1 dat:0x3f81 be:0xffffffff
    chn:odat tag:0x3b9acac1 dat:0x3f82 be:0xffffffff
    chn:odat tag:0x3b9acac1 dat:0x3f83 be:0xffffffff
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca25 addr:0x40000d qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca31 addr:0x50000e qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acacd addr:0xc20000b qospri:0 len:0xf noalloc:0
    chn:odat tag:0x3b9acacd dat:0x4f83 be:0xffffffff
    chn:odat tag:0x3b9acacd dat:0x9f85 be:0xffffffff
    chn:odat tag:0x3b9acacd dat:0x7f88 be:0xffffffff
    chn:odat tag:0x3b9acacd dat:0x5f87 be:0xffffffff
    ken@titan ~/tmp $ ./pm_11114156_record_collate.pl
    ken@titan ~/tmp $ cat pm_11114156_record_collate_out_0.txt
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca01 addr:0x10000a qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca19 addr:0x30000c qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acac1 addr:0xc10000a qospri:0 len:0xf noalloc:0
    chn:odat tag:0x3b9acac1 dat:0x3f80 be:0xffffffff
    chn:odat tag:0x3b9acac1 dat:0x3f81 be:0xffffffff
    chn:odat tag:0x3b9acac1 dat:0x3f82 be:0xffffffff
    chn:odat tag:0x3b9acac1 dat:0x3f83 be:0xffffffff
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca31 addr:0x50000e qospri:0 len:0xf noalloc:0
    ken@titan ~/tmp $ cat pm_11114156_record_collate_out_1.txt
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca0d addr:0x20000b qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_RDBLKL tag:0x3b9aca25 addr:0x40000d qospri:0 len:0xf noalloc:0
    chn:req cmd:SDP_CMD_WRSIZEDFULL tag:0x3b9acacd addr:0xc20000b qospri:0 len:0xf noalloc:0
    chn:odat tag:0x3b9acacd dat:0x4f83 be:0xffffffff
    chn:odat tag:0x3b9acacd dat:0x9f85 be:0xffffffff
    chn:odat tag:0x3b9acacd dat:0x7f88 be:0xffffffff
    chn:odat tag:0x3b9acacd dat:0x5f87 be:0xffffffff
    ken@titan ~/tmp $

    — Ken

Re: Print multiple lines based on condition
by GrandFather (Saint) on Mar 12, 2020 at 07:25 UTC

    Perhaps you should tell us something about the bigger picture? You have asked a series of closely related questions about details of what you are experimenting with. We may be able to help much more if you fill in some details so we know what your overall goal is.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

      My apologies for extending the question. I could've asked this with my previous question itself but I missed it. My overall goal is the combination of the two questions which I've asked recently. Nothing more than that.

      Thanks

Re: Print multiple lines based on condition
by jcb (Parson) on Mar 12, 2020 at 22:51 UTC

    You are very close: you need to move the my $address out of the loop so the value is remembered across iterations.

    Example: (untested)

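    my $address;    # declared outside the loop so it survives across iterations

    while (<IN_FILE>) {
        # Accept an optional 0x prefix: [:xdigit:] alone does not match the "x".
        $address = hex $1 if m/address:((?:0x)?[[:xdigit:]]+)\s+/;
        next unless defined $address;    # no output until an address is seen

        if ($address & 1) { print OUT_FILE1 $_ }
        else              { print OUT_FILE2 $_ }
    }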

    This is a very basic finite state machine. The $address lexical holds state (the most recent address seen) across loop iterations. There are a few other changes as well. First, the builtin $_ is used instead of a lexical to hold the input line; since pattern matches also default to using this variable, some clutter can be removed. Second, the address value capture is combined with the test for its presence and the statement modifier form is used. Third, no output is produced until an address has been seen (the next unless defined $address provides this feature). Lastly, the address is being stored as an integer, so an integer bit test is used instead of converting it back to a string.

      Thanks everyone for your support and suggestions. Now I know that there are multiple ways of doing this task. It seems to me that the simplest way to do it is by using state machine. Thanks again guys!

Re: Print multiple lines based on condition
by dasgar (Priest) on Mar 12, 2020 at 20:36 UTC

    What came to my mind was to use Tie::File, which lets you "access the lines of a disk file via a Perl array" without reading the whole file into memory. The code below is incomplete and untested, but it shows the logic of how I would approach this using Tie::File.
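The code from this reply is not reproduced in this copy of the thread. What follows is a separate minimal sketch of how a Tie::File approach might look, using the field names from the updated sample data. It assumes that write commands are identified by `cmd:SDP_CMD_WR` and are always followed by four data lines. The sub name `split_with_tie_file` is hypothetical.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: walk the file as an array and copy each command line,
# plus the four lines after a write command, to one of two output handles.
sub split_with_tie_file {
    my ($path, $fh_zero, $fh_one) = @_;

    tie my @lines, 'Tie::File', $path, mode => O_RDONLY
        or die "cannot tie '$path': $!";

    my $i = 0;
    while ($i < @lines) {
        my $line = $lines[$i];
        if ($line =~ /\baddr:(0x[0-9a-f]+)/i) {
            my $out = (hex($1) & 1) ? $fh_one : $fh_zero;

            # Assumption: a write command owns the four lines that follow it.
            my $count = $line =~ /cmd:SDP_CMD_WR/ ? 5 : 1;
            my $last  = $i + $count - 1;
            $last = $#lines if $last > $#lines;    # file ended early

            # Tie::File strips the line ending, so add it back on output.
            print {$out} "$_\n" for @lines[$i .. $last];
            $i = $last + 1;
        }
        else {
            $i++;    # line with no address: skipped in this sketch
        }
    }

    untie @lines;
    return;
}
```

Indexing into the tied array makes the look-ahead explicit, at the cost of more bookkeeping than the state-machine replies in this thread need.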

Re: Print multiple lines based on condition
by BillKSmith (Monsignor) on Mar 13, 2020 at 15:04 UTC
    This code is a variation of the state-machine concept. The variable '$out' is a reference to the file handle of the current output file. When a 'command' line is discovered, the last hex digit of the address field is extracted. The "state" is set to one reference or the other depending on whether or not it is an even digit.
    my $out;
    while (<INFILE>) {
        if(/addr:0x[0-9a-f]+([0-9a-f])/) {
            $out = ($1 =~ /[02468ace]/)? $OUT_FILE0 : $OUT_FILE1;
        }
        print {$out} $_;
    }
    Bill

Node Type: perlquestion [id://11114156]
Approved by GrandFather