http://qs321.pair.com?node_id=11106985

kshitij has asked for the wisdom of the Perl Monks concerning the following question:

Hi All ,

I would require your help to generate one output file after post processing of one CSV file as stated below

This file is just a small cut from a big file . Big file is having 20000 lines

PATTERN,pat0,pat1,pat2,pat3,pat4,pat5,pat6,pat7,pat8,pat9 U_TOP_LOGIC/ipre_reg_0/Q,0,0,1,1,0,0,1,1,0,0 U_TOP_LOGIC/ipre_reg_6/Q,1,1,0,0,1,1,0,0,1,1 U_TOP_LOGIC/pre_reck_1/Q,1,1,0,1,1,0,0,1,1,0 U_TOP_LOGIC/pre_reg_10/Q,0,1,0,1,1,0,0,1,1,1 U_TOP_LOGIC/pre_reg_11/Q,0,0,1,0,1,0,0,1,0,1

Now , I need to create one output file in which whenever the transition is happening from "0" to "1" or "1" to "0" , the destination pattern should be provided weight 0.25 in an incremental order and the end I need to sum the weight of each pattern

For example , this is the pattern U_TOP_LOGIC/ipre_reg_0/Q,0,0,1,1,0,0,1,1,0,0

pat0 is "0" , pat1 -> "0" , pat2 > "1" , pat3 -> "1" , pat4 -> "0" , pat5 -> "0" , pat6 -> 1 , pat7 -> 1 , pat8 -> 0 , pat9 -> 0

Now when the transition is happening from "0" to "1" , for example it is happening for pat1 to pat2 pattern , pat2 is assigned "0.5" weight , similarly when the transition is happening from "1" to "0" , for example it is happening from pat3 and pat4 then 0.5 is assigned for pat4 ,

This should be the sequence for U_TOP_LOGIC/ipre_reg_0

pat2 -> 0.5 pat 4 -> 0.5 pat6 -> 0.5 pat8 -> 0.5

Output file is like this for all the patterns

Number pat0 pat1 pat2 pat3 pat4 pat5 pat6 pat7 pat8 +pat9 U_TOP_LOGIC/ipre_reg_0/Q 0 0 0.5 0 0.5 0 0.5 0 0.5 + 0 U_TOP_LOGIC/ipre_reg_6/Q 0 0 0.5 0 0.5 0 0.5 0 0.5 + 0 U_TOP_LOGIC/pre_reck_1/Q 0 0 0.5 0.5 0 0.5 0 0.5 0 + 0.5 U_TOP_LOGIC/pre_reg_10/Q 0 0.5 0.5 0.5 0 0.5 0 0.5 0 + 0 U_TOP_LOGIC/pre_reg_11/Q 0 0 0.5 0.5 0.5 0.5 0 0.5 0.5 + 0.5 ###################################################################### +######## SUM Of weights 0 0.5 2.5 1.5 1.5 1.5 1 1.5 1.5 + 1.5 ###################################################################### +########

Replies are listed 'Best First'.
Re: Perl script for the post processing of one CSV file
by rjt (Curate) on Oct 03, 2019 at 05:52 UTC

    What have you tried already? See How do I post a question effectively?, but you have at least provided some good sample input and output, so that's a good start.

    See Text::CSV for the CSV processing.

    use Text::CSV 'csv'; my $aoh = csv ( in => 'input.csv', headers => 'auto' ); # Fetch the "patN" column names, in order. This works # for single digit "patN" names, as that was your # example. Multi-digit names will require a more complex # sort, left as an exercise to the reader. my @pats = sort grep /^pat\d+$/, keys %{$aoh->[0]}; my %sums; # Column sums. $sum{column} += $value; for my $row ($@aoa) { for (map { $row->{$_} } @pats) { # You are now iterating over every patN value, # in order. Perform your transformation } # Just an example. $sums{$_} += $row->{$_} for @pats; }

    This is just a skeleton to get you started. Certainly more code than you posted. The code will require modification, does not output anything, and some of my simple logic may prove to be inadequate, but hopefully illustrates the general approach you might take. Definitely read the Text::CSV manual thoroughly, and probably perldata as well.

    use strict; use warnings; omitted for brevity.

      You can stream it and prevent memory hogs on big data. Additionally, install Text::CSV_XS for speed

      use Text::CSV_XS "csv"; my %sums; # Column sums. $sum{column} += $value my @head; my @pats; csv (in => "input.csv", out => undef, bom => 1, kh => \@head, on_in => + sub { my ($csv, $row) = @_; unless (@pats) { # Fetch the "patN" column names, in order. This works # for single digit "patN" names, as that was your # example. Multi-digit names will require a more complex # sort, left as an exercise to the reader. # XXX @pats = sort { grep m/^pat\d+$/ } @head; @pats = sort grep m/^pat\d+$/ => @head; # This line fixed } for (@{$row}{@pats}) { # You are now iterating over every patN value, # in order. Perform your transformation } # Just an example. $sums{$_} += $row->{$_} for @pats; });

      update: I changed the grep line which I blindly copied from the original code


      Enjoy, Have FUN! H.Merijn
        You can stream it and prevent memory hogs on big data. Additionally, install Text::CSV_XS for speed use Text::CSV_XS "csv";

        Install Text::CSV_XS, yes, but don't explicitly use Text::CSV_XS. Text::CSV is smart enough to pull in the XS version if installed, and will fall back to pure Perl if not. There is usually no point in having the script break if XS isn't available.

        perl -MText::CSV -e 'print Text::CSV->module' Text::CSV_XS

        I agree with your streaming suggestion. I had opted to keep my example simple given the 20k line input. Good to teach the streaming approach, though. ++

        use strict; use warnings; omitted for brevity.
        Hi Superdoc ,

        Thanks , I have tried it but getting an error as stated below

        Not enough arguments for grep at csv_to_output_report.pl line 18, near "m/^pat\d+$/ }"

        Could you guide me to resolve this error ?

        Thanks Kshitij

        Hi Superdoc ,

        I cant use these subroutines "bom" and "kh" since I am using older version of perl. Can you help me out with the code without using these subroutines ?

        Thanks Kshitij

Re: Perl script for the post processing of one CSV file
by tybalt89 (Monsignor) on Oct 03, 2019 at 14:58 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11106985 use warnings; <DATA>; my @sums = (0) x 10; print "Number", ' ' x 18, map(" pat$_", 0 .. 9), "\n"; while( <DATA> ) { s/(?<=,(0|1),)(0|1)/ ($1 ^ 0 ^ $2) * 0.5 /ge; s/Q,1/Q,0/; my @values = (split /,|\n/)[-10 .. -1]; $sums[$_] += $values[$_] for 0 .. 9; s/Q/Q /; s/,([\d.]+)/ sprintf " %-4g", $1 /ge; print; } print my $pound = "#" x 78 . "\n"; printf "SUM Of weights " . " %-4g" x 10 . "\n", @sums; print $pound; __DATA__ PATTERN,pat0,pat1,pat2,pat3,pat4,pat5,pat6,pat7,pat8,pat9 U_TOP_LOGIC/ipre_reg_0/Q,0,0,1,1,0,0,1,1,0,0 U_TOP_LOGIC/ipre_reg_6/Q,1,1,0,0,1,1,0,0,1,1 U_TOP_LOGIC/pre_reck_1/Q,1,1,0,1,1,0,0,1,1,0 U_TOP_LOGIC/pre_reg_10/Q,0,1,0,1,1,0,0,1,1,1 U_TOP_LOGIC/pre_reg_11/Q,0,0,1,0,1,0,0,1,0,1

    Outputs:

    Number pat0 pat1 pat2 pat3 pat4 pat5 pat6 pat7 pat8 +pat9 U_TOP_LOGIC/ipre_reg_0/Q 0 0 0.5 0 0.5 0 0.5 0 0.5 + 0 U_TOP_LOGIC/ipre_reg_6/Q 0 0 0.5 0 0.5 0 0.5 0 0.5 + 0 U_TOP_LOGIC/pre_reck_1/Q 0 0 0.5 0.5 0 0.5 0 0.5 0 + 0.5 U_TOP_LOGIC/pre_reg_10/Q 0 0.5 0.5 0.5 0 0.5 0 0.5 0 + 0 U_TOP_LOGIC/pre_reg_11/Q 0 0 0.5 0.5 0.5 0.5 0 0.5 0.5 + 0.5 ###################################################################### +######## SUM Of weights 0 0.5 2.5 1.5 1.5 1.5 1 1.5 1.5 + 1 ###################################################################### +########

    Some alignments have been tweaked, and a math error fixed.

Re: Perl script for the post processing of one CSV file
by AnomalousMonk (Archbishop) on Oct 03, 2019 at 17:54 UTC

    Here's another tybalt89-ish solution:

    c:\@Work\Perl\monks>perl -wMstrict -le "use Test::More 'no_plan'; use Test::NoWarnings; ;; use constant W => 0.5; ;; my @records = ( 'U_TOP_LOGIC/ipre_reg_0/Q,0,0,1,1,0,0,1,1,0,0', 'U_TOP_LOGIC/ipre_reg_6/Q,1,1,0,0,1,1,0,0,1,1', 'U_TOP_LOGIC/pre_reck_1/Q,1,1,0,1,1,0,0,1,1,0', 'U_TOP_LOGIC/pre_reg_10/Q,0,1,0,1,1,0,0,1,1,1', 'U_TOP_LOGIC/pre_reg_11/Q,0,0,1,0,1,0,0,1,0,1', ); ;; my $rx_transition = qr{ (?<= 1) 0 | (?<= 0) 1 }xms; ;; my @pat = (0); for my $record (@records) { my (undef, $trans) = split m{ \d/Q, }xms, $record, 2; $trans =~ tr/01//cd; $pat[ $-[0] ]++ while $trans =~ m{ $rx_transition }xmsg; } $_ *= W for @pat; ;; is_deeply \@pat, [ 0, 0.5, 2.5, 1.5, 1.5, 1.5, 1, 1.5, 1.5, 1, ], 'weighted'; ;; done_testing; " ok 1 - weighted 1..1 ok 2 - no warnings 1..2

    Update: You can simplify the code slightly by changing the
        $pat[ $-[0] ]++ while $trans =~ m{ $rx_transition }xmsg;
    statement to
        $pat[ $-[0] ] += W while $trans =~ m{ $rx_transition }xmsg;
    and getting rid of the
        $_ *= W for @pat;
    post-loop weighting fixup statement altogether – but this code might be slightly slower.


    Give a man a fish:  <%-{-{-{-<

Re: Perl script for the post processing of one CSV file
by davido (Cardinal) on Oct 03, 2019 at 05:39 UTC

    How is Perl involved?


    Dave

      I was wondering the same thing. We sure do speak a weird form of Java around here. Eclipse is bound to throw a fit.

      use strict; use warnings; omitted for brevity.

      Hi Dave,

      I used some initial perl script to create this input CSV file and now I am looking another perl script to generate the output as stated in the question.

      Thanks Kshitij

Re: Perl script for the post processing of one CSV file
by BillKSmith (Monsignor) on Oct 03, 2019 at 17:52 UTC
    A "quick and dirty" way to do this task is to use the perl Command Switches "-F" to do the input, main loop, and splitting. Use an END block to print the summary. Note that the variable @sums must be a declared as a package variable in order to access it in the END block. (Anyone know a better way?)
    #!perl -F',' use strict; use warnings; use feature 'state'; state $header = 1; our @sums; if ($header) { printf "%-24s" . " %4s"x10, 'Number', @F[1..10]; $header = 0; next; } chomp $F[10]; for (reverse 2..10) { $F[$_] = ( ( $F[$_-1] xor $F[$_] ) ? .5 : 0 ); $sums[$_] += $F[$_]; } $F[1] = 0; $sums[1] += $F[1]; printf "%-24s " . " %3.1f "x10 . "\n", @F; END{ our @sums; printf '*'x75 . "\n"; printf "%-24s" . " %3.1f "x10 . "\n", "Sum of weights", @sums[1..10]; printf '*'x75 . "\n"; }
    Bill
      the second "our @sums" inside the END block is not needed. "our" already makes is accessible to all.
      And if the first declaration was "my @sums", then it can not be used by the END block, not even as @main::sums

        the second "our @sums" inside the END block is not needed

        Good point! I have often needed a second 'our' in a BEGIN block at the start of a file when using "-n". I never noticed that an END block at the end is included in the scope of the first 'our'. Note that if we change the single 'our' to 'my', the code still compiles without error. It fails to execute correctly due to life-time issues, not scope.

        Bill