http://qs321.pair.com?node_id=971977

quelos27 has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks, I have a script here that needs some tweaking to achieve the outcome that I am after:

my %c;
while( <> ){
    chomp;
    my($c2,$c4,$c12)=(split/\|/)[1,3,11];
    $c{"$c2|$c4"}||=[$_,0];
    ++$c{"$c2|$c4"}[1] if $c12=~/\S/;
}
$"="|";
for( sort keys %c ){
    print "@{$c{$_}}\n";
}

Here is the input file:

col1|col2|col3|col4|col5|col6|col7|col8|col9|col10|col11|col12
BLA|001036|S|3228|10|1|2|3|001036|W035|S| 
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092918
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092926
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092934
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092942
BLT|600123|S|3437|0|20|0|0|001036|W035|S| 
BRO|900177|S|3531|-1|0|0|0|001036|W035|S| 
CHL|123777|S|3327|3|0|0|0|001036|W035|S| 
CHL|123777|S|3327|0|0|0|0|001036|W035|S|08961029909655093791
CHL|123777|S|3327|0|0|0|0|001036|W035|S|08961029909655093775

Basically, I am trying to count the number of strings that appear in the last column and append the count as a new column in the output file. My reference keys are column 2 (e.g. 001036) and column 4 (e.g. 3228). For the first occurrence of each col 2 / col 4 pair (e.g. 001036 and 3228), the last column is always a space (" "). For the rows with the same pair that follow it, where the last column is not a space (if($col12 != " ")), I need to count the strings in the last column. The final outcome should look something like the following:

BLA|001036|S|3228|10|1|2|3|001036|W035|S| |4
BLT|600123|S|3437|0|20|0|0|001036|W035|S| |0
BRO|900177|S|3531|-1|0|0|0|001036|W035|S| |0
CHL|123777|S|3327|3|0|0|0|001036|W035|S| |2

When I run the script, what I am getting is:

BLA|001036|S|3228|10|1|2|3|001036|W035|S| 
|4
CHL|123777|S|3327|3|0|0|0|001036|W035|S| 
|2
BLT|600123|S|3437|0|20|0|0|001036|W035|S| 
|0
BRO|900177|S|3531|-1|0|0|0|001036|W035|S| 
|0

How do I get the count value to not print to a new line? In desperate need of help. I am fairly new to Perl. Thank you in advance!!

Regards,
Jason

Replies are listed 'Best First'.
Re: Need help with a simple perl script
by Neighbour (Friar) on May 23, 2012 at 08:45 UTC
    For someone who's "fairly new" to perl, this is some heavy-duty code :)
    The thing is...When I run the code on a linux-system with perl 5.10.1, I'm getting the results exactly like your desired outcome.
    What environment (OS, perl version) are you running your script on?
Re: Need help with a simple perl script
by choroba (Cardinal) on May 23, 2012 at 08:46 UTC
    I cannot replicate your problem. I am getting almost the desired output. My guess: This might be a cross-platform issue with end-of-line markers. What system are you running the script on? What line endings does your input file use?
Re: Need help with a simple perl script
by kcott (Archbishop) on May 23, 2012 at 17:30 UTC

    Like others, I cannot reproduce your error with the code you've posted. Using Perl 5.14.2 and Mac OS X 10.7.4, I get:

    BLA|001036|S|3228|10|1|2|3|001036|W035|S| |4
    CHL|123777|S|3327|3|0|0|0|001036|W035|S| |2
    BLT|600123|S|3437|0|20|0|0|001036|W035|S| |0
    BRO|900177|S|3531|-1|0|0|0|001036|W035|S| |0

    You asked "How do I get the count value to not print to a new line?". However, that's not really what's happening. Taking the first line of what you're getting, this actually looks like:

    BLA|001036|S|3228|10|1|2|3|001036|W035|S|<space><some-return-char>|4<newline>

    The last pipe character comes from $"="|";; the 4 is your count value; the terminal <newline> is the \n from the print statement. The bogus <some-return-char> comes from the while loop's $_ value. I can reproduce your output by adding my own bogus return character:

    ...
    while( <DATA> ){
        chomp;
        $_ .= qq{\n};
    ...

    which now outputs:

    BLA|001036|S|3228|10|1|2|3|001036|W035|S| 
    |4
    CHL|123777|S|3327|3|0|0|0|001036|W035|S| 
    |2
    BLT|600123|S|3437|0|20|0|0|001036|W035|S| 
    |0
    BRO|900177|S|3531|-1|0|0|0|001036|W035|S| 
    |0

    You'll need to determine where the bogus return is added: it could be when the file is initially populated or perhaps due to some subsequent processing it undergoes before you read it. I've encountered this problem in the past when data is copied and pasted from an email, when data is transferred via FTP in binary mode and similar scenarios involving systems using different line endings.

    I would suggest the following as your options:

    1. Fix the problem with the source data and leave your code as it is.
    2. Run the source data through some cleaning filter before you read it and leave your code as it is.
    3. Edit your code to remove any bogus returns immediately after the chomp.

    -- Ken
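
Option 3 above could be sketched roughly as follows. This is only an illustration, assuming the stray character is a carriage return left over from CRLF line endings (the thread does not confirm that). The helper names strip_eol and count_rows are hypothetical and not from the original script. Only CR and LF are stripped, so a column 12 holding a single space is kept.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop any trailing CR/LF
# characters but keep other whitespace, because column 12 may be a lone space.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

# Same counting logic as the posted script, wrapped in a sub so it can be
# run on a list of raw lines. Returns one output row per (col2, col4) key.
sub count_rows {
    my @lines = @_;
    my %c;
    for my $raw (@lines) {
        my $line = strip_eol($raw);
        # Limit -1 keeps trailing empty fields, so index 11 always exists
        # for a 12-column record.
        my ($c2, $c4, $c12) = (split /\|/, $line, -1)[1, 3, 11];
        my $key = "$c2|$c4";
        $c{$key} ||= [$line, 0];
        ++$c{$key}[1] if defined $c12 && $c12 =~ /\S/;
    }
    return map { join '|', @{ $c{$_} } } sort keys %c;
}

# A few records from the question, with CRLF endings added for the demo.
my @sample = (
    "BLA|001036|S|3228|10|1|2|3|001036|W035|S| \r\n",
    "BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092918\r\n",
    "BLT|600123|S|3437|0|20|0|0|001036|W035|S| \r\n",
);
print "$_\n" for count_rows(@sample);
```

With the CR removed before the line is stored, the count stays on the same line as the record it belongs to.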

Re: Need help with a simple perl script
by live4tech (Sexton) on May 24, 2012 at 04:24 UTC
    If you do have extra newlines or CR+newlines in your original data, viewing it with a HEX editor like Bless might help. It will show the extra 0A and/or 0D.
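
If no hex editor is at hand, a few lines of Perl can make the line endings visible in much the same way. This is a minimal sketch. The helper names show_eol and tail_hex are made up for illustration, and the sample strings stand in for lines you would read from the real file (ideally opened with the '<:raw' layer so nothing is translated on the way in).

```perl
use strict;
use warnings;

# Hypothetical helper: replace CR and LF with visible markers.
sub show_eol {
    my ($line) = @_;
    (my $shown = $line) =~ s/\r/<CR>/g;
    $shown =~ s/\n/<LF>/g;
    return $shown;
}

# Hypothetical helper: hex codes of the last two characters of a line
# (fewer if the line is shorter), e.g. "0D 0A" for a CRLF ending.
sub tail_hex {
    my ($line) = @_;
    my ($tail) = $line =~ /(.{0,2})\z/s;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $tail;
}

# Sample lines standing in for real file content.
for my $line ("W035|S| \r\n", "W035|S| \n") {
    printf "%-24s %s\n", show_eol($line), tail_hex($line);
}
```

A line ending in 0D 0A has a CRLF ending; 20 0A means the space in the last column is followed by a plain newline.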
Re: Need help with a simple perl script
by uday_sagar (Scribe) on May 23, 2012 at 10:04 UTC

    Getting the same output that you wanted!

    And I think the code you have given needs slight modification:

    1. for loop to be included in while

    2. $_="|" instead of $"="|"

    (I know you knew these :-))

    Here I have modified.

    my %c;
    while( <> ){
        chomp;
        my($c2,$c4,$c12)=(split/\|/)[1,3,11];
        $c{"$c2|$c4"}||=[$_,0];
        ++$c{"$c2|$c4"}[1] if $c12=~/\S/;
        $_="|";
        for( sort keys %c ){
            print "@{$c{$_}}\n";
        }
    }
      Actually, no. Why should anyone set $_ to anything just to overwrite the value in a for loop? The code works - why does it need any modification? See perlvar for the special variable $".
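
To illustrate what $" does in the original script, here is a small sketch. The helper name join_with is hypothetical and the sample values are only for demonstration. $" is the separator Perl puts between array elements when the array is interpolated into a double-quoted string; its default is a single space.

```perl
use strict;
use warnings;

# Hypothetical helper: interpolate a list into a string using the given
# separator. local restores the previous value of $" when the sub returns.
sub join_with {
    my ($sep, @items) = @_;
    local $" = $sep;
    return "@items";
}

my @row = ('BLA|001036', 4);
print join_with(' ', @row), "\n";   # default separator: BLA|001036 4
print join_with('|', @row), "\n";   # as in the script:  BLA|001036|4
```

Assigning to $_ instead has no such effect, since the for loop aliases $_ to each key as it iterates.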

        Okay, I agree with $". Thanks

        For the while loop, as "print" is not contained in while, it'll take the inputs all the time without printing anything.