http://qs321.pair.com?node_id=1138547


in reply to Extract string to file

I've read the rationale for using MCE for this task in the comments for the previous post, but I still don't understand why parallelism helps here: surely gathering four parts from a 101-character string should not be a computationally expensive task?

Each one of the $k1-4 will extract the correct data into the right column, but it won’t put it together.
First of all, there is a warning,
^+ matches null string many times in regex; marked by <-- HERE in m/^+ <-- HERE / at 1138497.pl line 31, <$_IN_FILE> line 1.
which means that you should have escaped (quotemeta or \Q...\E) your $match_string before interpolating it into a regex. Next, in your while ( <$ifh> ) loop you call next right after extracting the first column, so the loop moves on to the following line even though all the values you need are in that one line. You should remove next from your while loop. Furthermore, it may make sense to rewrite the string processing part so that each line is matched only once, which also speeds things up. Try this:
use warnings;
use strict;

print "HEC1_ID,Q100_Base,TTP,Area\n";
while (<DATA>) {
    s{
        ^               # at the beginning of the line
        \+              # followed by literal plus
        (?:\s+(\S+))    # column one
        (?:\s+(\S+))    # column two
        (?:\s+(\S+))    # column three
        (?:\s+\S+){3}   # skip three columns
        \s+(\S+)$       # catch the last one, too
    }{$1,$2,$3,$4}x and print;
}
__DATA__
+ BPI30 1319. 13.50 477. 147. 49. 4.64
ROUTED TO
+ RPI30 1220. 13.75 475. 147. 49. 4.64
HYDROGRAPH AT
+ BPI31 765. 12.42 102. 26. 9. .73
2 COMBINED AT
+ CPI31 1242. 13.75 571. 172. 58. 5.37

Output:

HEC1_ID,Q100_Base,TTP,Area
BPI30,1319.,13.50,4.64
RPI30,1220.,13.75,4.64
BPI31,765.,12.42,.73
CPI31,1242.,13.75,5.37
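
As for the warning, here is a minimal sketch of the escaping I mean. I am assuming your $match_string holds a literal '+' (I have not seen its real value), and starts_with_literal is a made-up helper name used only for illustration:

```perl
use strict;
use warnings;

# Assumption: the string to look for is a literal '+' (hypothetical value;
# substitute whatever your script really stores in $match_string).
my $match_string = '+';

# Hypothetical helper: returns 1 if $line begins with $literal taken verbatim.
sub starts_with_literal {
    my ($line, $literal) = @_;
    # \Q...\E escapes regex metacharacters in the interpolated value,
    # so '+' is matched as a plus sign instead of acting as a quantifier.
    return $line =~ /^\Q$literal\E/ ? 1 : 0;
}

# quotemeta() performs the same escaping ahead of time.
my $escaped = quotemeta $match_string;

for my $line ('+ BPI30 1319.', 'ROUTED TO') {
    printf "%-15s %s\n", $line,
        starts_with_literal($line, $match_string) ? 'match' : 'no match';
}
```

Without the escaping, /^$match_string/ becomes /^+/, where + is a quantifier applied to ^, and that is exactly what the warning complains about.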

Re^2: Extract string to file
by oryan (Initiate) on Aug 28, 2015 at 15:47 UTC

    Thank you for the reply. I forgot to add my final solution here. I tried modifying the solution from my previous post with quotemeta as you suggested, but it did not seem to work for me. Although that version still emits the warning, it does produce the needed output.

    Your suggestions in this post worked great. I made a slight modification so that I could set an input file and write the output to a file instead of the screen. Here is what I ended up with:

    use warnings;
    use strict;

    ## Select input and output files
    my $input_file  = 'InputFile.OH1';
    my $output_file = 'OutputFile.csv';

    open my $ifh, "<", $input_file
        or die "cannot open '$input_file' for reading: $!\n";
    open my $ofh, ">", $output_file
        or die "cannot open '$output_file' for writing: $!\n";

    ## Creates a header at the beginning of the file
    my $header = "HEC1_ID,Q100_Base,TTP,Area\n";
    print $ofh $header;

    ## Extracts data from input; select makes $ofh the default handle for print
    select $ofh;
    while (<$ifh>) {
        s{
            ^               # at the beginning of the line
            \+              # followed by literal plus
            (?:\s+(\S+))    # column one
            (?:\s+(\S+))    # column two
            (?:\s+(\S+))    # column three
            (?:\s+\S+){3}   # skip three columns
            \s+(\S+)$       # catch the last one, too
        }{$1,$2,$3,$4}x and print;
    }
    select STDOUT;

    Once again, thanks for the help