I've read the rationale for using MCE for this task in the comments for the previous post, but I still don't understand why parallelism helps here: surely gathering four parts from a 101-character string should not be a computationally expensive task?
Each one of the $k1-4 will extract the correct data into the right column, but it won’t put it together.
First of all, there is a warning,
^+ matches null string many times in regex; marked by <-- HERE in m/^+ <-- HERE / at 1138497.pl line 31, <$_IN_FILE> line 1.
which means that you should have escaped (with quotemeta or \Q...\E) your $match_string before interpolating it into a regex. Next, in your while ( <$ifh> ) loop you move on to the next line right after extracting the first column, although all the fields of a record sit on one single line. You should remove the next statement from your while loop. Furthermore, it may make sense to rewrite the string processing part so that each line is matched only once, which also speeds the process up. Try this:
use warnings;
use strict;

print "HEC1_ID,Q100_Base,TTP,Area\n";
while (<DATA>) {
    s{
        ^                # at the beginning of the line
        \+               # followed by literal plus
        (?:\s+(\S+))     # column one
        (?:\s+(\S+))     # column two
        (?:\s+(\S+))     # column three
        (?:\s+\S+){3}    # skip three columns
        \s+(\S+)$        # catch the last one, too
    }{$1,$2,$3,$4}x
        and print;
}
__DATA__
+ BPI30 1319. 13.50 477. 147. 49. 4.64
ROUTED TO
+ RPI30 1220. 13.75 475. 147. 49. 4.64
HYDROGRAPH AT
+ BPI31 765. 12.42 102. 26. 9. .73
2 COMBINED AT
+ CPI31 1242. 13.75 571. 172. 58. 5.37

Output:

HEC1_ID,Q100_Base,TTP,Area
BPI30,1319.,13.50,4.64
RPI30,1220.,13.75,4.64
BPI31,765.,12.42,.73
CPI31,1242.,13.75,5.37
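
To illustrate the escaping advice, here is a minimal sketch of interpolating a prefix into a regex safely. It assumes that $match_string held a single "+" (inferred from the m/^+ / in the warning, not confirmed by the original script); starts_with_literal is a hypothetical helper introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns 1 when
# $line begins with the literal text in $prefix, 0 otherwise.
sub starts_with_literal {
    my ($line, $prefix) = @_;
    # \Q...\E quotes regex metacharacters, so "+" is matched literally
    # instead of being parsed as a quantifier applied to "^".
    return $line =~ /^\Q$prefix\E/ ? 1 : 0;
}

my $match_string = '+';    # assumed value, see the note above
my $record       = '+ BPI30 1319. 13.50 477. 147. 49. 4.64';

print starts_with_literal($record,     $match_string) ? "match\n" : "no match\n";
print starts_with_literal('ROUTED TO', $match_string) ? "match\n" : "no match\n";

# quotemeta() is the function form of \Q...\E.
my $escaped = quotemeta $match_string;
print "escaped: $escaped\n";    # prints: escaped: \+
```

With the prefix quoted, the "matches null string many times" warning should no longer appear, because the regex engine sees \+ rather than a bare quantifier.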