Split lines in file to columns

Magnolia25 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

Below are the lines in my file. I am reading them into an array and storing the column values in variables.

ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription at RowCode
ValuesInColumn1    DataColumnB    ABC    RowDescription at RowCode

The following code takes the data from my file and splits it. That works if the columns are separated by spaces, but the value for $colD is not captured correctly, because the string in that column contains spaces itself. If the value of $colC is XYZ and $colD contains the substring Region, I need to replace the XYZ in $colC with N/A.

foreach my $line (@a) { 
        my ($colA, $colB, $colC,$colD) = split( /\s+/, $line);
        
        #print "$colD \n";     
}

Please help.
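To see why $colD comes out truncated, it can help to print every field that `split /\s+/` produces. A minimal sketch, assuming one of the sample lines above with its columns separated by four spaces:

```perl
use strict;
use warnings;

# One of the sample lines from the question (columns separated by runs of spaces).
my $line = "ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region";

# split /\s+/ breaks on EVERY run of whitespace, including the single space
# inside "Supplier ID", so the description ends up spread over two fields.
my @fields = split /\s+/, $line;
print scalar(@fields), " fields:\n";
print "  [$_]\n" for @fields;

# Assigning to four scalars silently discards everything after the fourth field.
my ($colA, $colB, $colC, $colD) = @fields;
print "colD = [$colD]\n";    # only "RowDescription|RowCode|Supplier"
```

The fifth field, `ID::Region`, is exactly the part that the Region check needs.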

Re: Split lines in file to columns by hippo (Bishop) on Mar 20, 2019 at 09:43 UTC
> But value for colD is not captured correctly, as string in colD has spaces in between.

`my ($colA, $colB, $colC, $colD) = split( /\s+/, $line, 4);`

> If value for $colC = XYZ and $colD contains substring = Region, I need to replace the $colC from XYZ to N/A.

There are many ways to do this. What did you try? How did it fail?
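Putting hippo's LIMIT argument together with the XYZ to N/A rule could look like the sketch below. It assumes, as in the sample data, that XYZ sits in the third column ($colC); `fix_line` is a hypothetical helper name, and the lines are held in an array rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into at most four columns and apply
# the XYZ -> N/A rule. Returns the four column values.
sub fix_line {
    my ($line) = @_;
    chomp $line;
    # LIMIT 4: split stops after producing four fields, so any whitespace
    # inside the fourth column (the description) is kept intact.
    my ($colA, $colB, $colC, $colD) = split /\s+/, $line, 4;
    $colC = 'N/A' if $colC eq 'XYZ' && $colD =~ /Region/;
    return ($colA, $colB, $colC, $colD);
}

my @lines = (
    "ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region\n",
    "ValuesInColumn1    DataColumnB    XYZ    RowDescription at RowCode\n",
    "ValuesInColumn1    DataColumnB    ABC    RowDescription at RowCode\n",
);

for my $line (@lines) {
    print join(' -- ', fix_line($line)), "\n";
}
```

If a line can start with whitespace, `split /\s+/` returns an empty first field; `split ' ', $line, 4` (a single-space string as the pattern) skips leading whitespace instead.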
Re^2: Split lines in file to columns by Magnolia25 (Sexton) on Mar 20, 2019 at 09:50 UTC
Thanks hippo, that worked.

> If value for $colC = XYZ and $colD contains substring = Region, I need to replace the $colC from XYZ to N/A.

I am now going to try this part myself. I was stuck on the earlier part and only described this step to explain why I needed the split.
Re: Split lines in file to columns by hdb (Monsignor) on Mar 20, 2019 at 10:01 UTC
If all your columns are fixed-width, you can use unpack:

use strict;
use warnings;

while(<DATA>) {
    my ($colA, $colB, $colC, $colD) = unpack "A19A15A7A45";
    $colC = "N/A" if $colC eq "XYZ" and $colD =~ /Region/;
    print "$colA -- $colB -- $colC -- $colD\n";
}

__DATA__
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription at RowCode
ValuesInColumn1    DataColumnB    ABC    RowDescription at RowCode
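The A19A15A7A45 template only works if every column really has those exact widths, and comparing $colC with "XYZ" relies on the `A` format stripping the padding. A small sketch, assuming the four-space-separated sample layout, that builds the template from a list of widths and shows the difference between `A` and `a`:

```perl
use strict;
use warnings;

# Column widths (in characters) from hdb's template. The first three are the
# value plus the four separating spaces; 45 is assumed to be wide enough for
# the longest description.
my @widths   = (19, 15, 7, 45);
my $template = join '', map { "A$_" } @widths;    # "A19A15A7A45"

my $line = "ValuesInColumn1    DataColumnB    XYZ    RowDescription at RowCode";

# "A" strips trailing whitespace when unpacking, so $colC is "XYZ", not "XYZ    ".
my ($colA, $colB, $colC, $colD) = unpack $template, $line;

# "a" keeps the padding, which would make an eq comparison with "XYZ" fail.
my (undef, undef, $rawC) = unpack 'a19a15a7', $line;

printf "A: [%s]  a: [%s]\n", $colC, $rawC;
```

Keeping the widths in one list makes it easier to adjust the template if the real file's columns turn out to be wider than the sample.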
Re: Split lines in file to columns by tybalt89 (Monsignor) on Mar 20, 2019 at 16:59 UTC
#!/usr/bin/perl

# https://perlmonks.org/?node_id=1231482

use strict;
use warnings;

while(<DATA>)
{
    chomp;
    my ($colA, $colB, $colC, $colD) = split /\s{2,}/;
    $colD =~ /Region/ and $colC =~ s{^XYZ\z}{N/A};
    use Data::Dump 'dd'; dd $colA, $colB, $colC, $colD;
}

__DATA__
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription at RowCode
ValuesInColumn1    DataColumnB    ABC    RowDescription at RowCode
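Splitting on `/\s{2,}/` assumes that columns are always separated by at least two whitespace characters and that no value contains two spaces in a row. The sketch below adds a field-count check (an addition, not part of tybalt89's code) inside a hypothetical `parse_line` helper, so lines that break that assumption are reported instead of silently mis-parsed:

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of 2+ whitespace characters and insist
# on exactly four columns, returning an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\s{2,}/, $line;
    return @cols == 4 ? @cols : ();
}

for my $line (
    "ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region",
    "ValuesInColumn1 DataColumnB XYZ RowDescription at RowCode",    # single spaces: rejected
) {
    # An empty list from parse_line makes the assignment false.
    my @cols = parse_line($line) or do { warn "Skipping unparseable line: $line\n"; next };
    $cols[2] =~ s{^XYZ\z}{N/A} if $cols[3] =~ /Region/;
    print join(' | ', @cols), "\n";
}
```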
Re: Split lines in file to columns by pgmer6809 (Sexton) on Mar 24, 2019 at 22:09 UTC
Here is some code that makes use of a couple of very nice Perl features:

1) You can put your data after the `__END__` statement and then have a while loop read it in, rather than putting it in an array in the source. That makes it easier to set up test cases.

2) It uses Perl's regex support to split the line. A regex is much more flexible than split, and is the usual way Perl programmers parse text like this.

I have added a couple of lines to your original input to show that a) if the description ends in HQ rather than Region, the value is not replaced, and b) likewise if it contains Region but the original value is not XYZ. I'm not sure that is exactly what you meant, but the concept should be useful.

#!/usr/bin/perl -w
use strict;

while (<DATA>) {    # read a line into $_
    chomp;
    $_ =~ m/^\s*(\S+)\s+(\S+)\s+(\S+)\s+(.*)$/;    # col1 col_b xyz Descr
    my ($col1, $data_b, $xyz, $description) = ($1, $2, $3, $4);
    if ( ( $description =~ m/Region/ ) && ( $xyz eq "XYZ" ) ) { $xyz = "N/A"; }
    print "Line $. = $_\n";
    print "\tXYZ result = $xyz \n";
}    # end while DATA
exit 1;

__END__
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
ValuesInColumn1    DataColumnB    XYZ    RowDescription at RowCode
ValuesInColumn1    DataColumnB    ABC    RowDescription at RowCode
ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::HQ
ValuesInColumn1    DataColumnB    BCD    RowDescription|RowCode|Supplier ID::Region

The result of running the above is:

Line 1 = ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
	XYZ result = N/A
Line 2 = ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
	XYZ result = N/A
Line 3 = ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::Region
	XYZ result = N/A
Line 4 = ValuesInColumn1    DataColumnB    XYZ    RowDescription at RowCode
	XYZ result = XYZ
Line 5 = ValuesInColumn1    DataColumnB    ABC    RowDescription at RowCode
	XYZ result = ABC
Line 6 = ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::HQ
	XYZ result = XYZ
Line 7 = ValuesInColumn1    DataColumnB    BCD    RowDescription|RowCode|Supplier ID::Region
	XYZ result = BCD
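The regex above can also be written with the /x modifier and named captures (Perl 5.10 or later), so that the pattern documents its own columns. This is an alternative presentation rather than pgmer6809's code; `parse_line` and the hash keys are hypothetical names, and the match is checked before any capture is used:

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line with a commented /x regex and named
# captures, returning a hash ref, or undef if the line does not match.
sub parse_line {
    my ($line) = @_;
    $line =~ m/
        ^ \s*
        (?<col1>   \S+ ) \s+    # first column, e.g. ValuesInColumn1
        (?<data_b> \S+ ) \s+    # second column, e.g. DataColumnB
        (?<xyz>    \S+ ) \s+    # third column, e.g. XYZ
        (?<description> .* )    # everything else, spaces included
        $
    /x or return undef;
    return { %+ };    # copy the named captures before the next match
}

for my $line (
    "ValuesInColumn1    DataColumnB    XYZ    RowDescription|RowCode|Supplier ID::HQ",
    "ValuesInColumn1    DataColumnB    BCD    RowDescription|RowCode|Supplier ID::Region",
) {
    my $row = parse_line($line) or next;
    $row->{xyz} = 'N/A' if $row->{xyz} eq 'XYZ' && $row->{description} =~ /Region/;
    print "XYZ result = $row->{xyz}\n";
}
```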
Re^2: Split lines in file to columns by haukex (Archbishop) on Mar 24, 2019 at 22:26 UTC
> You can put your data after the `__END__` statement

Normally, one would use the `__DATA__` token for this purpose - see Special Literals in perldata.

Update: Another nitpick: the special variables `$1` etc. should only be used if the match succeeds. And the two lines could be shortened to:

`my ($col1, $data_b, $xyz, $description) = /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(.*)$/ or die "Failed to parse: $_";`
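haukex's one-liner works because `or` puts the list assignment in scalar context, where it evaluates to the number of values on the right-hand side: a failed match returns an empty list, giving 0, so `die` runs. A sketch wrapping it in a hypothetical `parse_or_die` helper, with `eval` used to report and skip a bad line instead of stopping the program:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping haukex's one-liner. The trailing "\n" on the
# die message suppresses Perl's "at ... line N" suffix.
sub parse_or_die {
    local $_ = shift;
    chomp;
    my ($col1, $data_b, $xyz, $description) = /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(.*)$/
        or die "Failed to parse: $_\n";
    return ($col1, $data_b, $xyz, $description);
}

# eval returns an empty list if parse_or_die dies, so the assignment is false
# and the error in $@ is reported before moving on to the next line.
for my $line ("ValuesInColumn1    DataColumnB    ABC    RowDescription at RowCode", "garbage") {
    my @cols = eval { parse_or_die($line) } or do { print "skipped: $@"; next };
    print join(' | ', @cols), "\n";
}
```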


	PerlMonks