Re: Merging Columns from multiple files into a single file

by AppleFritter (Vicar)
on Oct 18, 2015 at 19:20 UTC


in reply to Merging Columns from multiple files into a single file

You cannot easily write to columns in a file; text file access is inherently row-based. I reckon there may well be CPAN modules to access columns in files, but if you seek to combine data from several files it's a much better idea to read all the input files and then write to the output file once, e.g.:

#!/usr/bin/perl
use Modern::Perl '2014';

# generate some filenames
my @inputfiles = map { "data$_.txt" } 1 .. 3;
my $inputdata = {};

# read files into $inputdata
foreach my $filenumber (0 .. $#inputfiles) {
    open my $HANDLE, "<", $inputfiles[$filenumber]
        or die "Cannot open $inputfiles[$filenumber]: $!\n";
    while(<$HANDLE>) {
        chomp;
        my ($freq, $data) = split /\s+/, $_, 2;
        $inputdata->{$freq}->[$filenumber] = $data;
    }
    close $HANDLE or warn "Cannot close $inputfiles[$filenumber]: $!\n";
}

# write combined output
open my $OUTPUT, ">", "combined.txt" or die "Cannot open combined.txt: $!\n";
foreach my $freq (sort keys %{ $inputdata }) {
    say $OUTPUT "$freq ", join " ", @{ $inputdata->{$freq} };
}
close $OUTPUT or warn "Cannot close combined.txt: $!\n";

Replies are listed 'Best First'.
Re^2: Merging Columns from multiple files into a single file
by ikhan (Initiate) on Oct 20, 2015 at 05:18 UTC

    Hi AppleFritter:

    Thanks for taking the time. There is a little hiccup in the code: in the combined output file, we only see the last line from each input file. I tried adding a newline character in the "say" statement, but without success. Can you please take a look? I appreciate your time.

      It looks like your data has a space at the start of each line. If so, try

      while(<$HANDLE>) {
          chomp;
          s/^\s+//; # remove leading spaces
          my ($freq, $data) = split /\s+/, $_, 2;
          $inputdata->{$freq}->[$filenumber] = $data;
      }
      poj

      As my bro-tastic monastic Brother poj already pointed out, it may be that your data has spaces at the beginning of each line.

      In general, the way you read data from your input files will depend on their exact format, its constraints, and the assumptions you are allowed to make. (Thank you, Dame Captain Obvious.) The code I posted was fairly simple and assumed a fairly rigid structure: one row of data per line, exactly one set of data per file, and each line conforming to the following structure:

      freq marker (containing no whitespace); any amount of whitespace; data (possibly including whitespace)

      Note that the split call splits on whitespace (\s+) and limits itself to two (2) fields, so if the file indeed conforms to this structure, you'll get your freq marker and data just as expected. However, if a line starts with whitespace, split will see and split on that instead, and return an empty string (then assigned to $freq) followed by the entire rest of the line after said leading whitespace (then assigned to $data). Every such line is then stored under the same empty key, so each one overwrites the previous, which is why only the last line of each file shows up in the combined output.
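To make the split behaviour concrete, here is a minimal sketch. The sample lines are made up for illustration (they are not ikhan's actual data), and the %seen hash merely stands in for $inputdata:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines; the last two start with a space.
my @lines = ("100 1.5 2.5", " 200 3.5 4.5", " 300 5.5 6.5");

my %seen;
foreach my $line (@lines) {
    # Same split as in the script: whitespace separator, at most two fields.
    my ($freq, $data) = split /\s+/, $line, 2;
    printf "freq=[%s] data=[%s]\n", $freq, $data;
    $seen{$freq} = $data;    # a repeated key overwrites the earlier entry
}

# Both indented lines were filed under the empty-string key,
# so only the last of them is left.
printf "%d keys; empty key holds [%s]\n", scalar(keys %seen), $seen{""};
```

The two indented lines both report an empty freq, and the hash ends up with two keys instead of three.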

      So what should you do, then? It depends. If extra whitespace is the worst that can happen to you, then use poj's solution to remove leading spaces on each line. Otherwise, you'll have to think about what sort of file structure you can expect, and modify your script accordingly to deal with all possible corner cases.
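As a rough sketch of what "dealing with corner cases" might look like, the helper below tolerates leading and trailing whitespace and skips blank or incomplete lines. The name parse_line and the choice of which lines to skip are assumptions made for illustration; your real files may call for different rules. It relies on the special case split ' ' (a single-space string rather than a regex), which ignores leading whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_line is a hypothetical helper, not part of the script posted above.
# Returns (freq, data) for a usable line, or an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return if $line !~ /\S/;                      # blank line
    # split ' ' skips leading whitespace before splitting
    my ($freq, $data) = split ' ', $line, 2;
    return unless defined $data && length $data;  # marker without data
    $data =~ s/\s+$//;                            # drop trailing whitespace
    return ($freq, $data);
}

# Hypothetical input lines covering the cases handled above.
foreach my $line ("100 1.5\n", "  200 2.5  \n", "\n", "300\n") {
    my @fields = parse_line($line);
    if (@fields) {
        print "freq=[$fields[0]] data=[$fields[1]]\n";
    }
    else {
        print "skipped\n";
    }
}
```

Whether to skip a bad line silently, warn, or die is a design choice that depends on how much you trust whatever produced the files.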

      As an aside: do you have control over where and how these data files are generated? If so, it may be worth modifying the producing script instead (or as well); it's often easier to not output data in a certain way to begin with than to try to parse it back when you can rely on fewer assumptions. (But also remember the Robustness principle: be conservative in what you output, and liberal in what you accept. This is true even for data you generate and consume entirely by yourself.)

      Also, avoid reinventing the wheel (unless it's necessary and/or fun, of course). Instead of relying on ad-hoc formats, you may be better off utilizing a standard format such as CSV for your data files, using e.g. Text::CSV to do all the heavy lifting for you (input and output).
