gisrob has asked for the wisdom of the Perl Monks concerning the following question:
Monks -
I have two different directories containing files with the same name, e.g. /home/robs/data/10.06.94.txt /home/robs/newdata/10.06.94.txt
I'd like to loop over the directories, open up each similarly named file, read it in to an array, and then print selected columns to a new file. Seems pretty basic, but a) I'm stumped on how to print to the third file, and b) I don't know how to automate this with OPENDIR and READDIR to do this iteratively for all files.
Here's what I've come up with, but it doesn't work.
#!/usr/bin/perl
$folder1 = "C:/neaq/ArcData/bluefin/data_analysis/1994sptr/data";
chdir "$folder1";
$textfile = "10.06.94.txt";
open(INFILE, $textfile) or
die "Cannot open: $textfile\n";
while(<INFILE>) {
chomp;
@data = split;
($mask, $x, $y, $dfront, $ftdens, $depth, $slope, $tuna, $temp, $sptr) = @data;
}
close(INFILE);
$folder2 = "C:/neaq/ArcData/bluefin/data_dens/1994sptr";
chdir "$folder2";
$textfile2 = "10.06.94.txt";
open(INFILE2, $textfile2) or
die "Cannot open: $textfile2\n";
while(<INFILE2>) {
chomp;
@data2 = split;
($mask, $x, $y, $ftdens) = @data2;
}
close(INFILE2);
open(OUTFILE, "> newdens.txt");
while (<>) {
print OUTFILE $data[1], $data[2], $data[3], $data[4], $data[5], $data[6], $data[7], $data2[3], "\n";
}
close(OUTFILE);
The code opens the outfile, but prints nothing to it, and just hangs. Any advice on how to accomplish this?
Thanks
Re: Printing Columns from Two Different Files
by boo_radley (Parson) on Jan 31, 2003 at 16:59 UTC
You've got a bunch of the pieces put together correctly, but there are a few areas where these individual bits don't mesh together right. You're reading two files (the while (<INFILE>) and while (<INFILE2>) loops), but you're not storing the results of those reads. In fact, each run through the loop overwrites the information that was there previously. You might declare arrays outside of these loops and store the results of the splitting in them, so that you preserve the work you're doing.
You don't seem to use $mask, $x, $y, $dfront, $ftdens, $depth, $slope, $tuna, $temp or $sptr in any useful fashion; is this part of a larger script? If not, you might consider dropping them entirely because they're not doing you any good.
It may also benefit you to drop the chdirs and simply open the files with:
$folder2 = "C:/neaq/ArcData/bluefin/data_dens/1994sptr/";
$textfile2 = "10.06.94.txt";
open(INFILE2, "$folder2$textfile2") or
die "Cannot open: $textfile2\n";
Finally, the diamond operator doesn't do what I expect you expect it to. It runs through @ARGV (which itself gets populated with the script's command line arguments), treating each element as a filename which gets opened and read line by line. Since no arguments were specified, it's waiting for input on STDIN ("just hangs").
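To make the behaviour of the diamond operator concrete, here is a minimal sketch. The file name demo_input.txt is made up for the illustration; the script creates the file itself and then sets @ARGV so that <> reads from it instead of waiting on STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical scratch file, created here only so the example is self-contained.
my $demo = 'demo_input.txt';
open(my $fh, '>', $demo) or die "Cannot write $demo: $!";
print $fh "first line\nsecond line\n";
close($fh);

# <> reads from the files named in @ARGV; with an empty @ARGV it falls
# back to STDIN, which is why the original script appeared to hang.
local @ARGV = ($demo);
my @lines;
while (my $line = <>) {
    chomp $line;
    push @lines, $line;
}
print scalar(@lines), " lines read from $demo\n";
unlink $demo;
```

Run without the @ARGV assignment and without command line arguments, the same loop would block until input arrives on STDIN.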
As a rough, untested outline, here's my take on the problem.
It's not fully fleshed out, but it provides a quick idea of what a solution might look like.
open FH, "first file" or die "problem with first file : $!";
@first = <FH>; # slurp into array
close FH;
open FH, "next file" or die "problem with second file : $!";
@second = <FH>; # slurp into array
close FH;
chomp(@first, @second); # drop the trailing newlines before joining
# at this point, we've got the lines of each file in two different arrays.
# if it's important that they have the same number of lines, you can
# check with if (scalar (@first) != scalar (@second)) {...
#
# now start output to the third file. maybe check to see if it exists?
open OUTPUT, ">output.txt" or die "problem with output file : $!";
# since we're reading from 2 different arrays
# it's easier to use for (0..$#first) rather than foreach (@first)
# so that we can apply the element number to both arrays.
#
my $sep = ","; # specifies comma separated output. this might be bad if your data has commas in it.
for (0..$#first) { # this is a liability if the second file has more lines...
    print OUTPUT join($sep, split(' ', $first[$_]), split(' ', $second[$_])), "\n";
}
close OUTPUT;
Re: Printing Columns from Two Different Files
by Hofmator (Curate) on Jan 31, 2003 at 17:19 UTC
If you are processing the three files (two input, one output) really line by line, then one while loop is enough. Just open all three files beforehand and close them afterwards. Something like the following code should work then:
my $folder1 = 'folder1';
my $folder2 = 'folder2';
my $folder_out = 'outfolder';
my $textfile = 'file.txt';
open(my $file1, "< $folder1/$textfile") or die $!;
open(my $file2, "< $folder2/$textfile") or die $!;
open(my $outfile, "> $folder_out/$textfile") or die $!;
while (my $line1 = <$file1> ) {
my $line2 = <$file2>;
    my @data1 = split ' ', $line1;
    my @data2 = split ' ', $line2;
    print $outfile join(' ', @data1[1..7], $data2[3]), "\n";
}
close($file1);
close($file2);
close($outfile);
-- Hofmator
Re: Printing Columns from Two Different Files
by cfreak (Chaplain) on Jan 31, 2003 at 17:00 UTC
I see a couple of problems. First, and I'm sure you're about to hear a lot of this: you should always use strict; and use warnings;. Strict forces you to declare your variables with my, and both directives (placed at the top of your program) are very useful for helping you debug your program.
As for the immediate problems: While reading data in I notice that you have this (similar for the second one):
@data = split;
($mask,$x,$y,$ftdens) = @data;
split with no arguments splits on nothing for $_. I'm not certain of your data structure, but this is going to leave you with an array of every character on each line, and I'm not convinced that is what you want. (This turned out to be wrong; see the update at the end of this post.)
Perhaps you could show us a sample of your data file. Also, the @data and @data2 arrays, as well as the variables that you create on the second line, aren't even being used, and they get clobbered every time through the loop.
Finally your output is the biggest problem. Your program is hanging because you've opened the output file and then did:
while(<>) {
That is what makes your program appear to hang. A bare <> does not read from any filehandle you opened yourself; it reads from the files named in @ARGV or, when there are none, from STDIN, so the program sits there waiting for input. On top of that, your @data is only going to contain the last line from the file that you opened.
Here is a program that works on comma separated values. You'll have to tailor it for your own data structure :)
#!/usr/bin/perl
use strict;
use warnings;
my $file1 = "file1.txt";
my $file2 = "file2.txt";
my $outfile = "outfile.txt";
my @data1 = ();
my @data2 = ();
open(INFILE1,"$file1") or die "Couldn't open $file1: $!\n";
while(<INFILE1>) {
chomp;
my @line = split(/,/,$_);
push(@data1,\@line);
}
close(INFILE1);
open(INFILE2,"$file2") or die "Couldn't open $file2: $!\n";
while(<INFILE2>) {
chomp;
my @line = split(/,/,$_);
push(@data2,\@line);
}
close(INFILE2);
# now merge the data in the outfile
open(OUTFILE,">$outfile") or die "Couldn't open $outfile: $!\n";
my $count = 0;
foreach(@data1) {
    print OUTFILE join(',', @$_, $data2[$count]->[3]), "\n";
$count++;
}
close(OUTFILE);
I hope that helps. Post some of your data and maybe we can help further.
Chris
Lobster Aliens Are attacking the world!
Update: Oops, Fletch is correct: split with no arguments splits $_ on whitespace.
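For reference, a short sketch of how split with no arguments behaves: it splits $_ on runs of whitespace and skips leading whitespace, the same as split ' ', $_. The sample line below is an assumption modelled on the whitespace-separated layout of the files discussed in this thread.

```perl
use strict;
use warnings;

# A sample whitespace-separated line (illustrative data only).
$_ = "  0   8852.68825 442495.5253\t82293.02 0\n";

my @implicit = split;            # same as split ' ', $_
my @explicit = split ' ', $_;    # awk-style: skips leading whitespace
my @regex    = split /\s+/, $_;  # keeps an empty leading field

print scalar(@implicit), " fields: @implicit\n";
print "first field with /\\s+/ is empty\n" if $regex[0] eq '';
```

The difference between ' ' and /\s+/ only shows up when a line starts with whitespace, but it shifts every column index by one when it does.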
Thanks for the tips.
I posted a snippet of data in one of the replies below.
Let me know if it helps.
Re: Printing Columns from Two Different Files
by l2kashe (Deacon) on Jan 31, 2003 at 17:11 UTC
OK, first off, I'm going on the assumption that the files will always have the same name; second, I am assuming that you know what data you want every time. Also, apparently your data file only has one line? If not, then you are only getting the last line from the file each time. I have moved any ||'s and &&'s to new lines so that the code doesn't wrap so much.
#!/usr/bin/perl
$base1 = "C:/neaq/ArcData/bluefin/data_analysis/1994sptr/data";
$base2 = "C:/neaq/ArcData/bluefin/data_dens/1994sptr";
$outfile = "C:/some/dir/to/reports";
opendir(BASE1, "$base1") || die "Cant open $base1\nReason: $!\n";
foreach $file ( grep(/\.txt$/, readdir(BASE1)) ) {
    unless (-f "$base2/$file") {
        warn "Skip: No matching $file in $base2\n";
        next;
    }
open(IN, "$base1/$file")
|| die "Cant access $file\nReason: $!\n";
while ( <IN> ) {
@data1 = ( split(/\s+/) )[1,2,3,4,5,6,7];
}
close(IN);
open(IN, "$base2/$file")
|| die "Cant access $base2/$file\nReason: $!\n";
while ( <IN> ) {
$data2 = ( split(/\s+/) )[3];
}
close(IN);
open(OUT, ">>$outfile")
|| die "Cant append to $outfile\nReason: $!\n";
print OUT "@data1 $data2\n";
close(OUT);
} # END foreach $file readdir(BASE1)
#
# or alternately to save up our data and only print once
# push(@out, join(' ', "$file: ", @data1, $data2));
# then loop through out once and print it out here instead
#
/* And the Creator, against his better judgement, wrote man.c */
Well actually no. My data files have more than one line. Guess I'm getting many things wrong.
The data look like:
oldfile:
mask x y dln9476326lce dln94930101lc
0 8852.68825 442495.5253 82293.02 0
0 9913.98125 442495.5253 82751.22 0
0 10975.27425 442495.5253 83238.23 0
0 10975.27425 441434.2323 81704.7 0
0 12036.56725 440372.9393 80174.36 0
0 12036.56725 439311.6463 80174.36 0
0 12036.56725 438250.3533 78647.4 0
0 13097.86025 438250.3533 79192.59 0
0 13097.86025 437189.0603 77679.91 0
0 13097.86025 436127.7673 77679.91 0
<snip>
I haven't included all the columns, but there are 5 more.
newfile:
mask x y dln9476331lc
0 23710.79025 396859.92632 0
0 27955.96225 396859.92632 0.2530461
0 29017.25525 395798.63332 0.4151559
0 29017.25525 394737.34032 2.826168
0 19465.61825 393676.04732 0
<snip>
The files always have the same name in each directory, the same number of lines, and the same structure (within each directory, but different across directories). They are output from a GIS sampling program. The only difference is the days on which they were sampled.
Files look like:
C:\neaq\ArcData\bluefin\data_dens\1994_sptr>dir
10.06.94.txt
10.07.94.txt
7.10.94.txt
7.11.94.txt
7.12.94.txt
7.14.94.txt
<snip>
C:\neaq\ArcData\bluefin\data_analysis\1994sptr\data>dir
10.06.94.txt
10.07.94.txt
7.10.94.txt
7.11.94.txt
7.12.94.txt
7.14.94.txt
<snip>
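Given the layout described above (same file names in both directories, same number of lines per pair), the whole job can be sketched as follows. This is only an outline under those assumptions: the sub names merge_pair and merge_dirs, the script name merge.pl, and the tab-separated output are choices made for the illustration, and the output directory is assumed to exist already.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge one pair of same-named files line by line: columns 1..7 of the
# first file plus column 3 of the second, as in the original print statement.
sub merge_pair {
    my ($in1, $in2, $out) = @_;
    open(my $fh1, '<', $in1) or die "Cannot open $in1: $!";
    open(my $fh2, '<', $in2) or die "Cannot open $in2: $!";
    open(my $ofh, '>', $out) or die "Cannot write $out: $!";
    while (defined(my $line1 = <$fh1>)) {
        my $line2 = <$fh2>;
        last unless defined $line2;      # second file ran out early
        my @data1 = split ' ', $line1;
        my @data2 = split ' ', $line2;
        # Replace fields missing on short lines with empty strings.
        my @cols = map { defined $_ ? $_ : '' } (@data1[1 .. 7], $data2[3]);
        print $ofh join("\t", @cols), "\n";
    }
    close($fh1);
    close($fh2);
    close($ofh) or die "Cannot close $out: $!";
}

# Run merge_pair for every .txt file in $dir1 that also exists in $dir2.
sub merge_dirs {
    my ($dir1, $dir2, $outdir) = @_;
    opendir(my $dh, $dir1) or die "Cannot open directory $dir1: $!";
    my @files = sort grep { /\.txt$/ && -f "$dir1/$_" } readdir($dh);
    closedir($dh);
    my $merged = 0;
    for my $file (@files) {
        unless (-f "$dir2/$file") {
            warn "Skipping $file: no match in $dir2\n";
            next;
        }
        merge_pair("$dir1/$file", "$dir2/$file", "$outdir/$file");
        $merged++;
    }
    return $merged;
}

# Directory names are passed on the command line, e.g.
#   perl merge.pl data_analysis/1994sptr/data data_dens/1994sptr merged
if (@ARGV == 3) {
    my $count = merge_dirs(@ARGV);
    print "Merged $count file(s)\n";
}
```

Reading both files in step avoids holding whole files in memory, and the header line is merged like any other line, so the output keeps its column names.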
Re: Printing Columns from Two Different Files
by Fletch (Bishop) on Jan 31, 2003 at 16:54 UTC
while (<>) {
It's not hanging, it's waiting for input on STDIN just like you asked it to do. Perhaps you mean to iterate over @data instead?
Re: Printing Columns from Two Different Files
by BUU (Prior) on Jan 31, 2003 at 18:24 UTC
#my list of files
@ARGV=qw/txt1.txt foo.txt baz.txt etc.foo/;
open OUT, '>out.txt' or die "Cannot open out.txt: $!";
while(<>){print OUT $_;}
Re: Printing Columns from Two Different Files
by hardburn (Abbot) on Jan 31, 2003 at 17:00 UTC
Re: Printing Columns from Two Different Files
by bsb (Priest) on Feb 03, 2003 at 00:13 UTC
If you're on Unix, the 'cut' and 'paste' commands might do part of what you're after.
Not Perl, but handy:
cut - remove sections from each line of files
paste - merge lines of files
Or perhaps you can use perl for part of it and just
merge the results with paste.