grashoper has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to parse a directory of files. Each file contains label/value pairs, one per line: the label and value are separated by a comma, and a newline ends each pair. I would like to read in the entire directory of files and modify each file in place so it becomes a CSV file without the newlines; that is, I would like to replace the newlines with commas. Basically this should be fairly easy to do, but I am having trouble figuring out how to do it.
All I really need is to change each file so that I have label,value pairs I can import into Excel.
Here is what I have so far.
(Oh, and the stuff at the bottom is an example of the source files' format.)
#!/usr/local/bin/perl
use File::Find;

my $dirname = $ARGV[0];
my $maxwidth = 0;
my (%cmTime, %hMtime, %oMtime);
my ($file, $base, $fmt);
my ($timestamp, @fields, $actioncode);

opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    next unless ($base) = $file =~ /(.+)\.txt$/;
    open(my $fh, "<", $file) or die "Can't open file $file: $!";
    while (<$fh>) {
        chomp;
        ($timestamp, @fields) = split /,/;
    }
    close($fh) or die "unable to close $file: $!";
    $actioncode = $timestamp;
    if ($actioncode eq 'Login') {
        print "found login";
    }
    print "successfully opened file ";
    print "$file\n";
}
closedir(DIR);
/8/2008 1:37:12 AM
Login,27.742460
SearchLoad,21.855703
SearchCount,45:43.520663
SearchResults,69.364964
SearchSave,107.611750
SearchDelete,29.317471
SearchDetails,10.493312
TaxLoad,24.488549
TaxResults,197:15.273238
TaxDetails,10.222830
ClientAdd,10.763459
CMALoad,7.398715
CMASave,16.196349
CMADelete,3.009030
ClientDelete,6.442495
Logout,0.873738
Re: parsing files and modifying format
by olus (Curate) on Jul 17, 2008 at 22:18 UTC
for i in `find . -name '*.txt'`; do perl -pi -e 's/\n/,/' $i; done
Update: or maybe a Perl-only solution. This one uses recursion to traverse subdirs.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = ".";
lista_dirs($dir, $dir);
sub lista_dirs {
my ($dir, $dirname) = @_;
my (@dirs, @files);
opendir DIR, "$dir";
my @dircontent = grep { /[^\.]/ } readdir(DIR);
closedir DIR;
@dirs = ();
@files = ();
foreach(@dircontent) {
if(-d "$dir/".$_) {
push @dirs, $_;
} else {
push @files, $_ if($_ =~ /\.txt$/);
}
}
foreach my $d(@dirs) {
lista_dirs("$dir/".$d, $d);
}
# IMPORTANT NOTE: if your files aren't too big, you can read
# everything at once as shown in the code below. If not, you can
# open the input file for reading and a temp file for writing,
# read the input line by line, replace \n with , and write to
# the temp file. Close both files when you're done, and then
# 'rename' the temp file to the input filename.
foreach my $file (@files) {
local $/;   # slurp the whole file at once
open FILE, "$dir/$file" or next;
my $content = <FILE>;
close FILE;
$content =~ tr/\n/,/;
open FILE, ">$dir/$file" or next;
print FILE $content;
close FILE;
}
return;
}
Update 2: After reading the other replies I realized I misread the question, and there was no need for recursion, so the solution came to be more complex than it needed to be. I was really in need of some sleep.
Re: parsing files and modifying format
by graff (Chancellor) on Jul 18, 2008 at 04:05 UTC
Perhaps you haven't explained your goal very well, but in any case I'm sure your code as posted does not do what you want. If I understand it correctly:
- There is one directory full of files (so no need to do recursion over subdirectories).
- Each file contains a relatively small number of lines.
- Each line is a "name,value" pair (except the first line, which is just a timestamp).
- You want all the lines in a given file to be joined into a single line, by converting line breaks into commas, so each file ends up being just "name,value,name,value,..."
- When that processing is done, you want to load all the file data into an excel spreadsheet, with each original source file being one row.
If that's the basic plan, I wouldn't bother with writing modified versions of every original file. Just output a single stream of data (store it all in a single file), having one line of text for each input file. Something like this:
#!/usr/bin/perl
use strict; # you really should!
( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
my $dirname = shift;
opendir( DIR, $dirname ) or die "$dirname: $!";
my @files = grep /.+\.txt$/, readdir DIR;
for my $file ( @files ) {
my ( $basename ) = ( $file =~ /(.+)\.txt$/ );
if ( open( F, "$dirname/$file" )) {
my ( $date, @lines ) = <F>;
chomp $date;
chomp @lines;
print join( ",", "sourcefile", $basename, "date", $date, @lines ), "\n";
close F;
}
else {
warn "$file: $!\n";
}
}
(I'm assuming you want to keep track of the original file names where all of the output rows come from, as well as all the time-stamps.)
Of course, if it turns out that all the files have the same number of lines, and the same set of names in their various "name,value" pairs, and these names are always in the same order, a better input for excel would have a single line at the top with the names, and then each following line would just have the values in the proper order:
#!/usr/bin/perl
use strict;
( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
my $dirname = shift;
opendir( DIR, $dirname ) or die "$dirname: $!";
my ( $firstfile, @files ) = grep /.+\.txt$/, readdir DIR;
open( F, "$dirname/$firstfile" ) or die "$firstfile: $!";
my ( @names, @values );
my ( $basename ) = ( $firstfile =~ /(.+)\.txt$/ );
my $date = <F>;
chomp $date;
while (<F>) {
chomp;
my ( $n, $v ) = split /,/, $_, 2;
push @names, $n;
push @values, $v;
}
close F;
print join( ",", "sourcefile", "date", @names ), "\n";
print join( ",", $basename, $date, @values ), "\n";
for my $file ( @files ) {
( $basename ) = ( $file =~ /(.+)\.txt$/ );
if ( open( F, "$dirname/$file" )) {
( $date, @values ) = <F>;
chomp $date;
chomp @values;
s/.+?,// for ( @values ); # delete the "name," parts
print join( ",", $basename, $date, @values ), "\n";
close F;
}
else {
warn "$file: $!\n";
}
}
(Not tested)
If you're not sure whether all the files have the same set of names in the same order, it would be easy enough to check for that before you try to create the single data stream for importing to excel.
graff you are correct, sample 2 is what I needed, except each file has multiple records; the example I provided was merely one of them. It looks like the +? isn't working quite the way I thought it would in this one line though.. not sure why. I expected the names would not get printed but instead they do appear in my csv. I was thinking I would do an OLE call to excel and just dump the values right on in there without the names but it doesn't.. is that due to the join prints above this loop?
for my $file ( @files ) {
( $basename ) = ( $file =~ /(.+)\.txt$/ );
if ( open( F, $file )) {
( $date, @values ) = <F>;
chomp $date;
chomp @values;
s/.+?,// for ( @values ); # delete the "name," parts
print join( ",", $basename, $date, @values ), "\n";
close F;
}
else {
warn "$file: $!\n";
}
}
I'm trying to work out where your sentence boundaries are -- if you add periods and sentence-initial capitalization in the paragraph text, that would help...
each file has multiple records, example I provided was merely one of them
If you have trouble figuring out how to parse the input files, start a new thread on that. As I mentioned in my first reply, it should be easy to work out how the file contents are arranged, and to treat them accordingly.
it looks like the +? isn't working quite the way I thought it would in this one line though..
I have no clue what you're talking about there -- maybe if you show a specific input data set and the resulting output, and explain what's wrong with that output, it will be clear what's wrong.
I was thinking I would do an OLE call to excel and just dump the values right on in there without the names but it doesn't..
I know nothing about ole, and I see no reason to go there, since I do know that a simple comma-delimited file, with a single line at the top for column headings, works fine for importing to excel.
Of course, if the "values" contain commas (and quotes), then it is not simple, because fields containing commas as data need to be quoted, and quotes occurring as data in any field need to be escaped.
As a rule, it's fairly common for commas and quotes to appear in text data, but it's probably a lot less likely that your data files contain tabs, so you might consider writing tab-delimited output instead of comma delimited -- that works equally well for importing to excel.
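If quoting does become necessary, a minimal escaping helper along the lines described above might look like the sketch below (the `csv_field` name is mine, just for this example, not from any module):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote a field only when it contains a comma, a quote, or a newline;
# embedded quotes are doubled, per the usual CSV convention.
sub csv_field {
    my ($f) = @_;
    if ( $f =~ /[",\n]/ ) {
        $f =~ s/"/""/g;
        return qq{"$f"};
    }
    return $f;
}

# prints: plain,"has,comma","has ""quote"""
print join( ",", map { csv_field($_) } 'plain', 'has,comma', 'has "quote"' ), "\n";
```

In real code a module such as Text::CSV handles all of these cases for you; the sketch just shows the rule.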
Actually I missed an important piece of information here. I need the labels (Login, SearchLoad, etc.) once, and the subsequent values are to become rows of data. The goal is to put this data into a database; a single line of csv is going to give me a row that exceeds limits in excel. The data files are fairly small, 11kb or so, but there are a lot of them. The timestamp acts as a row separator of sorts: it's the time the test was run, and it should start a new row in excel.
This was a rush job I received late in the day, and I am finding it much more challenging than I expected.
I really appreciate the help.
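That multi-record layout could be handled with something like the sketch below. It assumes (my assumption, not confirmed by the data shown) that every record starts with a timestamp line, recognized here purely by a trailing AM/PM, and that the same labels appear in each record; the `records_to_csv` name is made up for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a list of lines (possibly many records) into a header row of
# labels followed by one CSV row of values per timestamp.
sub records_to_csv {
    my @lines = @_;
    my ( @names, @rows, $row );
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ /\b[AP]M$/ ) {        # timestamp opens a new record
            push @rows, $row if $row;
            $row = [$line];
        }
        elsif ( $line =~ /^([^,]+),(.*)$/ && $row ) {
            my ( $name, $value ) = ( $1, $2 );
            push @names, $name unless grep { $_ eq $name } @names;
            push @$row, $value;
        }
    }
    push @rows, $row if $row;
    return join( ",", "timestamp", @names ),
           map { join( ",", @$_ ) } @rows;
}

# Prints:
#   timestamp,Login,Logout
#   7/8/2008 1:37:12 AM,27.742460,0.873738
#   7/9/2008 2:00:00 AM,30.1,1.2
print "$_\n" for records_to_csv(
    "7/8/2008 1:37:12 AM", "Login,27.742460", "Logout,0.873738",
    "7/9/2008 2:00:00 AM", "Login,30.1",      "Logout,1.2",
);
```

Each timestamp then becomes one database/spreadsheet row, with the labels appearing only once as the header.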
Actually I don't see how I delete the name parts; when I run the above I get both names and values. I really just want one row of names and the rest values.
Re: parsing files and modifying format
by GrandFather (Saint) on Jul 17, 2008 at 23:37 UTC
Using full stop delimiters on your sentences would aid in understanding your question.
Perl is environmentally friendly - it saves trees
Re: parsing files and modifying format
by oko1 (Deacon) on Jul 18, 2008 at 02:57 UTC
Perhaps I'm missing something, but the original request didn't say anything about doing recursive processing. Assuming that it's required, based on the OP's use of 'File::Find' in his script (which he's not actually using) may not be appropriate.
Perl-only solution from the command line (processes all files in the current directory, ignores lines that don't have commas, converts the file content to a single line of CSV):
perl -i -0wpe's/\A[^\n]+\n//;s/\n(?!\Z)/,/g' *
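For anyone puzzling over the switches, here is roughly what that one-liner does, written out as a standalone sketch (`-0` actually sets the input record separator to NUL; plain slurp mode is used below as a close analogue, and the `flatten` helper is named just for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Drop the leading timestamp line, then turn every newline except the
# trailing one into a comma -- the two substitutions from the one-liner.
sub flatten {
    my ($text) = @_;
    $text =~ s/\A[^\n]+\n//;      # remove the first (timestamp) line
    $text =~ s/\n(?!\Z)/,/g;      # newline -> comma, keep the last one
    return $text;
}

# The -i (in-place) part: rewrite each file (restricted to .txt here,
# where the one-liner took whatever filenames the shell handed it).
for my $file ( glob "*.txt" ) {
    local $/;                      # slurp the whole file
    open my $in, '<', $file or next;
    my $text = <$in>;
    close $in;
    open my $out, '>', $file or next;
    print $out flatten($text);
    close $out;
}
```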
--
Human history becomes more and more a race between education and catastrophe. -- HG Wells