grashoper has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to parse a directory of files. Each file contains label/value pairs, one per line: the label and value are separated by a comma, and a newline ends each pair. I would like to read in the entire directory of files and modify each file in place so it becomes a CSV file without the newlines; that is, I would like to replace the newlines with commas. Basically this should be fairly easy to do, but I am having trouble figuring out how to do it.
All I really need is to change each file so that I have label,value pairs I can import into Excel.
Here is what I have so far.
(Oh, and the stuff at the bottom is an example of the source files' format.)
#!/usr/local/bin/perl
use File::Find;

my $dirname = $ARGV[0];
my $maxwidth = 0;
my (%cmTime, %hMtime, %oMtime);
my ($file, $base, $fmt);
my ($timestamp, @fields, $actioncode);

opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    next unless ($base) = $file =~ /(.+)\.txt$/;
    open(my $fh, "<", $file) or die "Can't open file $file: $!";
    while (<$fh>) {
        chomp;
        ($timestamp, @fields) = split /,/;
    }
    close($fh) or die "unable to close $file: $!";
    $actioncode = $timestamp;
    if ($actioncode eq 'Login') {
        print "found login";
    }
    print "successfully opened file ";
    print "$file\n";
}
closedir(DIR);
/8/2008 1:37:12 AM
Login,27.742460
SearchLoad,21.855703
SearchCount,45:43.520663
SearchResults,69.364964
SearchSave,107.611750
SearchDelete,29.317471
SearchDetails,10.493312
TaxLoad,24.488549
TaxResults,197:15.273238
TaxDetails,10.222830
ClientAdd,10.763459
CMALoad,7.398715
CMASave,16.196349
CMADelete,3.009030
ClientDelete,6.442495
Logout,0.873738
Re: parsing files and modifying format
by olus (Curate) on Jul 17, 2008 at 22:18 UTC
for i in `find . -name '*.txt'`; do perl -pi -e 's/\n/,/' $i; done
Update: or maybe a Perl-only solution. This one uses recursion to traverse subdirs.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = ".";
lista_dirs($dir, $dir);
sub lista_dirs {
my ($dir, $dirname) = @_;
my (@dirs, @files);
opendir DIR, "$dir";
my @dircontent = grep { /[^\.]/ } readdir(DIR);
closedir DIR;
@dirs = ();
@files = ();
foreach(@dircontent) {
if(-d "$dir/".$_) {
push @dirs, $_;
} else {
push @files, $_ if($_ =~ /\.txt$/);
}
}
foreach my $d(@dirs) {
lista_dirs("$dir/".$d, $d);
}
# IMPORTANT NOTE: if your files aren't too big, you can read
# everything at once as shown in the code below. If not, you can
# open the input file for reading and a temp file for writing,
# read the input line by line, replace \n with , and write to
# the temp file. Close both files when you're done, and then
# 'rename' the temp file to the input filename.
foreach my $file (@files) {
local $/;   # slurp the whole file at once
open FILE, "$dir/$file" or next;
my $content = <FILE>;
close FILE;
$content =~ tr/\n/,/;
open FILE, ">$dir/$file" or next;
print FILE $content;
close FILE;
}
return;
}
Update 2: After reading the other replies I realized I misread the question, and there was no need for recursion, so the solution came to be more complex than it needed to be. I was really in need of some sleep.
Re: parsing files and modifying format
by graff (Chancellor) on Jul 18, 2008 at 04:05 UTC
Perhaps you haven't explained your goal very well, but in any case I'm sure your code as posted does not do what you want. If I understand it correctly:
- There is one directory full of files (so no need to do recursion over subdirectories).
- Each file contains a relatively small number of lines.
- Each line is a "name,value" pair (except the first line, which is just a timestamp).
- You want all the lines in a given file to be joined into a single line, by converting line breaks into commas, so each file ends up being just "name,value,name,value,..."
- When that processing is done, you want to load all the file data into an excel spreadsheet, with each original source file being one row.
If that's the basic plan, I wouldn't bother with writing modified versions of every original file. Just output a single stream of data (store it all in a single file), having one line of text for each input file. Something like this:
#!/usr/bin/perl
use strict; # you really should!
( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
my $dirname = shift;
opendir( DIR, $dirname ) or die "$dirname: $!";
my @files = grep /.+\.txt$/, readdir DIR;
for my $file ( @files ) {
my ( $basename ) = ( $file =~ /(.+)\.txt$/ );
if ( open( F, "$dirname/$file" )) {
my ( $date, @lines ) = <F>;
chomp $date;
chomp @lines;
print join( ",", "sourcefile", $basename, "date", $date, @lines ), "\n";
close F;
}
else {
warn "$file: $!\n";
}
}
(I'm assuming you want to keep track of the original file names where all of the output rows come from, as well as all the time-stamps.)
Of course, if it turns out that all the files have the same number of lines, and the same set of names in their various "name,value" pairs, and these names are always in the same order, a better input for excel would have a single line at the top with the names, and then each following line would just have the values in the proper order:
#!/usr/bin/perl
use strict;
( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
my $dirname = shift;
opendir( DIR, $dirname ) or die "$dirname: $!";
my ( $firstfile, @files ) = grep /.+\.txt$/, readdir DIR;
open( F, "$dirname/$firstfile" ) or die "$firstfile: $!";
my ( @names, @values );
my ( $basename ) = ( $firstfile =~ /(.+)\.txt$/ );
my $date = <F>;
chomp $date;
while (<F>) {
chomp;
my ( $n, $v ) = split /,/, $_, 2;
push @names, $n;
push @values, $v;
}
close F;
print join( ",", "sourcefile", "date", @names ), "\n";
print join( ",", $basename, $date, @values ), "\n";
for my $file ( @files ) {
( $basename ) = ( $file =~ /(.+)\.txt$/ );
if ( open( F, "$dirname/$file" )) {
( $date, @values ) = <F>;
chomp $date;
chomp @values;
s/.+?,// for ( @values ); # delete the "name," parts
print join( ",", $basename, $date, @values ), "\n";
close F;
}
else {
warn "$file: $!\n";
}
}
(Not tested)
If you're not sure whether all the files have the same set of names in the same order, it would be easy enough to check for that before you try to create the single data stream for importing to excel.
graff you are correct, sample 2 is what I needed, except each file has multiple records; the example I provided was merely one of them. It looks like the +? isn't working quite the way I thought it would in this one line though.. not sure why. I expected the names would not get printed but instead they do appear in my csv. I was thinking I would do an OLE call to excel and just dump the values right on in there without the names but it doesn't.. is that due to the join prints above this loop?
for my $file ( @files ) {
( $basename ) = ( $file =~ /(.+)\.txt$/ );
if ( open( F, $file )) {
( $date, @values ) = <F>;
chomp $date;
chomp @values;
s/.+?,// for ( @values ); # delete the "name," parts
print join( ",", $basename, $date, @values ), "\n";
close F;
}
else {
warn "$file: $!\n";
}
}
I'm trying to work out where your sentence boundaries are -- if you add periods and sentence-initial capitalization in the paragraph text, that would help...
each file has multiple records, example I provided was merely one of them
If you have trouble figuring out how to parse the input files, start a new thread on that. As I mentioned in my first reply, it should be easy to work out how the file contents are arranged, and to treat them accordingly.
it looks like the +? isn't working quite the way I thought it would in this one line though..
I have no clue what you're talking about there -- maybe if you show a specific input data set and the resulting output, and explain what's wrong with that output, it will be clear what's wrong.
I was thinking I would do an OLE call to excel and just dump the values right on in there without the names but it doesn't..
I know nothing about ole, and I see no reason to go there, since I do know that a simple comma-delimited file, with a single line at the top for column headings, works fine for importing to excel.
Of course, if the "values" contain commas (and quotes), then it is not simple, because fields containing commas as data need to be quoted, and quotes occurring as data in any field need to be escaped.
As a rule, it's fairly common for commas and quotes to appear in text data, but it's probably a lot less likely that your data files contain tabs, so you might consider writing tab-delimited output instead of comma delimited -- that works equally well for importing to excel.
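If quoting does become necessary, a minimal escaping helper along the lines described above might look like the sketch below (the `csv_field` name is mine, just for this example, not from any module):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote a field only when it contains a comma, a quote, or a newline;
# embedded quotes are doubled, per the usual CSV convention.
sub csv_field {
    my ($f) = @_;
    if ( $f =~ /[",\n]/ ) {
        $f =~ s/"/""/g;
        return qq{"$f"};
    }
    return $f;
}

# prints: plain,"has,comma","has ""quote"""
print join( ",", map { csv_field($_) } 'plain', 'has,comma', 'has "quote"' ), "\n";
```

In real code a module such as Text::CSV handles all of these cases for you; the sketch just shows the rule.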
Actually I missed an important piece of information here. I need the labels (Login, SearchLoad, etc.) once, and the subsequent values are to become rows of data. The goal is to put this data into a database; a single line of csv is going to give me a row that exceeds limits in excel. The data files are fairly small, 11kb or so, but there are a lot of them. The timestamp acts as a row separator of sorts: it's the time the test was run, and it should start a new row in excel.
This was a rush job I received late in the day, and I am finding it much more challenging than I expected.
I really appreciate the help.
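That multi-record layout could be handled with something like the sketch below. It assumes (my assumption, not confirmed by the data shown) that every record starts with a timestamp line, recognized here purely by a trailing AM/PM, and that the same labels appear in each record; the `records_to_csv` name is made up for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a list of lines (possibly many records) into a header row of
# labels followed by one CSV row of values per timestamp.
sub records_to_csv {
    my @lines = @_;
    my ( @names, @rows, $row );
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ /\b[AP]M$/ ) {        # timestamp opens a new record
            push @rows, $row if $row;
            $row = [$line];
        }
        elsif ( $line =~ /^([^,]+),(.*)$/ && $row ) {
            my ( $name, $value ) = ( $1, $2 );
            push @names, $name unless grep { $_ eq $name } @names;
            push @$row, $value;
        }
    }
    push @rows, $row if $row;
    return join( ",", "timestamp", @names ),
           map { join( ",", @$_ ) } @rows;
}

# Prints:
#   timestamp,Login,Logout
#   7/8/2008 1:37:12 AM,27.742460,0.873738
#   7/9/2008 2:00:00 AM,30.1,1.2
print "$_\n" for records_to_csv(
    "7/8/2008 1:37:12 AM", "Login,27.742460", "Logout,0.873738",
    "7/9/2008 2:00:00 AM", "Login,30.1",      "Logout,1.2",
);
```

Each timestamp then becomes one database/spreadsheet row, with the labels appearing only once as the header.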
Actually I don't see how I delete the name parts; when I run the above I get both names and values. I really just want one row of names and the rest values.
Re: parsing files and modifying format
by GrandFather (Saint) on Jul 17, 2008 at 23:37 UTC
Using full stop delimiters on your sentences would aid in understanding your question.
Perl is environmentally friendly - it saves trees
Re: parsing files and modifying format
by oko1 (Deacon) on Jul 18, 2008 at 02:57 UTC
Perhaps I'm missing something, but the original request didn't say anything about doing recursive processing. Assuming that it's required, based on the OP's use of 'File::Find' in his script (which he's not actually using) may not be appropriate.
Perl-only solution from the command line (processes all files in the current directory, ignores lines that don't have commas, converts the file content to a single line of CSV):
perl -i -0wpe's/\A[^\n]+\n//;s/\n(?!\Z)/,/g' *
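For anyone puzzling over the switches, here is roughly what that one-liner does, written out as a standalone sketch (`-0` actually sets the input record separator to NUL; plain slurp mode is used below as a close analogue, and the `flatten` helper is named just for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Drop the leading timestamp line, then turn every newline except the
# trailing one into a comma -- the two substitutions from the one-liner.
sub flatten {
    my ($text) = @_;
    $text =~ s/\A[^\n]+\n//;      # remove the first (timestamp) line
    $text =~ s/\n(?!\Z)/,/g;      # newline -> comma, keep the last one
    return $text;
}

# The -i (in-place) part: rewrite each file (restricted to .txt here,
# where the one-liner took whatever filenames the shell handed it).
for my $file ( glob "*.txt" ) {
    local $/;                      # slurp the whole file
    open my $in, '<', $file or next;
    my $text = <$in>;
    close $in;
    open my $out, '>', $file or next;
    print $out flatten($text);
    close $out;
}
```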
--
Human history becomes more and more a race between education and catastrophe. -- HG Wells