PerlMonks  

parsing files and modifying format

by grashoper (Monk)
on Jul 17, 2008 at 21:53 UTC ( [id://698454] )

grashoper has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a directory of files. Each file holds "label,value" pairs, one per line: the label and value are separated by a comma, and a newline ends each pair. I would like to read in the entire directory of files and modify each file into a CSV file without the newlines; that is, replace the newlines with commas. This should be fairly easy to do, but I am having trouble figuring out how. All I really need is to change each file so that it reads "label,value,label,value,...", so I can import it into Excel. Here is what I have so far. The material at the bottom is an example of the source files' format.
#!/usr/local/bin/perl
use File::Find;
$dirname=argv[];
my ($maxwidth)=0;
my (%cmTime,%hMtime,%oMtime);
my ($file,$base,$fmt);
opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    next unless ($base)=$file=~/(.+)\.txt$/;
    open (fh, ">", $file) or die "Can't openfile $file: $!";
    while (<fh>) {
        chomp;
        ($timestamp,@fields)=split/,/;
    }
    close (fh) or die "unable to close $file: $!";
    $actioncode=$timestamp;
    if ($actioncode='Login') {
        print "found login";
    }
    print ("succesfully opened file");
    print("$file \n");
}
closedir(DIR);
/8/2008 1:37:12 AM
Login,27.742460
SearchLoad,21.855703
SearchCount,45:43.520663
SearchResults,69.364964
SearchSave,107.611750
SearchDelete,29.317471
SearchDetails,10.493312
TaxLoad,24.488549
TaxResults,197:15.273238
TaxDetails,10.222830
ClientAdd,10.763459
CMALoad,7.398715
CMASave,16.196349
CMADelete,3.009030
ClientDelete,6.442495
Logout,0.873738

Replies are listed 'Best First'.
Re: parsing files and modifying format
by olus (Curate) on Jul 17, 2008 at 22:18 UTC

    perhaps you could work with something from the command line like

    for i in `find . -name '*.txt'`; do perl -pi -e 's/\n/,/' $i; done

    update or maybe a Perl only solution. This one uses recursion to traverse subdirs.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $dir = ".";
    lista_dirs($dir, $dir);

    sub lista_dirs {
        my ($dir, $dirname) = @_;
        my (@dirs, @files);
        opendir DIR, "$dir";
        my @dircontent = grep { /[^\.]/ } readdir(DIR);
        closedir DIR;
        @dirs = ();
        @files = ();
        foreach (@dircontent) {
            if (-d "$dir/".$_) {
                push @dirs, $_;
            }
            else {
                push @files, $_ if ($_ =~ /\.txt$/);
            }
        }
        foreach my $d (@dirs) {
            lista_dirs("$dir/".$d, $d);
        }
        # IMPORTANT NOTE: if your files aren't too big, you can read
        # everything at once as shown in the code below. If not, you can
        # open the input file for reading and a temp file for writing,
        # read the file line by line, replace \n with the , and write to
        # the temp file. Close both files when you're done, and then
        # 'rename' the temp file to the input filename.
        foreach my $file (@files) {
            $/ = '';
            open FILE, "$dir/$file";
            my $content = <FILE>;
            close FILE;
            $content =~ tr/\n/,/;
            open FILE, ">$dir/$file";
            print FILE $content;
            close FILE;
        }
        return;
    }

    update 2 After reading the other replies I realized I misread the question, and there was no need for recursion, so the solution came to be more complex than it needed to be. I was really in need of some sleep.
Re: parsing files and modifying format
by graff (Chancellor) on Jul 18, 2008 at 04:05 UTC
    Perhaps you haven't explained your goal very well, but in any case I'm sure your code as posted does not do what you want. If I understand it correctly:
    • There is one directory full of files (so no need to do recursion over subdirectories).
    • Each file contains a relatively small number of lines.
    • Each line is a "name,value" pair (except the first line, which is just a timestamp).
    • You want all the lines in a given file to be joined into a single line, by converting line breaks into commas, so each file ends up being just "name,value,name,value,..."
    • When that processing is done, you want to load all the file data into an excel spreadsheet, with each original source file being one row.

    If that's the basic plan, I wouldn't bother with writing modified versions of every original file. Just output a single stream of data (store it all in a single file), having one line of text for each input file. Something like this:

    #!/usr/bin/perl
    use strict;  # you really should!

    ( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
    my $dirname = shift;

    opendir( DIR, $dirname ) or die "$dirname: $!";
    my @files = grep /.+\.txt$/, readdir DIR;

    for my $file ( @files ) {
        my ( $basename ) = ( $file =~ /(.+)\.txt$/ );
        if ( open( F, $file )) {
            my ( $date, @lines ) = <F>;
            chomp $date;
            chomp @lines;
            print join( ",", "sourcefile", $basename, "date", $date, @lines ), "\n";
            close F;
        }
        else {
            warn "$file: $!\n";
        }
    }
    (I'm assuming you want to keep track of the original file names where all of the output rows come from, as well as all the time-stamps.)

    Of course, if it turns out that all the files have the same number of lines, and the same set of names in their various "name,value" pairs, and these names are always in the same order, a better input for excel would have a single line at the top with the names, and then each following line would just have the values in the proper order:

    #!/usr/bin/perl
    use strict;

    ( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
    my $dirname = shift;

    opendir( DIR, $dirname ) or die "$dirname: $!";
    my ( $firstfile, @files ) = grep /.+\.txt$/, readdir DIR;

    open( F, $firstfile ) or die "$firstfile: $!";
    my ( @names, @values );
    my ( $basename ) = ( $firstfile =~ /(.+)\.txt$/ );
    my $date = <F>;
    chomp $date;
    while (<F>) {
        chomp;
        my ( $n, $v ) = split /,/, $_, 2;
        push @names, $n;
        push @values, $v;
    }
    close F;

    print join( ",", "sourcefile", "date", @names ), "\n";
    print join( ",", $basename, $date, @values ), "\n";

    for my $file ( @files ) {
        ( $basename ) = ( $file =~ /(.+)\.txt$/ );
        if ( open( F, $file )) {
            ( $date, @values ) = <F>;
            chomp $date;
            chomp @values;
            s/.+?,// for ( @values );   # delete the "name," parts
            print join( ",", $basename, $date, @values ), "\n";
            close F;
        }
        else {
            warn "$file: $!\n";
        }
    }
    (Not tested)

    If you're not sure whether all the files have the same set of names in the same order, it would be easy enough to check for that before you try to create the single data stream for importing to excel.
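    That pre-check could look something like the following (an untested sketch; `name_signature` is a name made up for this example). It compares each file's label list, in order, against the first file's:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the comma-joined list of "name" fields from one data file,
# skipping the timestamp on the first line.
sub name_signature {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "$path: $!";
    my $date = <$fh>;              # first line is the timestamp; ignore it
    my @names;
    while (<$fh>) {
        chomp;
        my ($n) = split /,/, $_, 2;
        push @names, $n;
    }
    close $fh;
    return join ',', @names;
}

# Compare every *.txt file in the directory against the first one seen.
if (@ARGV) {
    my $dirname = shift;
    opendir( my $dh, $dirname ) or die "$dirname: $!";
    my @files = grep { /\.txt$/ } readdir $dh;
    closedir $dh;

    my $reference;
    for my $file (@files) {
        my $sig = name_signature("$dirname/$file");
        defined $reference or $reference = $sig;
        warn "$file has a different name list\n" if $sig ne $reference;
    }
}
```

    Any file whose labels differ (or appear in a different order) gets a warning, so you know before building the combined CSV.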

      graff, you are correct: sample 2 is what I needed, except each file has multiple records; the example I provided was merely one of them. It looks like the +? isn't working quite the way I thought it would in that one line, though. I'm not sure why: I expected the names would not get printed, but instead they do appear in my CSV. I was thinking I would do an OLE call to Excel and just dump the values right in there without the names, but it doesn't work. Is that due to the join prints above this loop?
      for my $file ( @files ) {
          ( $basename ) = ( $file =~ /(.+)\.txt$/ );
          if ( open( F, $file )) {
              ( $date, @values ) = <F>;
              chomp $date;
              chomp @values;
              s/.+?,// for ( @values );   # delete the "name," parts
              print join( ",", $basename, $date, @values ), "\n";
              close F;
          }
          else {
              warn "$file: $!\n";
          }
      }
        I'm trying to work out where your sentence boundaries are -- if you add periods and sentence-initial capitalization in the paragraph text, that would help...

        each file has multiple records, example I provided was merely one of them

        If you have trouble figuring out how to parse the input files, start a new thread on that. As I mentioned in my first reply, it should be easy to work out how the file contents are arranged, and to treat them accordingly.

        it looks like the +? isn't working quite the way I though it would in this one line though..

        I have no clue what you're talking about there -- maybe if you show a specific input data set and the resulting output, and explain what's wrong with that output, it will be clear what's wrong.

        I was thinking I would do an ole call to excel and just dump the values right on in there without the names but it doesnt..

        I know nothing about ole, and I see no reason to go there, since I do know that a simple comma-delimited file, with a single line at the top for column headings, works fine for importing to excel.

        Of course, if the "values" contain commas (and quotes), then it is not simple, because fields containing commas as data need to be quoted, and quotes occurring as data in any field need to be escaped.

        As a rule, it's fairly common for commas and quotes to appear in text data, but it's probably a lot less likely that your data files contain tabs, so you might consider writing tab-delimited output instead of comma delimited -- that works equally well for importing to excel.
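        For illustration, the quoting rule described above can be sketched as a small helper (in real code the Text::CSV module handles all of this; `csv_field` is just a made-up name for this sketch):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote a single CSV field, RFC 4180 style: if the field contains a
# comma, quote, or newline, wrap it in quotes and double any embedded
# quotes. Otherwise return it unchanged.
sub csv_field {
    my ($field) = @_;
    if ( $field =~ /[",\n]/ ) {
        $field =~ s/"/""/g;        # double any embedded quotes
        return qq("$field");
    }
    return $field;
}

# A value containing a comma must be quoted for excel to read it back
# as one field.
print join( ',', map { csv_field($_) } 'Login', '1,234.5', 'plain' ), "\n";
```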

      Actually, I missed an important piece of information here: I need the labels (Login, SearchLoad, etc.) once, and the subsequent values are to become rows of data. The goal is to put this data into a database; a single line of CSV is going to give me a row that exceeds Excel's limits. The data files are fairly small, 11 KB or so, but there are a lot of them. The timestamp acts as a row separator of sorts: it's the time the test was run, and it should start a new row in Excel. This was a rush job I received late in the day, and I am finding it much more challenging than I expected. I really appreciate the help.
      Actually, I don't see how I delete the name parts: when I run the above I get both names and values, and I really just want one row of names and the rest values.
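      One way to read that multi-record layout (a sketch, under the assumption that a line without a comma is a timestamp starting a new record, and that every record carries the same labels in the same order; `reshape` is a made-up name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "timestamp / name,value / name,value ..." records from $in and
# write one header row ("date" plus the labels) followed by one CSV row
# of values per record to $out.
sub reshape {
    my ( $in, $out ) = @_;
    my ( @names, @values, $date, $printed );
    my $flush = sub {
        return unless defined $date;
        print {$out} join( ',', 'date', @names ), "\n" unless $printed++;
        print {$out} join( ',', $date, @values ), "\n";
    };
    while ( my $line = <$in> ) {
        chomp $line;
        if ( $line !~ /,/ ) {      # no comma: timestamp starts a new record
            $flush->();
            $date   = $line;
            @values = ();
        }
        else {
            my ( $n, $v ) = split /,/, $line, 2;
            push @names, $n unless $printed;   # collect labels once
            push @values, $v;
        }
    }
    $flush->();                    # emit the final record
}

# usage: reshape( \*STDIN, \*STDOUT );
```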
Re: parsing files and modifying format
by GrandFather (Saint) on Jul 17, 2008 at 23:37 UTC

    Using full stop delimiters on your sentences would aid in understanding your question.


    Perl is environmentally friendly - it saves trees
Re: parsing files and modifying format
by oko1 (Deacon) on Jul 18, 2008 at 02:57 UTC

    Perhaps I'm missing something, but the original request didn't say anything about doing recursive processing. Assuming that it's required, based on the OP's use of 'File::Find' in his script (which he's not actually using) may not be appropriate.

    Perl-only solution from the command line (processes all files in the current directory, ignores lines that don't have commas, converts the file content to a single line of CSV):

    perl -i -0wpe's/\A[^\n]+\n//;s/\n(?!\Z)/,/g' *
    
    -- 
    Human history becomes more and more a race between education and catastrophe. -- HG Wells
    

Node Type: perlquestion [id://698454]
Approved by Joost