PerlMonks  

parsing text files continued

by grashoper (Monk)
on Jul 18, 2008 at 17:00 UTC ( [id://698676]=perlquestion )

grashoper has asked for the wisdom of the Perl Monks concerning the following question:

I need to iterate through my file, which contains date/time values, and separate each date/time into a new row. I am not sure how to pull this off. Each file has the same fields, and 15 of them make up a row for one record for Excel. An example of the data is in this block; the code I have so far is below it.

1/3/2007 12:20:01 AM
Login,12.588309
SearchLoad,9.432586
SearchCount,20:0.196329
SearchResults,7.418672
SearchSave,3.616305
SearchDelete,2.066482
SearchDetails,6.873061
ClientAdd,0.784989
CMALoad,1.859894
CMASave,3.249620
CMADelete,0.450952
ClientDelete,0.305768
Logout,0.823402
1/3/2007 12:49:22 AM
Login,10.958312
SearchLoad,13.644527
SearchCount,41:0.483233
SearchResults,7.027840
SearchSave,4.222601
SearchDelete,0.305821
SearchDetails,7.443877
ClientAdd,1.552915
CMALoad,1.202711
CMASave,5.285398
CMADelete,0.233119
ClientDelete,0.425521
Logout,0.560862
my code so far..
#!/usr/bin/perl
use strict;
my $i=0;
my $return="\n";
( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
my $dirname = shift;
opendir( DIR, $dirname ) or die "$dirname: $!";
my ( $firstfile, @files ) = grep /.+\.txt$/, readdir DIR;
open( F, $firstfile ) or die "$firstfile: $!";
my ( @names, @values );
my ( $basename ) = ( $firstfile =~ /(.+)\.txt$/ );
my $date = <F>;
while (<F>) {
    chomp;
    while ($i++<14) {
        my ( $n, $v ) = split /,/, $_, 2;
        push @names, $n;
        push @values, $v;
    } continue
    push @names, $return;
    my ( $n, $v ) = split /,/, $_, 2;
    push @names, $n;
    push @values, $v;
}
close F;
print join( ",", "sourcefile", "date", @names ), "\n";
print join( ",", $basename, $date, @values ), "\n";
for my $file ( @files ) {
    ( $basename ) = ( $file =~ /(.+)\.txt$/ );
    if ( open( F, $file )) {
        ( $date, @values ) = <F>;
        chomp $date;
        chomp @values;
        s/.+?,// for ( @values ); # delete the "name," parts
        print join( ",", @values ), "\n";
        close F;
    }
    else {
        warn "$file: $!\n";
    }
}

Replies are listed 'Best First'.
Re: parsing text files continued
by olus (Curate) on Jul 18, 2008 at 17:05 UTC

    How do you need your output to be?

    Update: assuming you want to take everything between two dates and put it all on one line, separated by commas, I came up with this example:

    use strict;
    use warnings;

    my $text;
    while(<DATA>) {
        chomp($_);
        if($_ =~ /\d{1,2}\/\d{1,2}\/\d{4}/) {
            $text .= "\n" unless $. == 1;
            $text .= $_;
        }
        else {
            $text .= ",".$_;
        }
    }
    print $text;
    __DATA__
    1/3/2007 12:20:01 AM
    Login,12.588309
    SearchLoad,9.432586
    SearchCount,20:0.196329
    SearchResults,7.418672
    SearchSave,3.616305
    SearchDelete,2.066482
    SearchDetails,6.873061
    ClientAdd,0.784989
    CMALoad,1.859894
    CMASave,3.249620
    CMADelete,0.450952
    ClientDelete,0.305768
    Logout,0.823402
    1/3/2007 12:49:22 AM
    Login,10.958312
    SearchLoad,13.644527
    SearchCount,41:0.483233
    SearchResults,7.027840
    SearchSave,4.222601
    SearchDelete,0.305821
    SearchDetails,7.443877
    ClientAdd,1.552915
    CMALoad,1.202711
    CMASave,5.285398
    CMADelete,0.233119
    ClientDelete,0.425521
    Logout,0.560862

    outputs

    1/3/2007 12:20:01 AM,Login,12.588309,SearchLoad,9.432586,SearchCount,20:0.196329,SearchResults,7.418672,SearchSave,3.616305,SearchDelete,2.066482,SearchDetails,6.873061,ClientAdd,0.784989,CMALoad,1.859894,CMASave,3.249620,CMADelete,0.450952,ClientDelete,0.305768,Logout,0.823402
    1/3/2007 12:49:22 AM,Login,10.958312,SearchLoad,13.644527,SearchCount,41:0.483233,SearchResults,7.027840,SearchSave,4.222601,SearchDelete,0.305821,SearchDetails,7.443877,ClientAdd,1.552915,CMALoad,1.202711,CMASave,5.285398,CMADelete,0.233119,ClientDelete,0.425521,Logout,0.560862
Re: parsing text files continued
by pileofrogs (Priest) on Jul 18, 2008 at 18:00 UTC

    Err.. what was your problem exactly? I'm too lazy to try your code on your data to see where it doesn't work. I assume others are too. (except olus of course...)

    Also, is it 15 lines per record, or 15 files of lines per record? You say "15 of them" and I am easily confused.

    If you want what I think you want, I'd just say go through the file, stick lines into a buffer, when you hit a date, whack the buffer into the format you want and put the new date into the buffer and continue...
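A minimal sketch of that buffer idea, assuming one field per line and that a record starts at any line beginning with an m/d/yyyy date. The name `records_to_csv` is made up for illustration, and the sample data is trimmed to a few fields:

```perl
use strict;
use warnings;

# Hypothetical helper: collect lines into a buffer and flush the buffer
# as one comma-separated row every time a new date line shows up.
sub records_to_csv {
    my @lines = @_;
    my ( @rows, @buffer );
    for my $line (@lines) {
        chomp $line;
        next unless length $line;                           # ignore blank lines
        if ( $line =~ /^\d{1,2}\/\d{1,2}\/\d{4}\b/ ) {
            push @rows, join( ",", @buffer ) if @buffer;    # flush the previous record
            @buffer = ($line);                              # new record starts with its date
        }
        else {
            push @buffer, $line;    # "Name,value" already contains its comma
        }
    }
    push @rows, join( ",", @buffer ) if @buffer;            # flush the last record
    return @rows;
}

my @sample = (
    "1/3/2007 12:20:01 AM\n", "Login,12.588309\n", "Logout,0.823402\n",
    "1/3/2007 12:49:22 AM\n", "Login,10.958312\n",
);
print "$_\n" for records_to_csv(@sample);
```

In a real script the lines would come from a file handle instead of the inline `@sample` list. Note that any lines appearing before the first date would end up in a row of their own without a timestamp.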

    --Pileofrogs

Re: parsing text files continued
by AltBlue (Chaplain) on Jul 19, 2008 at 12:39 UTC
    Assuming you always have the same fields for each record, here's a lazy way to obtain your CSV:
    $ perl -F, -lane '@F % 2 and push @D, [@F] or push @{$D[-1]}, pop @F; }{ $,=q{,}; print @{$_} for @D' input.txt
    ... running it against your sample input, you will get:
    1/3/2007 12:20:01 AM,12.588309,9.432586,20:0.196329,7.418672,3.616305,2.066482,6.873061,0.784989,1.859894,3.249620,0.450952,0.305768,0.823402
    1/3/2007 12:49:22 AM,10.958312,13.644527,41:0.483233,7.027840,4.222601,0.305821,7.443877,1.552915,1.202711,5.285398,0.233119,0.425521,0.560862
    I repeat, this is very lazy as it assumes you always have the same fields, in the same order ;-)

    A better/fancier way would be to gather everything in some hashes or something, and then use some "proper" formatter to dump results as CSV. Here's a way to obtain your hash:

    $ perl -MData::Dumper -F, -lane '@F % 2 and ($k)=@F or push @{$D{$k}}, @F; }{ $D{$_} = {@{$D{$_}}} for keys %D; print Dumper \%D' input.txt
    Results:
    $VAR1 = {
              '1/3/2007 12:49:22 AM' => {
                                          'ClientAdd' => '1.552915',
                                          'CMALoad' => '1.202711',
                                          'SearchDelete' => '0.305821',
                                          'CMASave' => '5.285398',
                                          'ClientDelete' => '0.425521',
                                          'CMADelete' => '0.233119',
                                          'Login' => '10.958312',
                                          'SearchCount' => '41:0.483233',
                                          'SearchDetails' => '7.443877',
                                          'Logout' => '0.560862',
                                          'SearchResults' => '7.027840',
                                          'SearchSave' => '4.222601',
                                          'SearchLoad' => '13.644527'
                                        },
              '1/3/2007 12:20:01 AM' => {
                                          'ClientAdd' => '0.784989',
                                          'CMALoad' => '1.859894',
                                          'SearchDelete' => '2.066482',
                                          'CMASave' => '3.249620',
                                          'ClientDelete' => '0.305768',
                                          'CMADelete' => '0.450952',
                                          'Login' => '12.588309',
                                          'SearchCount' => '20:0.196329',
                                          'SearchDetails' => '6.873061',
                                          'Logout' => '0.823402',
                                          'SearchResults' => '7.418672',
                                          'SearchSave' => '3.616305',
                                          'SearchLoad' => '9.432586'
                                        }
            };
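One way the "proper formatter" step might look, as a rough sketch only. `csv_line` is a hand-rolled, hypothetical helper (for real data the Text::CSV module from CPAN is the usual choice), and `%D` is a trimmed copy of the hash shown in the Dumper output above:

```perl
use strict;
use warnings;

# Hypothetical helper: format a list of fields as one CSV line.
# Fields containing a comma, a double quote or a line break are quoted,
# embedded double quotes are doubled, and undef becomes an empty field.
sub csv_line {
    my @fields = map { defined $_ ? $_ : '' } @_;
    for my $f (@fields) {
        if ( $f =~ /[",\r\n]/ ) {
            $f =~ s/"/""/g;
            $f = qq{"$f"};
        }
    }
    return join ",", @fields;
}

# Same shape as the hash of hashes above, trimmed to three fields.
my %D = (
    '1/3/2007 12:20:01 AM' => { Login => '12.588309', SearchCount => '20:0.196329', Logout => '0.823402' },
    '1/3/2007 12:49:22 AM' => { Login => '10.958312', SearchCount => '41:0.483233', Logout => '0.560862' },
);

# An explicit column list, because hash key order is not predictable.
my @columns = qw(Login SearchCount Logout);

print csv_line( 'date', @columns ), "\n";
for my $stamp ( sort keys %D ) {
    print csv_line( $stamp, @{ $D{$stamp} }{@columns} ), "\n";
}
```

The plain `sort` here compares the timestamps as strings, which happens to work for the two sample records but is not a chronological sort in general.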

    --
    altblue.

      Actually, I need to make it a little more intelligent than what I currently have, because it assumes the fields are all there, and sometimes they are not. For example, my test script might die at some point, say at SearchLoad, and the next iteration would start a new row. When I aggregate this I get "rows" that don't actually line up with the others, which munges my data in Excel and wrecks my plans to put it into a db. So how would I account for this? I am guessing I should set up a hash and test for a complete row, and if it is not a complete row, dump the data and move on to the next one, but I am not sure how to do this as I am not that skilled. Since the rows would more or less have a fixed number of fields, testing for a value in each field might help, but again I am unsure how to do it.

      Thanks AltBlue. I tried running your file but it doesn't run. I get the following errors:

      string found where operator expected at filename.pl at end of line, do you need to predeclare lane?
      bareword found where operator expected near input (missing operator before input?)
      string found where operator expected at filename.pl line 7 near "Logout" Login, do you need to predeclare logout?
      syntax error at filename.pl line 1, next token ???
      execution of filename.pl aborted due to compilation errors.

      Do these switches not work on Windows? I guess "lane" is a set of command-line switches? This is what I found about them:

      -l: This option turns on line-ending processing. It can be used to set the output line terminator variable ($\) by specifying an octal value. See "Example: Using the -0 option" for an example of using octal numbers. If no octal number is specified, the output line terminator is set equal to the input line terminator (such as $\ = $/;).

      -a: This option must be used in conjunction with either the -n or -p option. Using the -a option will automatically feed input lines to the split function. The results of the split are placed into the @F variable.

      -n: This option places a loop around your script. It will automatically read a line from the diamond operator and then execute the script. It is most often used with the -e option.

      -e: This option lets you specify a single line of code on the command line. This line of code will be executed in lieu of a script file. You can use multiple -e options to create a multiple-line program, although given the probability of a typing mistake, I'd create a script file instead. Semicolons must be used to end Perl statements just like in a normal script.
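For reference, here is roughly what the first `-F, -lane` one-liner does when written out as an ordinary script, which sidesteps shell quoting altogether (the one-liner is wrapped in single quotes, and cmd.exe on Windows does not treat those as quoting characters). This is only a sketch: `rows_from_handle` is a made-up name, the input is an in-memory string rather than input.txt, and unlike the one-liner it skips blank lines:

```perl
use strict;
use warnings;

# Long-hand sketch of: perl -F, -lane '...' input.txt
sub rows_from_handle {
    my ($fh) = @_;
    my @D;
    while ( my $line = <$fh> ) {       # -n: loop over the input lines
        chomp $line;                   # -l: strip the line ending
        my @F = split /,/, $line;      # -a with -F,: split each line on commas
        next unless @F;                # blank line: nothing to store
        if ( @F % 2 ) {
            push @D, [@F];             # odd field count: a timestamp, start a new row
        }
        elsif (@D) {
            push @{ $D[-1] }, $F[-1];  # "Name,value": append the value to the current row
        }
    }
    return @D;
}

# Demo on an in-memory file; a real script would open input.txt or read <>.
my $input = "1/3/2007 12:20:01 AM\nLogin,12.588309\nLogout,0.823402\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my @rows = rows_from_handle($fh);
close $fh;

# The part after }{ in the one-liner runs once, after the loop has finished.
print join( ",", @$_ ), "\n" for @rows;
```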

        Hm, at first glance your reply sounds kinda messy, so, I guess providing some more details could help, still trying to avoid doing your homework at the same time (you know, the rules) ;-)

        First, let's drop some lines from your second record, so we could see what happens when fields are missing:

        1/3/2007 12:20:01 AM
        Login,12.588309
        SearchLoad,9.432586
        SearchCount,20:0.196329
        SearchResults,7.418672
        SearchSave,3.616305
        SearchDelete,2.066482
        SearchDetails,6.873061
        ClientAdd,0.784989
        CMALoad,1.859894
        CMASave,3.249620
        CMADelete,0.450952
        ClientDelete,0.305768
        Logout,0.823402
        1/3/2007 12:49:22 AM
        Login,10.958312
        SearchCount,41:0.483233
        

        Now let's rewrite my previous HoH solution and move the "key" (time stamp) *inside* the hash, using an AoH:

        $ perl -MData::Dumper -F, -lane '@F % 2 and push @D, {q{Stamp},@F} or $D[-1] = { %{$D[-1]}, @F } }{ print Dumper @D' input.txt
        $VAR1 = {
                  'Stamp' => '1/3/2007 12:20:01 AM',
                  'ClientAdd' => '0.784989',
                  'CMALoad' => '1.859894',
                  'SearchDelete' => '2.066482',
                  'CMASave' => '3.249620',
                  'CMADelete' => '0.450952',
                  'Login' => '12.588309',
                  'ClientDelete' => '0.305768',
                  'SearchCount' => '20:0.196329',
                  'SearchDetails' => '6.873061',
                  'Logout' => '0.823402',
                  'SearchResults' => '7.418672',
                  'SearchSave' => '3.616305',
                  'SearchLoad' => '9.432586'
                };
        $VAR2 = {
                  'Stamp' => '1/3/2007 12:49:22 AM',
                  'Login' => '10.958312',
                  'SearchCount' => '41:0.483233'
                };
        

        And now, let's print the fields we need from this data as CSV.

        $ perl -F, -lane '@F % 2 and push @D, {q{Stamp},@F} or $D[-1] = { %{$D[-1]}, @F } }{ $,=","; print @{$_}{qw(Stamp Login SearchResults SearchLoad SearchCount Logout)} for @D' input.txt
        1/3/2007 12:20:01 AM,12.588309,7.418672,9.432586,20:0.196329,0.823402
        1/3/2007 12:49:22 AM,10.958312,,,41:0.483233,
        

        As you may notice, values that are missing from a record generate empty fields, which should be just fine for CSV.

        Obviously, this lazy toy will trigger "undefined" warnings, but I'm sure you'll know how to handle them in your real/production code. ;-)
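A possible way to deal with those warnings, and to spot incomplete records, sketched under the assumption that the full column list is the thirteen field names from the sample data plus Stamp. The helper names `is_complete` and `record_to_csv` are invented for this example:

```perl
use strict;
use warnings;

# Assumed column list, taken from the field names in the sample data.
my @columns = qw(Stamp Login SearchLoad SearchCount SearchResults SearchSave
                 SearchDelete SearchDetails ClientAdd CMALoad CMASave
                 CMADelete ClientDelete Logout);

# Hypothetical helper: true if the record has a defined value for every column.
sub is_complete {
    my ($rec) = @_;
    return !grep { !defined $rec->{$_} } @columns;
}

# Hypothetical helper: one CSV line per record. Missing values become
# empty fields explicitly, so no "uninitialized" warnings are triggered.
sub record_to_csv {
    my ($rec) = @_;
    return join ",", map { defined $rec->{$_} ? $rec->{$_} : '' } @columns;
}

# The truncated second record from the example above.
my @D = (
    { Stamp => '1/3/2007 12:49:22 AM', Login => '10.958312', SearchCount => '41:0.483233' },
);

for my $rec (@D) {
    my $status = is_complete($rec) ? "complete" : "incomplete";
    print "$status: ", record_to_csv($rec), "\n";
}
```

Whether an incomplete record should be skipped, logged, or written out with empty fields depends on what the spreadsheet or database on the other end expects; replacing the `print` with `next unless is_complete($rec);` would drop such rows.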

        My apologies if this code looked too messy for you, I'll try adding some spoilers...

        Finally, I have to warn you again: DON'T use this code in production, this is just a proof of concept :)

        HTH

        --
        altblue.
