http://qs321.pair.com?node_id=724009

valavanp has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I wrote the following perl script to convert the CSV into HTML files. I need to have a check in the csv files i.e. if it contains more than 5 rows it should write the content in another HTML file and it continues till the end of the file. I am getting the ouput in different files but the problem is only the last record is getting added to the file not all records. I am enclosing the code snippet below. It is just a sample, whereas i have 500000 records to split each files of 10000 records. Am I missing a point here. Could anyone correct me where i am wrong? Thanks for your help.
#!c:/perl/bin/perl use strict; use warnings; open(FH,"C:/csv_in/test.csv") || die("cannot open the file $!"); my $header="<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Tranistion//E +N\"><HTML><HEAD><TITLE>HTML OUTPUT</TITLE></HEAD><BODY BGCOLOR=\"#eee +eee\"><TABLE BORDER=1><CAPTION><FONT FACE=\"HELVETICA, ARIAL\"><B><P> +Test HTML </P></B></FONT></CAPTION><TR><TH><FONT FACE=\"HELVETICA, AR +IAL\">category</FONT></TH><TH><FONT FACE=\"HELVETICA, ARIAL\"></FONT> +Name</TH><TH><FONT FACE=\"HELVETICA, ARIAL\">DOB</FONT></TH><TH><FONT + FACE=\"HELVETICA, ARIAL\">SEX</FONT></TH></TR>"; my $count=0; while(my $line=<FH>){ if($line=~/M|F$/){ $count++; if($count <= 5){ open(OUT,">C:/html_out/output_1.html") or die("cannot open + $!"); print OUT $header; my @fields=split /,/, $line; print OUT ("<TR>", (map {qq{<TD><FONT FACE=\"HELVETICA, AR +IAL\"><CENTER>$_</CENTER></FONT></TD>}} @fields), "</TR>"); }elsif($count > 5 && $count <= 10){ open(OUT1,">C:/html_out/output_2.html") or die("cannot ope +n $!"); print OUT1 $header; my @fields=split /,/, $line; print OUT1 ("<TR>", (map {qq{<TD><FONT FACE=\"HELVETICA, A +RIAL\"><CENTER>$_</CENTER></FONT></TD>}} @fields), "</TR>"); }elsif($count > 10 && $count <= 15){ open(OUT2,">>C:/html_out/output_3.html") or die("cannot op +en $!"); print OUT2 $header; my @fields=split /,/, $line; print OUT2 ("<TR>", (map {qq{<TD><FONT FACE=\"HELVETICA, A +RIAL\"><CENTER>$_</CENTER></FONT></TD>}} @fields), "</TR>"); } close OUT; close OUT1; close OUT2; } #if($count > 5){ } close FH; test.csv -------- Name DOB Sex John 1/1/1980 M Kathy 2/2/2987 F Mike 3/3/1999 M Stella 4/4/1985 F Robert 5/5/1991 M Britto 6/6/1993 M Katherine 7/7/1991 F vicky 8/8/1992 M charles 9/9/1992 M stephen 10/10/1968 M Denzel 11/11/1996 M
  • Comment on Only last record is written to the output file instead of all records
  • Download Code

Replies are listed 'Best First'.
Re: Only last record is written to the output file instead of all records
by ikegami (Patriarch) on Nov 17, 2008 at 06:31 UTC
    You keep overwriting the file. Simple solution: use >> instead of >. Better solution: open the files before the loop.
    open ...; open ...; open ...; while (...) { ... } close ...; close ...; close ...;
      If i have the files open before the loop the files will be created automatically without satisfying the condition. For e.g if i have only 5 records, it will create 3 files. All the 5 records will be written to the first file and leaving other files blank without any content.

        Test for the condition prior to entering the loop, OR, open within the loop, but on each iteration check to see if the file is already open. Easy way to do that would be to use a lexical filehandle, scoped to just outside the loop. Here's an example:

        { my $fh; while ( ....condition.... ) { unless( defined( $fh ) ) { open $fh, '>', $filename or die $!; } # ....do your stuff... } close $fh or die $!; }

        Dave

Re: Only last record is written to the output file instead of all records
by graff (Chancellor) on Nov 17, 2008 at 13:38 UTC
    You talk about a "csv file" and use "/,/" as your split expression, but the data you posted is whitespace-delimited. Apart from that, shouldn't you be using a CSV module, or do you really have perfect confidence that your input data will never have things like quoted fields with embedded delimiters?

    Your "if ... elsif ... elsif ..." structure is really not sustainable if you ever need to adapt to input files of arbitrary length. Personally, given a stable (but potentially changeable) csv file (or other database-like source), I would be inclined to use CGI and have a process that delivers a user-specified quantity of data rows starting at a user-specified point, with a button to move back and forth by pages -- similar to what is done here at the Monastery (Nodes You Wrote Perl Monks User Search being a very good example).

    But if you just want to generate a set of static html files from your data, I'd do that like this:

    #!/usr/bin/perl use strict; use warnings; my $input = ( @ARGV and -f $ARGV[0] ) ? shift : "C:/csv_in/test.csv"; open( $ifh, "<", $input ) or die "$input: open failed: $!\n"; my $out_num = my $out_count = 0; my $ofh; while ( <$ifh> ) { if ( ! defined( $ofh ) or $out_count == 5 ) { if ( $ofh ) { # print closing html stuff (page trailer, etc)... close $ofh; } my $out_name = sprintf( "C:/html_out/output_%d.html", ++$out_n +um ); open( $ofh, ">", $out_name ) or die "$out_name: cannot open: $ +!\n"; # print opening html stuff... $out_count = 0; } # extract fields from data row and stuff it into html... print $ofh ... $out_count++; } # print closing html content to current $ofh ...
    That should avoid the problem you were having with so much of the data being absent from the output files.

    (Update: I had forgotten that "Nodes You Wrote" was a personal tweak to one's personal "nodelet" set -- which is easy to set up, just follow the original link above -- whereas "Perl Monks User Search" is the direct link to the facility.)

Re: Only last record is written to the output file instead of all records
by poolpi (Hermit) on Nov 17, 2008 at 15:35 UTC

    If you work with CSV files, you can use a dedicated module.
    For example Text::CSV, Text::CSV_XS, DBD::AnyData or DBD::CSV

    #!/usr/bin/perl use strict; use warnings; use DBI; my $dbh = DBI->connect('dbi:AnyData(RaiseError=>1):'); # The default is to treat the first line # of the file as the list of column names # If you just want to select the women $dbh->func( 'people', 'CSV', '/home/user1/file.csv', { sql => "SELECT name FROM people WHERE genre='F'" }, 'ad_import' +); # Print the result as a HTML table print $dbh->func( 'people', 'HTMLtable', 'ad_export' ); # or save it in a file $dbh->func( 'people', 'HTMLtable', '/home/user/mytab1.html', 'ad_expor +t' );
    OUTPUT: <table bgcolor="white" border="1"> <tr bgcolor="#c0c0c0"><th>name</th></tr> <tr><td>Kathy</td></tr> <tr><td>Stella</td></tr> <tr><td>Katherine</td></tr> </table>

    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb