http://qs321.pair.com?node_id=971967


in reply to Help with problem

Update: Added use autodie; to the code.

Hi live4tech,

Three points:

That said, I’m still not clear on how you could be getting files with, e.g., 299,701 rows. The suggestion of Anonymous Monk that it’s because you skip the empty lines doesn’t persuade, as there are (according to your specification) as many blank lines as there are data entry lines; and your logic ignores blank lines anyway.

I offer the following in the hope that it may do what you need:

#!perl
use strict;
use warnings;
use autodie;

my $pre       = $ARGV[0];
my $max_lines = 300_000;
my $linenum   = 0;
my $filenum   = 0;

open my $fileout, '>', $pre . '-' . $filenum;

while (my $line = <>)
{
    $line =~ s/ \s* $ //x;    # remove trailing whitespace (incl. "\r\n")

    if ($line ne '')          # ignore blank lines
    {
        if ($linenum++ < $max_lines)
        {
            print $fileout $line, "\n";
        }
        else
        {
            close $fileout;
            open $fileout, '>', $pre . '-' . ++$filenum;
            print $fileout $line, "\n";
            $linenum = 1;
        }
    }
}

close $fileout;
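To check the chunk boundaries without a 300,000-line file, the same counter logic can be run on a small in-memory list. This is only a sketch: split_lines is a hypothetical helper (it is not part of the script above), the sample rows are made up, and the limit of 3 stands in for $max_lines.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: applies the counter logic of the script to an
# in-memory list and returns one array ref per output "file".
sub split_lines
{
    my ($max_lines, @lines) = @_;
    my @chunks  = ([]);
    my $linenum = 0;

    for my $line (@lines)
    {
        $line =~ s/ \s* $ //x;          # remove trailing whitespace
        next if $line eq '';            # ignore blank lines

        if ($linenum++ < $max_lines)
        {
            push @{ $chunks[-1] }, $line;
        }
        else
        {
            push @chunks, [$line];      # start the next "file"
            $linenum = 1;
        }
    }

    return @chunks;
}

# Seven made-up data rows, each followed by a blank line; limit of three.
my @input  = map { ("row $_\r\n", "\r\n") } 1 .. 7;
my @chunks = split_lines(3, @input);

printf "chunk %d: %d lines\n", $_, scalar @{ $chunks[$_] } for 0 .. $#chunks;
```

With this input the chunks hold 3, 3 and 1 rows, so no row is dropped or counted twice at a boundary.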

HTH,

Athanasius <°(((><contra mundum

Re^2: Help with problem
by Anonymous Monk on May 23, 2012 at 07:36 UTC

    The suggestion of Anonymous Monk that it’s because you skip the empty lines doesn’t persuade

    And what is your suggestion? Why do you think it happens?

    The code is fairly short and simple, and we only have live4tech's word that there are missing records.

    Your reworking of live4tech's code, aside from moving the 300,000-th line into the new file, doesn't change anything else -- if live4tech's original code had records go missing, so would your reworked code (they're virtually identical).

      They're virtually identical

      Except that my pattern matching is (slightly) different.

      As I said, I don’t know why the original code wasn’t working (except for the off-by-one error). At best, the use of \r\n in the pattern match may be a red herring, in which case it will be useful to “eliminate it from our inquiries” (I read too many whodunnits). At worst, it may be introducing some bug which live4tech will find is fixed in my version.

      It will be interesting to find out.

      Athanasius <°(((><contra mundum

        Except that my pattern matching is (slightly) different.

        That doesn't matter. The only possible result from using that pattern would be skipping even more records.

        Superstition is not a good reason to introduce code changes :)
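
For reference, here is a small sketch of what the substitution s/ \s* $ //x does to different inputs. The sample strings are made up and strip_trailing is a hypothetical wrapper used only for this illustration. Lines with content keep their content whatever the line ending, while whitespace-only lines become empty and so would be skipped along with the truly blank ones.

```perl
#!perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution used in the script above.
sub strip_trailing
{
    my ($line) = @_;
    $line =~ s/ \s* $ //x;      # remove trailing whitespace (incl. "\r\n")
    return $line;
}

# Made-up sample inputs: [ label, raw line ]
my @samples = (
    [ 'LF ending',            "data\n"         ],
    [ 'CRLF ending',          "data\r\n"       ],
    [ 'trailing spaces/tab',  "data  \t\r\n"   ],
    [ 'empty line',           "\r\n"           ],
    [ 'whitespace-only line', "   \r\n"        ],
);

for my $sample (@samples)
{
    my ($label, $raw) = @$sample;
    my $line = strip_trailing($raw);

    printf "%-22s %s\n", $label,
        $line eq '' ? 'skipped as blank' : "kept as '$line'";
}
```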