Update: Added use autodie; to the code.
Hi live4tech,
Three points:
- You should Choose a Good, Descriptive Title for your posts.
- It’s not a good idea to try to match on \r\n, as this brings in too many complications (as well as being non-portable). It is much better to strip these line endings first, then add them back only when needed (i.e., when printing). See the code below.
- There is an off-by-one error in your logic in the final else clause: $linenum is set to 0, but it should be 1, as a line is immediately written to file.
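To make the off-by-one point concrete, here is a minimal sketch that simulates only the counter, with no file I/O. It assumes your loop has the same shape as the code further down; the helper name chunk_sizes and the small numbers (10 lines, 3 per chunk) are mine, purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many lines land in each chunk when the
# counter is reset to $reset after a new chunk has been started.
sub chunk_sizes
{
    my ($total, $max, $reset) = @_;
    my @sizes   = (0);
    my $linenum = 0;

    for (1 .. $total)
    {
        if ($linenum++ < $max)
        {
            $sizes[-1]++;          # line goes into the current chunk
        }
        else
        {
            push @sizes, 1;        # new chunk; its first line is written at once
            $linenum = $reset;
        }
    }
    return @sizes;
}

print "reset to 0: @{[ chunk_sizes(10, 3, 0) ]}\n";   # prints "reset to 0: 3 4 3"
print "reset to 1: @{[ chunk_sizes(10, 3, 1) ]}\n";   # prints "reset to 1: 3 3 3 1"
```

Under that assumption, resetting to 0 makes every chunk after the first one line too long, not shorter.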
That said, I’m still not clear on how you could be getting files with, e.g., 299,701 rows. Anonymous Monk’s suggestion that it’s because you skip the empty lines doesn’t persuade me: according to your specification there are as many blank lines as there are data entry lines, and your logic ignores blank lines anyway.
I offer the following in the hope that it may do what you need:
#!perl
use strict;
use warnings;
use autodie;

my $pre       = $ARGV[0];    # output prefix; also the first file read by <>
my $max_lines = 300_000;
my $linenum   = 0;
my $filenum   = 0;

open my $fileout, '>', $pre . '-' . $filenum;

while (my $line = <>)
{
    $line =~ s/ \s* $ //x;    # remove trailing whitespace (incl. "\r\n")

    if ($line ne '')          # ignore blank lines
    {
        if ($linenum++ < $max_lines)
        {
            print $fileout $line, "\n";
        }
        else
        {
            # current file is full: start the next one
            close $fileout;
            open $fileout, '>', $pre . '-' . ++$filenum;
            print $fileout $line, "\n";
            $linenum = 1;     # one line has already been written
        }
    }
}

close $fileout;
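If you want to check the whitespace stripping on its own, the following is a small sketch using the same substitution on a few in-memory strings. The helper name strip_eol and the sample data are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: remove all trailing whitespace, including "\r\n" or "\n".
sub strip_eol
{
    my ($line) = @_;
    $line =~ s/ \s* $ //x;
    return $line;
}

# Sample input with mixed line endings and one blank line.
my @raw   = ("alpha\r\n", "\r\n", "beta  \n", "gamma");
my @clean = grep { $_ ne '' } map { strip_eol($_) } @raw;

# Add the line ending back only when printing.
print "$_\n" for @clean;
```

This prints alpha, beta and gamma on separate lines; the blank "\r\n" entry is dropped before anything is written.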
HTH,
Athanasius <°(((>< contra mundum