PerlMonks

Re: Help with problem

by Athanasius (Archbishop)
on May 23, 2012 at 07:29 UTC ([id://971967])


in reply to Help with problem

Update: Added use autodie; to the code.

Hi live4tech,

Three points:

  • You should Choose a Good, Descriptive Title for your posts.
  • It’s not a good idea to try to match on \r\n, as this brings in too many complications (as well as being non-portable). It’s much better to strip these characters first, then add a line ending back only when needed (i.e., when printing). See the code below.
  • There is an off-by-one error in the logic of your final else clause: $linenum is reset to 0, but it should be 1, because a line is immediately written to the new file.
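The effect of that off-by-one error can be seen by simulating the counter with a tiny limit. This is a sketch that assumes the counter has the same shape as in the script in this reply (a post-increment test against the limit, then a reset after opening a new file); chunk_sizes is a hypothetical helper written only for illustration.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: simulate the splitting counter for $n_lines data lines,
# a per-file limit of $max, and the value $reset assigned to $linenum after
# a new file is opened. Returns the number of lines written to each file.
sub chunk_sizes {
    my ($n_lines, $max, $reset) = @_;
    my $linenum = 0;
    my @sizes   = (0);              # file 0 is open before the loop starts
    for (1 .. $n_lines) {
        if ($linenum++ < $max) {
            $sizes[-1]++;           # write to the current file
        }
        else {
            push @sizes, 1;         # open the next file and write this line to it
            $linenum = $reset;
        }
    }
    return @sizes;
}

print 'reset to 0: ', join(',', chunk_sizes(7, 3, 0)), "\n";
print 'reset to 1: ', join(',', chunk_sizes(7, 3, 1)), "\n";
```

With a limit of 3 and 7 data lines, resetting to 0 lets the second file receive 4 lines (3,4), one more than the limit; resetting to 1 keeps every file at or below the limit (3,3,1).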

That said, I’m still not clear on how you could be getting files with, e.g., 299,701 rows. Anonymous Monk’s suggestion that it’s because you skip the empty lines doesn’t persuade me: according to your specification, there are as many blank lines as data entry lines, and your logic ignores blank lines anyway.
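If it helps to check that specification against the real input, a small diagnostic can count data lines and blank lines using the same strip-trailing-whitespace rule as the splitting script. This is a hypothetical sketch, not code from the thread; count_lines is an invented name, and the script only reads a file when one is given on the command line.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: count non-blank and blank lines read from $fh.
# A line counts as blank if nothing is left after removing trailing
# whitespace (which covers both "\n" and "\r\n" endings).
sub count_lines {
    my ($fh) = @_;
    my ($data, $blank) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/ \s* $ //x;
        if ($line eq '') { $blank++ }
        else             { $data++ }
    }
    return ($data, $blank);
}

# Usage: perl count_lines.pl input.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ($data, $blank) = count_lines($in);
    print "data lines:  $data\nblank lines: $blank\n";
}
```

If the two counts differ substantially, the input does not match the stated layout, which would be worth knowing before hunting for a bug in the splitter.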

I offer the following in the hope that it may do what you need:

```perl
#!perl
use strict;
use warnings;
use autodie;

my $pre       = $ARGV[0];    # input file name (read by <>), also the output-file prefix
my $max_lines = 300_000;
my $linenum   = 0;
my $filenum   = 0;

open my $fileout, '>', $pre . '-' . $filenum;

while (my $line = <>)
{
    $line =~ s/ \s* $ //x;    # remove trailing whitespace (incl. "\r\n")

    if ($line ne '')          # ignore blank lines
    {
        if ($linenum++ < $max_lines)
        {
            print $fileout $line, "\n";
        }
        else
        {
            close $fileout;
            open $fileout, '>', $pre . '-' . ++$filenum;
            print $fileout $line, "\n";
            $linenum = 1;     # one line has already been written to the new file
        }
    }
}

close $fileout;
```
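To see why stripping first makes matching on \r\n unnecessary, the substitution used above can be tried on a few sample lines. The sample strings are made up for illustration. Without the /m modifier, $ matches at the end of the string or just before a final newline, and the greedy \s* consumes any trailing spaces plus "\r\n" or "\n", so every ending is normalised the same way.

```perl
#!perl
use strict;
use warnings;

# Illustrative input lines with different endings (not data from the thread).
my @samples = ("data\r\n", "data\n", "data  \r\n", "\r\n", "  \n");

# Make \r and \n visible when printing.
sub show {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return qq("$s");
}

my @cleaned;
for my $line (@samples) {
    (my $clean = $line) =~ s/ \s* $ //x;    # copy, then strip trailing whitespace
    push @cleaned, $clean;
    print show($line), ' => ', show($clean),
          ($clean eq '' ? '  (blank, skipped)' : ''), "\n";
}
```

Every data line comes out as plain "data", and whitespace-only lines reduce to the empty string, so the `$line ne ''` test skips them whatever the original line ending was.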

HTH,

Athanasius <°(((><contra mundum

Re^2: Help with problem
by Anonymous Monk on May 23, 2012 at 07:36 UTC

    The suggestion of Anonymous Monk that it’s because you skip the empty lines doesn’t persuade

    And what is your suggestion? Why do you think it happens?

    The code is fairly short and simple, and we only have live4tech's word that there are missing records.

    Your reworking of live4tech's code, aside from moving the 300,000th line into the new file, doesn't change anything else. If live4tech's original code had records go missing, so would your reworked code (they're virtually identical).

      They're virtually identical

      Except that my pattern matching is (slightly) different.

      As I said, I don’t know why the original code wasn’t working (except for the off-by-one error). At best, the use of \r\n in the pattern match may be a red herring, in which case it will be useful to “eliminate it from our inquiries” (I read too many whodunnits). At worst, it may be introducing some bug which live4tech will find is fixed in my version.

      It will be interesting to find out.

      Athanasius <°(((><contra mundum

        Except that my pattern matching is (slightly) different.

        That doesn't matter. The only possible result from using that pattern would be skipping even more records.

        Superstition is not a good reason to introduce code changes :)
