PerlMonks  

Re: Help with problem

by aaron_baugher (Curate)
on May 23, 2012 at 13:14 UTC (#972034)


in reply to Help with problem

A wild guess: Since you reopen FILEOUT to a new file without closing it first, maybe there's an issue with some buffered data ending up in the wrong file? I think perl is usually smart about closing file descriptors in cases like that, but I don't know if you can always count on it. Perhaps you should close it before reopening it.

Aside from that, I'd have to agree with moritz: some issue with testing. Maybe whatever you're using to count the lines in your resulting files doesn't have exactly the same definition for "line" that perl does. Incidentally, there is a perfectly good *nix utility for this kind of thing:

grep -v '^$' inputfile | split -d -l 300000 - outputfile
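
On the "definition of a line" point, here is a minimal sketch of how two common counters can disagree. It assumes a sample file whose last line has no trailing newline (the temp file and variable names are made up for illustration): wc -l counts newline characters, while awk counts records, much as a Perl read loop would.

```shell
#!/bin/sh
# Hypothetical sample: two lines, the last one without a trailing newline.
tmp=$(mktemp)
printf 'first\nsecond' > "$tmp"

# wc -l counts newline characters, so the unterminated last line is not counted.
wc_count=$(wc -l < "$tmp" | tr -d ' ')

# awk counts records, including the unterminated last one.
awk_count=$(awk 'END { print NR }' "$tmp")

echo "wc -l: $wc_count, awk NR: $awk_count"
rm -f "$tmp"
```

If the two numbers differ on the real data, the "missing" line is probably just an unterminated final line rather than lost output.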

Aaron B.
Available for small or large Perl jobs; see my home node.

Re^2: Help with problem
by Marshall (Canon) on May 24, 2012 at 15:49 UTC
    A wild guess: Since you reopen FILEOUT to a new file without closing it first, maybe there's an issue with some buffered data ending up in the wrong file? I think perl is usually smart about closing file descriptors in cases like that, but I don't know if you can always count on it. Perhaps you should close it before reopening it.

    Reopening a file handle to a different file is just fine. The I/O buffers are flushed to disk and the old file is closed in the normal way. You can do an explicit close(), but it is not necessary.

Re^2: Help with problem (CUTTING TEXT FILES)
by live4tech (Sexton) on May 23, 2012 at 17:17 UTC

    Aaron, that grep line looks promising (short and 'simple' - I like that!). I do not really understand it, but I want to, so I will review grep in perldocs and elsewhere and hopefully be able to decipher the line so I will be able to adapt it to my needs in the future. Thanks so much!

    To everyone else who has commented, thank you too! The logic in the if statements in the original code is correct.

    I am going to try the simpler and prettier code written by Athanasius. BTW - I know the row count is correct because I looked at it in a few ways and checked a number of lines at the beginning, middle and end of several cut files against the original and these were right on. I will update the Monastery after I try the new code.

    One last note - I mentioned I was working through the "Camel" book; well, actually it's the "Llama" book... sorry Perlers.

      Thanks! Here's that command line explained bit by bit:

      grep          grep for
      -v            lines that DO NOT match
      ^$            an empty line (begin and end with nothing between)
      inputfile     in the file "inputfile"
      |             pipe the results to
      split         the split program, which divides up a file
      -d            naming the output files with digits
      -l 300000     and putting 300000 lines in each
      -             getting the input from stdin (the pipe)
      outputfile    and naming the output files starting with outputfile (followed by digits)
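
To see the pieces working together, here is a small sketch of the same pipeline on a throwaway file, with 2 lines per piece instead of 300000 so the result is easy to inspect. The sample data and scratch directory are assumptions for illustration, and -d is, as far as I know, a GNU split option, so it may not exist in every split.

```shell
#!/bin/sh
# Work in a scratch directory so the generated pieces are easy to clean up.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample input: 5 non-empty lines with 2 empty lines mixed in.
printf 'a\n\nb\nc\n\nd\ne\n' > inputfile

# Same pipeline, pattern quoted, 2 lines per piece.
grep -v '^$' inputfile | split -d -l 2 - outputfile

# The pieces are named outputfile00, outputfile01, and so on.
pieces=$(ls outputfile* | wc -l | tr -d ' ')
total=$(cat outputfile* | wc -l | tr -d ' ')
echo "pieces: $pieces, total lines: $total"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

The total across all pieces should equal the number of non-empty lines in the input, which is a quick sanity check for the real 300000-line run.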

      Aaron B.
      Available for small or large Perl jobs; see my home node.

        ack -v "^\s*$" | split ...
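
For anyone without ack, a rough grep equivalent of that pattern uses the POSIX [[:space:]] class. This is only a sketch on made-up sample data, showing that the stricter ^$ keeps whitespace-only lines while the looser pattern drops them too.

```shell
#!/bin/sh
# Hypothetical sample: "a", a spaces-only line, a tab-only line, "b", an empty line.
tmp=$(mktemp)
printf 'a\n   \n\t\nb\n\n' > "$tmp"

# Lines kept when only truly empty lines are dropped.
strict=$(grep -vc '^$' "$tmp")

# Lines kept when whitespace-only lines are dropped as well.
loose=$(grep -vc '^[[:space:]]*$' "$tmp")

echo "kept with ^\$: $strict, kept with ^[[:space:]]*\$: $loose"
rm -f "$tmp"
```

Either form can be piped into split exactly as before.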
