PerlMonks  

Remove whitespace between lines

by wrkrbeee (Scribe)
on Feb 03, 2015 at 17:11 UTC ( [id://1115411]=perlquestion )

wrkrbeee has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks, at best, I am a novice to Perl, and I am seeking to remove large chunks of whitespace (i.e., anything other than a visible character) between lines of a text file. This site contains a thread dating back to 2000 which seems to address this issue. Some Monks suggest the use of substitution pattern matching, while others suggest while loops. I am attempting to use a while loop in the code below (but remain completely open to other ideas). However, I have no need for user input (i.e., I do not need <STDIN>), and I am clueless about how to use "print if (!/^\s*$/)" to remove the whitespace and write/save the result to the file. Apologies for such a simple problem. Grateful for any ideas. Thank you!

#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";

my $files_dir = 'F:\research\SEC filings 10K and 10Q\Data\Filing Docs\2009\Test Data\HTML Clean';
my $write_dir = 'F:\research\SEC filings 10K and 10Q\Data\Filing Docs\2009\Test Data\HTML Clean\Non Word Strip';

opendir (my $dir_handle, $files_dir);

while (my $filename = readdir($dir_handle)) {
    next unless -f $files_dir.'/'.$filename;
    print "Procesing $filename\n";
    open my $fh_in, '<', $files_dir.'/'.$filename
        or die "failed to open '$filename' for read";
    open my $fh_out, '>', $write_dir.'/'.$filename
        or die "failed to open '$filename' for write";
    my $count=0;
    while (my $line = <$fh_in>) {
        my $text = $line;
        chomp ($text);
        #Strip/remove whitespace between lines of text file;
        while (<STDIN>) {
            print if (!/^\s*$/);
        }
        print $fh_out "$text\n"; #Save stripped results;
    }
    ++$count;
    print "$count lines read from $filename\n;"
}

Replies are listed 'Best First'.
Re: Remove whitespace between lines
by Laurent_R (Canon) on Feb 03, 2015 at 18:05 UTC
    Change your while loop:
    while (my $line = <$fh_in>) {
        my $text = $line;
        chomp ($text);
        #Strip/remove whitespace between lines of text file;
        while (<STDIN>) {
            print if (!/^\s*$/);
        }
        print $fh_out "$text\n"; #Save stripped results;
    }
    as follows:
    while (my $line = <$fh_in>) { print $fh_out $line if (!/^\s*$/); }
    Update: Sorry, I wrote the message above on my mobile device on the train home; the train was approaching my town, so I copied and pasted the code a bit too hastily. The code above should be:
    while (my $line = <$fh_in>) { print $fh_out $line if $line !~ /^\s*$/; }
    Update 2: I had not noticed when I wrote the first update above, but poj sent me a CB message proposing a very similar correction. Thanks to poj. Further update: I had also not seen that poj posted the correction as an answer to your question, so these updates end up not being very useful...

    Je suis Charlie.
      Thank you for the simple solution. It gives me the error of "use of uninitialized value $_ in pattern match." Over my head for sure.
        Try
        while (my $line = <$fh_in>) { print $fh_out $line if ($line !~ /^\s*$/); }
        poj

      Here the pattern can be further simplified: print if /\S/; (or, with the named loop variable used above, print $fh_out $line if $line =~ /\S/;)
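
To see why the first version warned: while (my $line = <$fh_in>) stores each line in $line, so $_ is never set and a bare /^\s*$/ is matched against an undefined $_. A minimal sketch, reading from an in-memory string rather than a real file; the sample text and the helper name non_blank_lines are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the lines of $text
# that contain at least one non-whitespace character.
sub non_blank_lines {
    my ($text) = @_;
    # Open a read handle on the string itself (an in-memory file).
    open my $fh_in, '<', \$text or die "cannot open in-memory handle: $!";
    my @kept;
    while (my $line = <$fh_in>) {
        # Match against $line explicitly; this loop does not set $_.
        push @kept, $line if $line =~ /\S/;
    }
    close $fh_in;
    return @kept;
}

# Made-up sample: three text lines separated by empty and blank lines.
my $sample = "first\n\n   \t\nsecond\n \nthird\n";
print non_blank_lines($sample);
```

The short form print if /\S/; is fine in a loop that does set $_, such as while (<$fh_in>) or the implicit loop of perl -n.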

Re: Remove whitespace between lines
by Your Mother (Archbishop) on Feb 03, 2015 at 18:13 UTC

    This might do what you want. It’s meant to be invoked as a command-line tool per file, and it sends its results to STDOUT. It’s essentially a one-liner unrolled into a script. You can add an -i flag to edit the file in place, but this is risky. Don’t do it unless you have backups and are going to check everything. Save it as space-collapser.pl or whatever you like.

    Update: I don’t have a lot of experience with Windows on this front… I’m not sure this will work for you as is.

    #!/usr/bin/env perl -0777 -p
    # -0777, idiom for "slurp" mode.

    # Strip all trailing spaces including blank lines with spaces.
    s/[^\n\r\S]+(?=\r?\n)//g;

    # Reduces all triple or greater line spacing to double spaced lines.
    s/((?:\r?\n){2})(?:\r?\n)+/$1/g;

    # -p print at the end of each implicit loop.
    space-collapser.pl file dir-with-files/*
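
The two substitutions can be tried on a string without touching any files. A minimal sketch: the sub name collapse_space and the sample text are assumptions for illustration, while the two regexes are copied from the script above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions from the script.
sub collapse_space {
    my ($text) = @_;
    # Remove horizontal whitespace that sits directly before a line ending.
    $text =~ s/[^\n\r\S]+(?=\r?\n)//g;
    # Collapse runs of three or more line endings down to two.
    $text =~ s/((?:\r?\n){2})(?:\r?\n)+/$1/g;
    return $text;
}

# Made-up sample: trailing spaces, a tab, and a run of blank lines.
my $sample = "alpha   \n\n\n\n  \nbeta\t\ngamma\n";
print collapse_space($sample);
```

Note that this keeps a single blank line wherever there was a run of blank lines, rather than removing blank lines entirely.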
Re: Remove whitespace between lines
by Anonymous Monk on Feb 03, 2015 at 18:05 UTC
    You don't need both -w on the #! /usr/bin/perl line and use warnings; Here is my version. It looks as though you're on Windows, so if you have further problems, try turning the slashes round.
    #! /usr/bin/perl
    use strict;
    use warnings;
    #use lib "c:/strawberry/perl/site/lib";

    my $files_dir = 'data';
    my $write_dir = 'data/processed';

    opendir (my $dir_handle, $files_dir);

    while (my $filename = readdir($dir_handle)) {
        next unless -f $files_dir.'/'.$filename;
        print "Procesing $filename\n";
        open my $fh_in, '<', $files_dir.'/'.$filename
            or die "failed to open '$filename' for read";
        open my $fh_out, '>', $write_dir.'/'.$filename
            or die "failed to open '$filename' for write";
        my $count=0;
        while (my $line = <$fh_in>) {
            # print to output file only non-whitespace lines
            print $fh_out $line unless $line =~ /^\s*\n$/;
            ++$count;
        }
        print "$count lines read from $filename\n";
    }
    The STDIN in the middle was hanging up the entire program. There was also an unnecessary chomp and assignment of $line to variable $text. I have commented out the 'use lib' line as you shouldn't need that either. Try this and let me know if it works.
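
One possible variation on the script above, as a sketch only: it wraps the loop in a sub, checks the result of opendir, and builds paths with File::Spec->catfile so the slash direction is left to Perl. The sub name strip_blank_lines_in_dir is hypothetical, and unlike the script above it counts the lines kept rather than the lines read.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: copy every plain file in $in_dir to $out_dir,
# dropping lines that contain only whitespace. Returns a hash of
# filename => number of lines kept.
sub strip_blank_lines_in_dir {
    my ($in_dir, $out_dir) = @_;
    opendir(my $dh, $in_dir) or die "cannot open directory '$in_dir': $!";
    my %kept;
    while (defined(my $name = readdir $dh)) {
        my $in_path = File::Spec->catfile($in_dir, $name);
        next unless -f $in_path;    # skips '.', '..' and subdirectories
        my $out_path = File::Spec->catfile($out_dir, $name);
        open my $fh_in,  '<', $in_path  or die "cannot read '$in_path': $!";
        open my $fh_out, '>', $out_path or die "cannot write '$out_path': $!";
        my $count = 0;
        while (my $line = <$fh_in>) {
            next unless $line =~ /\S/;    # whitespace-only line: drop it
            print {$fh_out} $line;
            ++$count;
        }
        close $fh_in;
        close $fh_out or die "cannot close '$out_path': $!";
        $kept{$name} = $count;
    }
    closedir $dh;
    return %kept;
}

# Run only when both directories are given on the command line.
if (@ARGV == 2) {
    my %kept = strip_blank_lines_in_dir(@ARGV);
    print "$_: $kept{$_} lines kept\n" for sort keys %kept;
}
```

The output directory must already exist and must differ from the input directory, since each output file is opened with '>' and would otherwise truncate its own source.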