Remove whitespace between lines

wrkrbeee has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks, I am at best a Perl novice, and I am trying to remove large chunks of whitespace (i.e., empty lines or lines containing only whitespace characters) between lines of a text file. This site has a thread dating back to 2000 that seems to address this issue: some Monks suggest a substitution (s///), while others suggest while loops. I am attempting to use a while loop in the code below (but remain completely open to other ideas). However, I have no need for user input (i.e., I do not need <STDIN>), and I have no idea how to use "print if (!/^\s*$/)" to remove the whitespace and write/save the result to the output file. Apologies for such a simple question. Grateful for any ideas. Thank you!

#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";

my $files_dir = 'F:\research\SEC filings 10K and 10Q\Data\Filing Docs\2009\Test Data\HTML Clean';
my $write_dir = 'F:\research\SEC filings 10K and 10Q\Data\Filing Docs\2009\Test Data\HTML Clean\Non Word Strip';

opendir (my $dir_handle, $files_dir);
while (my $filename = readdir($dir_handle))   {  
  next unless -f $files_dir.'/'.$filename;

  print "Processing $filename\n";

  open my $fh_in, '<', $files_dir.'/'.$filename
      or die "failed to open '$filename' for read";
  
  open my $fh_out, '>', $write_dir.'/'.$filename
      or die "failed to open '$filename' for write";
  
  my $count=0;   

  while (my $line = <$fh_in>)   {       
    my $text = $line;
    chomp ($text);  
    
    #Strip/remove whitespace between lines of text file;
    while (<STDIN>)
    {
      print if (!/^\s*$/);
    }
                     
    print $fh_out "$text\n";  #Save stripped results; 
         
  } 

    ++$count; 

    print "$count lines read from $filename\n";

}

Re: Remove whitespace between lines by Laurent_R (Canon) on Feb 03, 2015 at 18:05 UTC
Change your while loop:

while (my $line = <$fh_in>)   {
  my $text = $line;
  chomp ($text);
  #Strip/remove whitespace between lines of text file;
  while (<STDIN>)
  {
    print if (!/^\s*$/);
  }
  print $fh_out "$text\n";  #Save stripped results;
}

as follows:

while (my $line = <$fh_in>) { print $fh_out $line if (!/^\s*$/); }

Update: Sorry, I wrote the message above on my mobile device on the train home, and as the train was arriving near my town I copied and pasted the code a bit too hastily. The code above should be:

while (my $line = <$fh_in>) { print $fh_out $line if $line !~ /^\s*$/; }

Update 2: I had not noticed when I wrote the first update, but poj had sent me a CB message proposing a very similar correction. Thanks to poj.

Further update: I had also not seen that poj posted the correction as an answer to your question, so these updates end up not being very useful...

Je suis Charlie.
Re^2: Remove whitespace between lines by wrkrbeee (Scribe) on Feb 03, 2015 at 18:41 UTC
Thank you for the simple solution. It gives me the warning "use of uninitialized value $_ in pattern match." Over my head for sure.
Re^3: Remove whitespace between lines by poj (Abbot) on Feb 03, 2015 at 18:51 UTC
Try

`while (my $line = <$fh_in>) { print $fh_out $line if ($line !~ /^\s*$/); }`

poj
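The "uninitialized value $_" warning comes from the first version testing `$_`: inside `while (my $line = <$fh_in>)` each line is stored in `$line`, so a bare `/^\s*$/` matches against `$_`, which nothing has set. Below is a minimal sketch of the corrected filter. It assumes an in-memory string (`$input`, hypothetical sample data) in place of the real input file, and an in-memory output handle in place of the real output file, so the example is self-contained:

```perl
use strict;
use warnings;

# Hypothetical sample input: text lines separated by empty and
# whitespace-only lines.
my $input = "first line\n\n   \n\tsecond line\n \t \nthird line\n";

# In-memory filehandles stand in for the real input and output files.
open my $fh_in, '<', \$input or die "cannot open input: $!";
my $output = '';
open my $fh_out, '>', \$output or die "cannot open output: $!";

while (my $line = <$fh_in>) {
    # Test $line explicitly; a bare /^\s*$/ would test the unset $_.
    print $fh_out $line if $line !~ /^\s*$/;
}
close $fh_out;

print $output;
```

With real files, only the two `open` calls change; the loop body is the same one-liner given above.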
Re^4: Remove whitespace between lines by wrkrbeee (Scribe) on Feb 03, 2015 at 18:55 UTC
Re^5: Remove whitespace between lines by Marshall (Canon) on Feb 04, 2015 at 01:04 UTC
Re^2: Remove whitespace between lines by Anonymous Monk on Feb 03, 2015 at 18:58 UTC
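Here the pattern can be further simplified: `print if /\S/;` (print only lines that contain at least one non-whitespace character; with the line in a named variable, that is `print $fh_out $line if $line =~ /\S/;`).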
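The shorter test reads as "keep the line if it contains at least one non-whitespace character", which is the same condition as "skip the line if it is empty or whitespace-only". A small sketch that checks the two forms agree on a few hypothetical sample lines, including ones without a trailing newline (as can happen at the end of a file):

```perl
use strict;
use warnings;

# Hypothetical sample lines, shaped like what readline returns
# (the last two have no trailing newline).
my @lines = ("text\n", "\n", "  \t\n", "  indented\n",
             "last line", "   ");

my @mismatches;
for my $line (@lines) {
    my $keep_long  = ($line !~ /^\s*$/) ? 1 : 0;  # not empty/whitespace-only
    my $keep_short = ($line =~ /\S/)    ? 1 : 0;  # has a non-whitespace char
    push @mismatches, $line if $keep_long != $keep_short;
}

# Lines the filter keeps: "text\n", "  indented\n" and "last line".
my @kept = grep { /\S/ } @lines;
printf "%d of %d lines kept, %d mismatches\n",
    scalar @kept, scalar @lines, scalar @mismatches;
```

This prints `3 of 6 lines kept, 0 mismatches`.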
Re: Remove whitespace between lines by Your Mother (Archbishop) on Feb 03, 2015 at 18:13 UTC
This might do what you want. It’s meant to be invoked as a command-line tool on one or more files, and it sends its results to STDOUT. It’s essentially a one-liner unrolled into a script. You can add an `-i` flag to edit the files in place, but this is risky: don’t do it unless you have backups and are going to check everything. Save it as `space-collapser.pl` or whatever you like.

Update: I don’t have a lot of experience with Windows on this front, so I’m not sure this will work for you as is.

#!/usr/bin/env perl -0777 -p
# -0777, idiom for "slurp" mode.
# Strip all trailing spaces including blank lines with spaces.
s/[^\n\r\S]+(?=\r?\n)//g;
# Reduces all triple or greater line spacing to double spaced lines.
s/((?:\r?\n){2})(?:\r?\n)+/$1/g;
# -p print at the end of each implicit loop.

Usage:

space-collapser.pl file dir-with-files/*
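To see what the two substitutions do, here is a sketch that applies them to an in-memory string instead of a slurped file (the sample text is hypothetical). Note that the second substitution keeps one empty line between blocks rather than removing every blank line; to drop them all, a per-line `\S` filter like the ones in the other replies is simpler.

```perl
use strict;
use warnings;

# Hypothetical sample text: trailing blanks, a whitespace-only line, and a
# run of extra newlines between two paragraphs.
my $text = "para one  \nstill one\t\n \t \n\n\n\npara two\n";

# 1. Remove horizontal whitespace (anything \s matches except \n and \r)
#    at the end of each line; whitespace-only lines become empty lines.
$text =~ s/[^\n\r\S]+(?=\r?\n)//g;

# 2. Collapse three or more consecutive line endings down to two,
#    i.e. leave at most one empty line between paragraphs.
$text =~ s/((?:\r?\n){2})(?:\r?\n)+/$1/g;

print $text;
```

The result is `"para one\nstill one\n\npara two\n"`. If you do try the in-place route, the `-i.bak` form of the switch keeps a copy of each original file with a `.bak` suffix.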
Re: Remove whitespace between lines by Anonymous Monk on Feb 03, 2015 at 18:05 UTC
You don't need to use `#! /usr/bin/perl -w` as well as `use warnings;`. Here is my version. It looks as though you're on Windows, so if you have further problems, try turning the slashes round.

#! /usr/bin/perl
use strict;
use warnings;
#use lib "c:/strawberry/perl/site/lib";

my $files_dir = 'data';
my $write_dir = 'data/processed';

opendir (my $dir_handle, $files_dir)
    or die "failed to open directory '$files_dir': $!";
while (my $filename = readdir($dir_handle))   {
  next unless -f $files_dir.'/'.$filename;

  print "Processing $filename\n";

  open my $fh_in, '<', $files_dir.'/'.$filename
      or die "failed to open '$filename' for read";

  open my $fh_out, '>', $write_dir.'/'.$filename
      or die "failed to open '$filename' for write";

  my $count=0;

  while (my $line = <$fh_in>)   {
    # print to output file only non-whitespace lines
    print $fh_out $line unless $line =~ /^\s*$/;
    ++$count;
  }

  print "$count lines read from $filename\n";
}

The STDIN loop in the middle was hanging the entire program, because it waits for keyboard input. There was also an unnecessary chomp and an unnecessary copy of $line into $text. I have commented out the 'use lib' line, as you shouldn't need that either. Try this and let me know if it works.
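A variation on the per-file loop above, offered as a sketch rather than a replacement: it builds paths with the core `File::Spec->catfile` instead of joining with `'/'` by hand (Perl on Windows accepts forward slashes anyway), checks `opendir` and `close` for errors, and uses the `\S` test. `strip_blank_lines` is a hypothetical helper name, and the demo writes a throwaway file into temporary directories from the core `File::Temp` module rather than touching real input and output directories.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# strip_blank_lines (hypothetical helper): copy $in_path to $out_path,
# skipping lines that are empty or contain only whitespace.
# Returns (lines read, lines written).
sub strip_blank_lines {
    my ($in_path, $out_path) = @_;
    open my $fh_in,  '<', $in_path  or die "failed to open '$in_path' for read: $!";
    open my $fh_out, '>', $out_path or die "failed to open '$out_path' for write: $!";
    my ($read, $written) = (0, 0);
    while (my $line = <$fh_in>) {
        ++$read;
        next unless $line =~ /\S/;   # skip empty / whitespace-only lines
        print {$fh_out} $line;
        ++$written;
    }
    close $fh_out or die "failed to close '$out_path': $!";
    return ($read, $written);
}

# Demo with throwaway directories standing in for $files_dir and $write_dir.
my $files_dir = tempdir(CLEANUP => 1);
my $write_dir = tempdir(CLEANUP => 1);

my $sample = File::Spec->catfile($files_dir, 'sample.txt');
open my $fh, '>', $sample or die "cannot create '$sample': $!";
print {$fh} "alpha\n\n  \nbeta\n\t\ngamma\n";
close $fh or die "cannot close '$sample': $!";

opendir my $dh, $files_dir or die "cannot open directory '$files_dir': $!";
my @names = readdir $dh;
closedir $dh;

my %counts;
for my $name (sort @names) {
    my $in_path = File::Spec->catfile($files_dir, $name);
    next unless -f $in_path;   # skips '.', '..' and subdirectories
    my $out_path = File::Spec->catfile($write_dir, $name);
    my ($read, $written) = strip_blank_lines($in_path, $out_path);
    $counts{$name} = [$read, $written];
    print "$read lines read, $written written for $name\n";
}
```

In a script like the one above, calling the helper inside the `readdir` loop replaces the inner `while` loop. The output directory must already exist, since opening a file with `'>'` does not create missing directories.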