PerlMonks  

Writing into files and RegExp

by PerlingTheUK (Hermit)
on Mar 10, 2006 at 10:16 UTC ( [id://535650]=perlquestion )

PerlingTheUK has asked for the wisdom of the Perl Monks concerning the following question:

Dearest Monks

Please excuse the very general title, but I am not sure where the actual problem is, so I have just listed my two main suspects.

I have a file with lines that need to be deleted. Each of these lines matches the pattern /^AB.*C$/. The whole code is:

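use Carp;
use English qw( -no_match_vars );

# "or" instead of "||": "||" binds tighter than the comma, so with "||"
# croak would only run if the file name itself were false.
open my $IN, "<", $infile
    or croak( "Could not open file $infile: $EXTENDED_OS_ERROR" );
open my $OUT, ">", $outfile
    or croak( "Could not open file $outfile: $EXTENDED_OS_ERROR" );

while ( <$IN> ) {
    my $line = $_;
    chomp $line;
    if ( $line !~ /^AB.*C$/ ) {
        print $OUT $line . "\n";
    }
}

close $IN;
close $OUT;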

While this removes the lines I am looking for, it also seems to truncate some of the lines that should be kept and pads them with several thousand characters, shown as '^@' in XEmacs. I am not sure what might cause this or where to start looking.


Cheers,
PerlingTheUK
Updated: I have corrected $line to be used throughout the while loop.

Replies are listed 'Best First'.
Re: Writing into files and RegExp
by Corion (Patriarch) on Mar 10, 2006 at 10:25 UTC

    You are probably leaving part of the line ending at the end of each line. Most likely, there is whitespace at the end of each line that you want to remove. Such things can happen, for example, when you create a file with Windows-style line endings ("\r\n") and then process that file with a perl which only expects Unix-style line endings ("\n"). As a quick test, you could check for whitespace at the end of the lines:

    while (<$IN>) {
        chomp;
        warn "Whitespace at end of line $.: >$_<" if /\s+$/;
        ...
    };

    If that indeed is the case, you can either set the input record separator ($/) to the whitespace sequence, or simply strip off all whitespace.
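The second option, stripping all trailing whitespace, could look roughly like the sketch below. The helper name strip_trailing_whitespace and the sample lines are made up for illustration; only the /^AB.*C$/ pattern comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the line ending and any other trailing
# whitespace (including a stray "\r" left over from a "\r\n" ending).
sub strip_trailing_whitespace {
    my ($line) = @_;
    $line =~ s/\s+\z//;
    return $line;
}

# Made-up sample data; two of the three lines should be filtered out.
my @raw  = ( "ABxxC\r\n", "XYABCDEFG 1 H 56 65\r\n", "ABC  \n" );
my @kept = grep { !/^AB.*C$/ } map { strip_trailing_whitespace($_) } @raw;

print "$_\n" for @kept;
```

With only a plain chomp, a leftover "\r" would sit between the final "C" and the end of the string, so /^AB.*C$/ would no longer match such a line.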

      Thanks, but that is not the problem. I am using a Win32 file on a Win32 platform. What happens is that a line like
      XYABCDEFG 123451234512341234 1 H 56 65
      becomes
      XYABCDEFG 123451234512341234 1 ^@^@^@^@^@^@^@^@^@^@^@^@...
      with the ^@ extending for roughly 2000 characters.

      I have tried this several times and the resulting file always has the same size.


      Cheers,
      PerlingTheUK

        You will need to post some data then, which reproduces the problem, as your code does not show anything that I recognize as triggering such behaviour. Please make sure before posting that no sensitive data gets posted and that the posted data actually produces the behaviour you describe. Although I'm no user of Emacs, the ^@ character could be its interpretation of "\0". Which still is weird, but without seeing your input data, it's hard to tell where it comes from. Also, your Perl version (perl -v) might be of interest.
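One way to check whether the "\0" bytes are already present in the input, before any filtering happens, is to scan the file in binary mode and count them per line. This is only a diagnostic sketch; the helper name find_nul_lines and the script name in the usage comment are assumptions, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [line number, NUL count] pairs
# for every line of the given file that contains "\0" bytes.
sub find_nul_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    binmode $fh;    # read raw bytes, without CRLF translation
    my @hits;
    while ( my $line = <$fh> ) {
        my $count = ( $line =~ tr/\0// );    # tr counts without modifying
        push @hits, [ $., $count ] if $count;
    }
    close $fh;
    return @hits;
}

# Usage: perl find_nul.pl input.txt   (the script name is an assumption)
if (@ARGV) {
    printf "line %d: %d NUL byte(s)\n", @$_ for find_nul_lines( $ARGV[0] );
}
```

Running the same check on both the input and the output file would show whether the NUL bytes come from the data or appear during writing.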

Re: Writing into files and RegExp
by davidrw (Prior) on Mar 10, 2006 at 14:12 UTC
    You can do this as a one-liner, too:
    # old: foo.dat.bak new: foo.dat
    perl -i.bak -ne 'print unless /^AB.*C$/' foo.dat
    # OR
    # old: foo.dat new: foo.dat.fixed
    perl -ne 'print unless /^AB.*C$/' foo.dat > foo.dat.fixed
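For readers unfamiliar with the switches: -n wraps the given code in a loop over the input lines, and -i.bak edits the file in place while keeping a backup. A rough script-level equivalent of the filtering part is sketched below; the helper name filter_lines and the in-memory sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out unless it
# matches /^AB.*C$/. This is roughly the loop that -n provides.
sub filter_lines {
    my ( $in, $out ) = @_;
    while ( my $line = <$in> ) {
        print {$out} $line unless $line =~ /^AB.*C$/;
    }
    return;
}

# Demonstration on in-memory filehandles with made-up data.
my $input = "ABxxC\nXYABCDEFG 1 H 56 65\nABC\nkeep me\n";
open my $in,  '<', \$input     or die "Cannot open in-memory input: $!";
open my $out, '>', \my $output or die "Cannot open in-memory output: $!";
filter_lines( $in, $out );
close $in;
close $out;

print $output;
```

Note that the Windows cmd.exe shell does not treat single quotes as quoting characters, so on Win32 the one-liner would need double quotes around the code instead.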
Re: Writing into files and RegExp
by wfsp (Abbot) on Mar 10, 2006 at 10:43 UTC
    What is $line?

    Try

    while (my $line = <$IN>){
      Sorry, that happened while I was in the middle of changing the code from $_ to $line.

      Cheers,
      PerlingTheUK

Node Type: perlquestion [id://535650]
Approved by Corion