PerlMonks  

chomp() problems

by Anonymous Monk
on Sep 24, 2002 at 17:23 UTC ( #200426=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a line that consists of a series of fields delimited by |'s ...
field1|field2|field3|field4\r\n

This file is generated as an export from MS Access. Now, my problem is that I would normally (for the sake of safety) use chomp() to remove the \r\n:
chomp($line); chomp($line);

However, this is producing NO CHANGE in the file. Doing this, on the other hand, works fine:
$line =~ tr/\r\n/\0\0/;

My question is ... why?

Re: chomp() problems
by SparkeyG (Curate) on Sep 24, 2002 at 17:31 UTC
    chomp will only remove the system's $/ ($INPUT_RECORD_SEPARATOR). In most Unix it is just \n, in DOS it's \r\n.
    I guess you could fix the problem by:
    { local $/ = "\r\n"; chomp ($line); }
    Edited to correct a typo, and again edited to correct a typo, and again edited to fix a typo. A sick infant does wonders to your sleep habits, and therefore your typing skills ;) Note to self, do not post anything after cleaning up baby vomit. ;)
      In most Unix it is just \n, in DOS it's \r\n.
      No, that's not correct. On DOS and Win32, it's also "\n". You see, the trick is that when reading from a text file, "\015\012" (AKA CRLF, "\r\n") is converted into a bare "\012" (LF, "\n"). Therefore, chomp() doesn't have to remove "\r\n", because there commonly will be, and should be, only a bare "\n". And as chomp and $/ only use fixed strings, not regexes, for their workings, you can't have it both ways at the same time.

      And that, boys and girls, is why it doesn't work here. Access does store line endings as CRLF pairs. And that isn't very Perl compatible. Therefore, when reading data from Access in a Perl script, you should always turn CRLF into "\n", and vice versa when storing data back into Access.
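
      The effect described above can be reproduced with a small sketch. It assumes an ASCII platform and a line whose raw CRLF bytes were not translated on input; the field values are placeholders taken from the question, not real data.

```perl
use strict;
use warnings;

# A line as it might arrive with raw CRLF bytes still attached
# (for example, read on Unix, or through a binmode'd handle).
my $line = "field1|field2|field3|field4\015\012";

# With the default $/ of "\n", chomp removes only the trailing LF
# and returns the number of characters it removed.
my $removed = chomp(my $copy = $line);

printf "removed %d char(s), %s\n",
    $removed,
    $copy =~ /\015\z/ ? "CR still present" : "no CR left";
```

      Under those assumptions a second chomp() would remove nothing further, since the string no longer ends in the current value of $/.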

      Oops. You mean $/ (INPUT_RECORD_SEPARATOR)
      instead of $\ (OUTPUT_RECORD_SEPARATOR)

      chomp works on $/

      local $\ = '\r\n';
      You did of course mean to use double quotes there.

      Makeshifts last the longest.

      One correction:

      s/in DOS it's \n\r/in DOS it's \r\n/

      Sure, blame it on the baby! :-)

      Wally Hartshorn

Re: chomp() problems
by fruiture (Curate) on Sep 24, 2002 at 17:39 UTC

    As `perldoc -f chomp` tells us, chomp "removes any trailing string that corresponds to the current value of $/"; so look at the value of $/. Because it's quite tricky how the "end of line" portability problem was solved (`man perlport`), I'd recommend using the explicit character escapes \015 and \012 instead of \r and \n:

    { local $/ = "\015\012"; chomp($line); }
    --
    http://fruiture.de

      ++fruiture! Twice, if I could.

      I can't believe how entrenched using \r\n is in some minds when dealing with foreign systems' line terminators. In the last few days, this is the fourth post on that topic... It seems as if I stumbled into a crusade against \r\n (which, across systems, ah.., isn't).

      Brothers and Sisters, this is harmful! Oh well, I might overstate this, but this does spoil portability! It was Aristotle, who asked me to beat him to it, just a short while ago... ;)

      In this case, if you ported local $/ = "\r\n" to an EBCDIC-US system you'd chomp on CR followed by chr(37) (the EBCDIC line feed, which would be '%' in ASCII)! You need to obey the origin's encoding, and for DOS that means that line endings are \015\012 and not your system's \r\n.

      So long,
      Flexx

      PS: SparkeyG, best wishes for your baby!
      $happy_baby = not reverse(@food) and sleep($calmly) # ;)
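
      A minimal sketch of the explicit-bytes approach recommended in this subthread. The helper name chomp_crlf is hypothetical (it does not appear in the thread), and the sample record is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove one trailing
# DOS line terminator, spelled as explicit octal bytes so the
# meaning does not depend on the platform's idea of "\r" and "\n".
sub chomp_crlf {
    my ($line) = @_;
    local $/ = "\015\012";   # restored automatically on scope exit
    chomp $line;
    return $line;
}

print chomp_crlf("field1|field2|field3|field4\015\012"), "\n";
```

      Because $/ is a fixed string here, a line that ends in a bare \012 is returned unchanged; handling both endings at once would need a regex instead.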

Re: chomp() problems
by BrowserUk (Patriarch) on Sep 24, 2002 at 17:32 UTC

    Not quite sure why chomp isn't working, but your tr// isn't very wise as you are replacing the \r\n with 2 null (ascii 0) bytes which could bite (chomp:) back later.

    I think tr/\r\n//; would probably be better.

    If you posted the relevant bit of your code, it might be easier to see why chomp isn't doing what you expect.


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
      Personally, if it's not working but you have a workaround, I'd go with the workaround, although I would agree with the above comment and not use the NULLS.
      TMTOWTDI
      _____________________________________________________
      mojobozo
      word (wrd)
      interj. Slang. Used to express approval or an affirmative response to
      something. Sometimes used with up. Source
        OTOH, this could be a symptom of a larger problem which could surface again later, but might not be as easy to find. I suggest figuring out what is going on. Besides, having a better understanding of the code can't hurt, right? :-)

        -disciple
Re: chomp() problems
by charnos (Friar) on Sep 24, 2002 at 17:52 UTC
    $line =~ tr/\r\n//d; should do what BrowserUK suggested. IIRC, the /d is required to delete characters. Checking the value of $/ would help ascertain why chomp() isn't working the way you expected.

    Also, $line =~ s/\r\n$//; would probably more accurately replicate chomp()'s functionality.

    Update: Thanks to bart, who reminded me that the $INPUT_RECORD_SEPARATOR is $/, not $\. There still appear to be quite a few incorrect vars floating around this thread.
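
    To compare the variants mentioned so far in this thread, here is a small sketch. The sample string is made up, and the bytes are written as octal escapes rather than \r and \n, which is an adaptation for portability and not what the posts above literally show.

```perl
use strict;
use warnings;

my $raw = "a|b\015\012";

(my $nulls   = $raw) =~ tr/\015\012/\0\0/;  # replaces, length unchanged
(my $counted = $raw) =~ tr/\015\012//;      # no /d: only counts, no change
(my $deleted = $raw) =~ tr/\015\012//d;     # deletes every CR and LF
(my $subbed  = $raw) =~ s/\015\012\z//;     # removes one trailing CRLF

printf "%-8s length %d\n", $_->[0], length $_->[1]
    for [nulls => $nulls], [counted => $counted],
        [deleted => $deleted], [subbed => $subbed];
```

    Only the last two actually shorten the string. The tr///d form also strips CR or LF bytes embedded inside a record, while the anchored substitution touches only the end, which is closer to what chomp() does.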
      I just ran into this recently with files being FTP'd from Windows to Linux, and I have to agree with Flexx: \r and \n are system-specific, where this is a byte-specific problem. What I used was similar to charnos' substitution, but fixed to ASCII:
      $line =~ s/\xD|\xA//g;

      This removes all stray CR/LF from any ASCII to any ASCII. (Sorry, no EBCDIC support.) As long as Perl can figure out where the line breaks are, this will get rid of the odd bits.

      --
      Spring: Forces, Coiled Again!
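
      One thing worth noting about the global substitution above: it removes every CR and LF byte, not just a trailing pair. A short sketch of the difference, using a made-up record with a stray CR inside it; the anchored alternative is an assumption about what a chomp-like version might look like, not something proposed in the post.

```perl
use strict;
use warnings;

my $line = "a\x0Db|c\x0D\x0A";   # note the stray CR inside the record

(my $all  = $line) =~ s/\x0D|\x0A//g;    # the substitution from the post
(my $tail = $line) =~ s/\x0D?\x0A\z//;   # only a trailing LF or CRLF

printf "all: %d bytes, tail: %d bytes\n", length $all, length $tail;
```

      Which behaviour is wanted depends on whether CR or LF bytes inside a field are considered data or debris.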

        Yup. Some like it hex... ;)

        Update: Actually, what you wrote is EBCDIC compatible. It'll substitute DOS CRLF's on any system, be it an ASCII or an EBCDIC one. You're using discrete ordinals (hexadecimal ones, in your case) instead of the infamous logical symbols \r\n, and that's what makes your statement portable.

        So long,
        Flexx

        PS: Someone downvoted this node (it's -1 by the time of this writing). Why? What did I do wrong here?
