chomp() problems

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a line that consists of a series of fields delimited by |'s ...
field1|field2|field3|field4\r\n

This file is generated as an export from MS Access. Now, my problem is that I would normally (for the sake of safety) use chomp() to remove the \r\n:

chomp($line);
chomp($line);

However, this produces NO CHANGE in the file, whereas doing this works fine:

$line =~ tr/\r\n/\0\0/;

My question is ... why?


Replies are listed 'Best First'.
Re: chomp() problems by SparkeyG (Curate) on Sep 24, 2002 at 17:31 UTC
chomp will only remove the system's $/ ($INPUT_RECORD_SEPARATOR). In most Unix it is just \n, in DOS it's \r\n. I guess you could fix the problem by: `{ local $/ = "\r\n"; chomp ($line); }` Edited to correct a typo, and again edited to correct a typo, and again edited to fix a typo. A sick infant does wonders to your sleep habits, and therefore your typing skills ;) Note to self: do not post anything after cleaning up baby vomit. ;)
Re: Re: chomp() problems by bart (Canon) on Sep 25, 2002 at 00:51 UTC
"In most Unix it is just \n, in DOS it's \r\n." No, that's not correct. On DOS and Win32, it's also "\n". You see, the trick is that when reading from a text file, "\015\012" (AKA CRLF, "\r\n") is converted into a bare "\012" (LF, "\n"). Therefore, chomp() doesn't have to remove "\r\n", because there commonly will be, and should be, only a bare "\n". And as chomp and $/ only use fixed strings, not regexes, for their workings, you can't have it both ways at the same time. And that, boys and girls, is why it doesn't work here. Access does store line endings as CRLF pairs, and that isn't very Perl compatible. Therefore, when reading data from Access in a Perl script, you should always turn CRLF into "\n", and vice versa when storing data back into Access.
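A minimal sketch of the fixed-string point, assuming an ASCII platform where $/ still has its default value "\n" and the CR was not translated away on read (for example, a Unix perl reading a DOS file). The sample record is made up for illustration:

```perl
use strict;
use warnings;

# Assumed sample record ending in the literal bytes CR (\015) and LF (\012).
my $line = "field1|field2|field3|field4\015\012";

# $/ is a fixed string ("\n" by default), so chomp strips only the final LF.
# chomp returns the number of characters it removed.
my $removed = chomp(my $copy = $line);

printf "removed %d char(s); CR still present: %s\n",
    $removed, ($copy =~ /\015\z/ ? "yes" : "no");
```

Under those assumptions chomp removes one character and the CR stays at the end of the string, which would explain why the line still looks wrong afterwards.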
Re: Re: chomp() problems by fglock (Vicar) on Sep 24, 2002 at 17:56 UTC
Oops. You mean $/ (INPUT_RECORD_SEPARATOR) instead of $\ (OUTPUT_RECORD_SEPARATOR). chomp works on $/.
Re^2: chomp() problems by Aristotle (Chancellor) on Sep 24, 2002 at 17:43 UTC
`local $\ = '\r\n';` You did of course mean to use double quotes there. Makeshifts last the longest.
Re: Re: chomp() problems by Wally Hartshorn (Hermit) on Sep 24, 2002 at 18:36 UTC
One correction: s/in DOS it's \n\r/in DOS it's \r\n/ Sure, blame it on the baby! :-) Wally Hartshorn
Re: chomp() problems by fruiture (Curate) on Sep 24, 2002 at 17:39 UTC
As `perldoc -f chomp` tells us, chomp "removes any trailing string that corresponds to the current value of $/", so look at the value of $/. Because the way the "end of line" portability problem was solved is quite tricky (`man perlport`), I'd recommend using the explicit character escapes \015 and \012 instead of \r and \n: `{ local $/="\015\012";chomp }` -- http://fruiture.de
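The one-liner above can be sketched as a small helper. The name `chomp_crlf` and the sample record are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing DOS terminator, given as explicit octets.
sub chomp_crlf {
    my ($line) = @_;
    local $/ = "\015\012";   # previous value is restored when the sub returns
    chomp $line;             # removes the terminator only if the string ends with CR LF
    return $line;
}

my $clean = chomp_crlf("field1|field2|field3|field4\015\012");
print join(",", split /\|/, $clean), "\n";
```

Note that with $/ set this way a line ending in a bare LF is left untouched, so this fits input that is known to be CRLF-terminated.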
Re^2: chomp() problems (use \015\012) by Flexx (Pilgrim) on Sep 24, 2002 at 23:38 UTC
++fruiture! Twice, if I could. I can't believe how entrenched using `\r\n` is in some minds when dealing with foreign systems' line terminators. In the last few days, this is the fourth post on that topic... It seems as if I stumbled into a crusade against \r\n (which, across systems, ah.., isn't). Brothers and Sisters, this is harmful! Oh well, I might overstate this, but this does spoil portability! It was Aristotle who asked me to beat him to it, just a short while ago... ;) In this case, if you ported `local $/ = "\r\n"` to an EBCDIC-US system you'd chomp on CR followed by `chr(37)` (the EBCDIC line feed)! You need to obey the origin's encoding, and for DOS that means that line endings are `\015\012` and not your system's `\r\n`. So long, Flexx PS: SparkeyG, best wishes for your baby! `$happy_baby = not reverse(@food) and sleep($calmly) # ;)`
Re: chomp() problems by BrowserUk (Patriarch) on Sep 24, 2002 at 17:32 UTC
Not quite sure why chomp isn't working, but your tr// isn't very wise, as you are replacing the \r\n with 2 null (ASCII 0) bytes, which could bite (chomp:) back later. I think `tr/\r\n//;` would probably be better. If you posted the relevant bit of your code, it might be easier to see why chomp isn't doing what you expect. Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Re: Re: chomp() problems by mojobozo (Monk) on Sep 24, 2002 at 17:40 UTC
Personally, if it's not working but you have a workaround, I'd go with the workaround, although I would agree with the above comment and not use the NULLs. TMTOWTDI. mojobozo word (wūrd) interj. Slang. Used to express approval or an affirmative response to something. Sometimes used with up.
Re: Re: Re: chomp() problems by disciple (Pilgrim) on Sep 24, 2002 at 20:16 UTC
OTOH, this could be a symptom of a larger problem which could surface again later, but might not be as easy to find. I suggest figuring out what is going on. Besides, having a better understanding of the code can't hurt, right? :-) -disciple
Re: chomp() problems by charnos (Friar) on Sep 24, 2002 at 17:52 UTC
`$line =~ tr/\r\n//d;` should do what BrowserUk suggested. IIRC, the /d is required to delete characters. Checking the value of `$/` would help ascertain why `chomp()` isn't working the way you expected. Also, `$line =~ s/\r\n$//;` would probably replicate `chomp()`'s functionality more accurately. Update: Thanks to bart, who reminded me that the $INPUT_RECORD_SEPARATOR is `$/`, not `$\`. There still appear to be quite a few incorrect vars floating around this thread.
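A short sketch of the difference between these variants, written with explicit octets as suggested elsewhere in this thread. The sample strings and variable names are made up for illustration:

```perl
use strict;
use warnings;

my $line = "a|b\015\012";

# Without /d an empty replacement list only counts matches; the string is unchanged.
my $unchanged = $line;
my $count = ($unchanged =~ tr/\015\012//);

# With /d every CR and LF is deleted, wherever it occurs in the string.
(my $deleted = $line) =~ tr/\015\012//d;

# Anchored substitution: removes a single trailing CRLF only, closer to chomp.
(my $trimmed = "x\015\012y\015\012") =~ s/\015\012\z//;

printf "%d %d %d %d\n", $count, length $unchanged, length $deleted, length $trimmed;
```

The tr///d form also strips CR or LF bytes embedded in the middle of a record, while the anchored substitution touches only the end of the string. Which one is wanted depends on whether embedded line breaks can occur in the exported fields.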
Re: Re: chomp() problems by paulbort (Hermit) on Sep 25, 2002 at 21:06 UTC
I just ran into this recently with files being FTP'd from Windows to Linux, and I have to agree with Flexx: \r and \n are system-specific, whereas this is a byte-specific problem. What I used was similar to charnos' substitution, but fixed to ASCII: `$line =~ s/\x0D|\x0A//g;` This removes all stray CR and LF bytes, whichever ASCII system the file came from and whichever one it ends up on. (Sorry, no EBCDIC support.) As long as Perl can figure out where the line breaks are, this will get rid of the odd bits. -- Spring: Forces, Coiled Again!
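A rough end-to-end sketch of that approach, assuming a Unix-like ASCII system where readline splits on LF. The data is a made-up stand-in for the Access export, read from an in-memory filehandle so no file on disk is needed:

```perl
use strict;
use warnings;

# Assumed sample data: two CRLF-terminated, pipe-delimited records.
my $data = "field1|field2\x0D\x0Afield3|field4\x0D\x0A";

open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my @records;
while (my $line = <$fh>) {
    $line =~ s/\x0D|\x0A//g;               # drop every CR and LF byte
    push @records, [ split /\|/, $line ];  # split the cleaned record into fields
}
close $fh;

print scalar(@records), " records, last field: $records[-1][-1]\n";
```

The last field of each record comes out without a trailing CR, which is the part a plain chomp with the default $/ would leave behind.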
Re^3: chomp() problems by Flexx (Pilgrim) on Sep 25, 2002 at 23:08 UTC
Yup. Some like it hex... `;)` Update: Actually, what you wrote is EBCDIC compatible. It'll substitute DOS CRLFs on any system, be it an ASCII or an EBCDIC one. You're using discrete ordinals (hexadecimal ones, in your case) instead of the infamous logical symbols \r\n, and that's what makes your statement portable. So long, Flexx PS: Someone downvoted this node (it's -1 at the time of this writing). Why? What did I do wrong here?
