http://qs321.pair.com?node_id=586942

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

This probably isn't a Perl-specific question, but it might be.

I was terribly amused, when doing some research on one of our servers, to see the following line in the command history:

  ln -s /usr/bin/perl /usr/bin/perl\r

Of course, that didn't work, and apparently successive attempts failed until eventually, the following was used:

  perl -e'symlink($_ => "$_\r") for @ARGV' /usr/bin/perl /usr/bin/python /usr/bin/ruby

This was done because these are dedicated servers, and customers sometimes FTP programs written on a Windows box to their Linux box. The symlink was considered easier than writing a custom FTP server or running a cron job to find the errant files :)

Can anyone think of a better solution which doesn't require manual intervention after setting up a dedicated server for a customer who doesn't know about line endings or dos2unix?

Cheers,
Ovid

New address of my CGI Course.

Re: (OT) Fixing Line Endings
by Corion (Patriarch) on Nov 30, 2006 at 14:37 UTC

    This is another situation where the -w command line switch beats use warnings; ;-)

    #!/usr/bin/perl -w\r ...

    just works ™

Re: (OT) Fixing Line Endings
by derby (Abbot) on Nov 30, 2006 at 14:43 UTC

    Some FTP servers (Pure-FTPd, for example) allow you to hook in scripts that are kicked off after upload.

    -derby

      I like Derby's idea. If I'm understanding it correctly, you dos2unix every script after it's uploaded, as needed. That is, all scripts on the server will always use standard unix line endings, period.

      In addition, while you're at it, you could also even have your automated tool send an email to the guilty Windowsy uploader, courteously asking that they please either use text mode for scripts with their ftp client next time, or else to please manually convert line endings before uploading.

      Though I see that this doesn't answer Ovid's OP, since he was looking for something that "doesn't require manual intervention".

        Well, it works in that it's something which could be automated and then, theoretically, doesn't require extra work on our part once it's installed. However, as one coworker has pointed out, we'd be altering customer files. If they FTP something to our server they could see the MD5 checksum fail, notice the file size is different, FTP it back to their box and get failures on their end, and so on. I think a hook which merely sends a warning message to the customer might be a better way to go.
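A minimal sketch of such a warn-only hook, for anyone who wants to try the idea. It assumes the FTP server runs the hook with the uploaded file's path as its first argument; `warn_if_crlf` is a made-up name, and how the message actually reaches the customer (mail, log, whatever) is left out. The file itself is never touched, so checksums and sizes stay as uploaded.

```shell
#!/bin/sh
# Detect-only upload hook: reports CRLF line endings, never edits the file.
# Assumption: the FTP server calls this script with the uploaded path as $1.
warn_if_crlf() {
    file=$1
    cr=$(printf '\r')
    # Only look at scripts, i.e. files whose first line starts with "#!".
    case "$(head -n 1 "$file")" in
        '#!'*) ;;
        *) return 0 ;;
    esac
    # Count lines ending in a carriage return (grep exits 1 on zero matches).
    crlf_lines=$(grep -c "$cr\$" "$file" || true)
    if [ "$crlf_lines" -gt 0 ]; then
        echo "Warning: $file has $crlf_lines line(s) with DOS (CRLF) line endings."
        echo "Please upload it again in ASCII mode, or convert it with dos2unix."
    fi
}

# Run the check only when a path was actually passed in.
[ "$#" -eq 0 ] || warn_if_crlf "$1"
```

Because it only reads the file, it avoids the "we altered the customer's upload" objection entirely.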

        Cheers,
        Ovid

Re: (OT) Fixing Line Endings
by davorg (Chancellor) on Nov 30, 2006 at 14:38 UTC

    Point out that the problem goes away if they use a -w or -T flag on the shebang line :-)

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: (OT) Fixing Line Endings
by rhesa (Vicar) on Nov 30, 2006 at 14:38 UTC
    Haven't they considered telling their customers to transmit their files as text files instead of binary? Pretty much every ftp client I know of can do this. IMHO, properly uploading files falls in the same category of requirements as a correct shebang line.

    Creative solution, though, I'll give you that :)

      We have thousands of customers. Every additional thing you ask a customer to remember means more calls to the support center when they forget what we told them.

      Cheers,
      Ovid

        Sure, sure. Everything you can do to avoid support calls is valuable. That doesn't mean you should avoid the opportunity to educate your customers: I would have first opted for a customized HTTP_INTERNAL_SERVER_ERROR.html (or whatever your 500 template is), including links to an FAQ entry detailing how to upload programs.
Re: (OT) Fixing Line Endings
by Melly (Chaplain) on Nov 30, 2006 at 14:44 UTC

    Could someone explain to an idiot (e.g. me) how this solves line-ending problems...

    Tom Melly, pm@tomandlu.co.uk

      A Perl program that is created on Windows will have Windows line end characters (i.e. CR/LF or \015\012). If you upload that to a Unix server without translating the line endings then your shebang line will look like this:

      #!/usr/bin/perl\r\n

      The \n is the line end character for Unix, so that's not a problem. But some Unix shells will look for a program called /usr/bin/perl\r and, usually, won't find it.

      As a couple of us have suggested, adding an option to the shebang line solves the problem as the shebang line will then look like this:

      #!/usr/bin/perl -w\r\n

      The shell parses the command name (/usr/bin/perl) out of that and passes the rest of it (-w\r) to the Perl interpreter as options. Perl is cleverer than the shell and will handle both kinds of line endings.
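If you want to see the stray character for yourself, here is a small sketch that dumps the first line of a file byte by byte. `show_shebang_bytes` is just an illustrative name; it relies only on `head` and `od`. A file saved with Windows line endings shows a `\r` just before the final `\n`.

```shell
#!/bin/sh
# Dump the first line of a file as individual characters so that an
# otherwise invisible carriage return shows up as \r.
# show_shebang_bytes is a hypothetical helper name, not from this thread.
show_shebang_bytes() {
    head -n 1 "$1" | od -c
}
```

Usage would be something like `show_shebang_bytes myscript.pl`, where `myscript.pl` stands in for whatever file is misbehaving.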

      --
      <http://dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

        Actually, the she-bang is a kernel thing... so you would have to fix the broken kernel... which I would not call un*x.

        --stephan

        A Perl program that is created on Windows will have Windows line end characters (i.e. CR/LF or \015\012).

        Although, just for informational purposes, it's important to note that you don't have to create files with the \r\n ending. I've been creating all of my files with \n endings for years on Windows, and it's saved me a lot of grief, particularly when ftp-ing files without worrying about setting text mode. The text editors that I've been using are jedit and vim; I'm sure there are others that can do this.

      The #!...\n line at the beginning of an executable file on a Unix system is recognised as telling the kernel that this is a script, to be passed to the interpreter named in the ... part.

      If the line is '#!/usr/bin/perl\r\n' (as it would be on a Windows box), the kernel will attempt to find a file called '/usr/bin/perl\r'.

      It's a hilarious (but real) solution.

        Ah! Tnx. Hmm... couldn't you create a perl script /usr/bin/perl\r and have that handle the problem?

        Tom Melly, pm@tomandlu.co.uk
Re: (OT) Fixing Line Endings
by jbert (Priest) on Nov 30, 2006 at 15:09 UTC
    If the only way to get scripts onto the box is ftp, then you could look for a "post-upload-hook" feature in your ftpd, or hack one in if it's open source. Then you could try some heuristics (does the file look like text, does it have a file extension suggesting it is a script, etc) and do the CRLF->LF conversion if the heuristics recommend it.
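A rough sketch of what the conversion step could look like, assuming the crudest possible heuristic: a file counts as a script only if its first line starts with `#!`. The name `crlf_to_lf_if_script` is invented for illustration, and a real hook would probably want smarter checks (file extension, binary detection).

```shell
#!/bin/sh
# Convert CRLF to LF, but only for files that look like scripts.
# crlf_to_lf_if_script is a hypothetical name; the heuristic is deliberately crude.
crlf_to_lf_if_script() {
    file=$1
    cr=$(printf '\r')
    # Heuristic: a script starts with "#!"; leave everything else alone.
    case "$(head -n 1 "$file")" in
        '#!'*) ;;
        *) return 0 ;;
    esac
    tmp=$(mktemp) || return 1
    # Strip one trailing carriage return from each line, then write the
    # result back over the original so owner and mode are preserved.
    sed "s/$cr\$//" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Files that fail the heuristic are left byte-for-byte identical, which limits the damage if the guess is wrong.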

    Actually - why not a kernel hack? Just map "foo\r" -> "foo" in the exec codepath. You'll lose the ability to run scripts with an interpreter of such a name, but that seems a small loss.

    <rummages> There's some string manipulation in linux/fs/binfmt_script.c to find and launch the shebang. Just add a:

    if (interp[strlen(interp)-1] == '\r') { interp[strlen(interp)-1] = '\0'; }
    just after the strcpy (interp, i_name); in the load_script function and Bob's your Auntie's live-in lover. Possibly.
Re: (OT) Fixing Line Endings
by swampyankee (Parson) on Nov 30, 2006 at 16:34 UTC

    I'm surprised that you've got this problem when FTPing programs; my experience has been that ftp usually starts up in ASCII mode (and I usually forget to put it into binary mode, resulting in me downloading a corrupted binary, smacking my head against the monitor, turning binary mode on, and repeating the download). I have seen it when Windows and *ix boxen are networked together; apparently some networking systems are bright enough to manage big-endian - little-endian conversions but not quite intelligent enough to do end-of-line correction.

    Ovid's solution is clever. Hacking ftpd may be a "superior" solution, but Ovid would be doing this on somebody else's nickel, and those somebodies may not want to pay for it.

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
      > ftp usually starts up in ASCII mode

      Perhaps the files were in a .tar or .zip?

      Would there be a similar problem when moving Perl files en masse from *nix to Windows?

        >> ftp usually starts up in ASCII mode

        As I said, my experience has been that ftp usually starts up in ASCII mode. I believe that some ftp daemons can be configured to default to binary mode. Since most of the traffic is likely to be binary, doing so would make a good deal of sense.

        >> Perhaps the files were in a .tar or a .zip?

        From the way Ovid's original post was worded, it didn't seem so.

        >> Would there be a similar problem when moving Perl files en masse from *nix to Windows?

        Well, I know that text files can lose their end-of-record markers, turning a multi-line text file into a very long single line. I don't know whether Perl's parser will recognize \n vs \r\n as an end of line marker. It may, which could result in a perfectly valid Perl program that looks like a very long one-liner.

        emc

        At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

        —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
Re: (OT) Fixing Line Endings
by djp (Hermit) on Dec 01, 2006 at 01:15 UTC
    So far, four solutions have been offered:

    1. Patch the kernel
    2. Educate the customers
    3. Modify the customer scripts
    4. Symlink /usr/bin/perl\r -> /usr/bin/perl etc.

    The first three are undesirable or impossible for a variety of reasons, as stated by various monks. In the absence of a better alternative, the symlink solution, while indeed 'terribly amusing', gets my vote - it's actually quite clever.

Re: (OT) Fixing Line Endings
by serf (Chaplain) on Dec 01, 2006 at 09:40 UTC
    Hi Ovid,

    Good question! :o)

    I agree with your response to rhesa's suggestion too. When the kind of user who sends a file in binary mode is stressed, trying to figure out why Perl isn't where it's meant to be (because that's what they'll be thinking when they see the error), often the last thing on their mind is that they may have been told to transfer their files in ASCII mode, which might not have meant anything to them at the time they were told it.

    You either need to stop these problems from happening, or alternatively have something there and then to tell them the cause of their problem so they can quickly fix it themselves.

    Not teaching you to suck eggs, the rest of this post is for anyone else reading this looking for answers to the same kind of problem:

    For generating the link, from the command line in bash the original person could have just done this:

    ln -s /usr/bin/perl "/usr/bin/perl$(echo -e "\r")"

    Or in 'set -o vi' mode in bash or Korn shell they could have just typed this:

    ln -s /usr/bin/perl /usr/bin/perl^V^M
    (For those that don't know, ^V^M is typed [Ctrl+V][Ctrl+M])

    Another solution I used when I used to host shell accounts for my friends on my Internet server (to help them learn how to use Linux) and was getting fed up with having to regularly look at their scripts for them to tell them why they would not run... was to compile a little C program which just generated a warning message telling them what was wrong, and install that as /bin/bash^M on my system so that the user would understand the problem and fix it themselves.

    This is on my old machine:
    $ ls /bin/bash* | cat -v
    /bin/bash
    /bin/bash^M
    $ file /bin/bash^M
    : ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.0.30, dynamically linked (uses shared libs), for GNU/Linux 2.0.30, stripped
    $ /bin/bash^M
    Your shell script has DOS linefeeds in it!

    You could do the same thing with /usr/bin/perl^M

    If my users were the kind who wouldn't know what that meant I would probably add some extra hints, like:

    Your Perl script has DOS linefeeds in it!
    If you uploaded it to the server via FTP please upload it again in ASCII mode.

    For users with shell access I would probably add a message about dos2ux or saving it after doing:
    :set ff=unix
    in vim or doing:
    :%s/^V^M//g
    and saving it...

    But I expect the majority of people running Perl scripts that have been sent from a CRLF machine probably have FTP/web access only.

    If anybody reading this is thinking about just copying /usr/bin/perl to /usr/bin/perl^M for any reason - don't do it, use the symlink instead.

    If you just copy the file, when you upgrade perl you will be left with an orphaned copy of the old version unless you (and whoever comes after you in supporting the machine) always remember to upgrade that file too = not clever.

Re: (OT) Fixing Line Endings - patch the shell
by imp (Priest) on Nov 30, 2006 at 15:50 UTC
    This thread contains a variety of clever ways to trick the shell into executing the right program... why not just patch the shell's source to be \r aware, and submit the patch to the maintainers? Solve the problem, instead of finding clever workarounds.
      Because that could possibly break every other program in the shell. It doesn't have to, but the possibility exists, and besides, Unix using just \n as its line ending is a well-known convention. You don't change that because some people transfer their files incorrectly ;)

      --
      "WHAT CAN THE HARVEST HOPE FOR IF NOT THE CARE OF THE REAPER MAN"
      -- Terry Pratchett, "Reaper Man"

        I haven't looked at the source to see what would be involved, but my uneducated guess would be that changing the shebang handler to ignore \r wouldn't be very complicated or risky.

        We all know that the proper line ending on a unix system is \n , but exploding because someone uploaded a file with a \r at the end of the shebang line isn't really the best approach in my opinion.

      Because it's not the shell. It's the kernel, really. (I think this has been on PM recently...ah, here we are).

      I suggested (and outlined) a patch for the kernel above. There might be a more clever way of doing it with a loadable kernel module to avoid rebuilding your kernel, but that would probably be quite a bit more work (but allow you to use your vendor kernel).

        Yes, there's no good excuse for the kernel failing to ignore what is clearly whitespace. Someone please fix Linux already. This bug has existed for way too long.

        ( And how come it seems that there aren't any Linux users that know that she-bang lines are handled by the kernel? I think that change even predates Linux so there never was a Linux where the shell had to do the #!-handling. How many decades does it take for people to catch on? :)

        - tye        

Re: (OT) Fixing Line Endings
by helgi (Hermit) on Dec 01, 2006 at 09:09 UTC
    I've come across this problem, or variants of it, literally hundreds of times.

    Most recently, here at work we were trying to get a packaged java solution to work on a Linux box. The shell script containing the java command wouldn't work, giving us "library not found" errors, but the command pasted to the command line, worked just fine.

    I'm ashamed to say that it took me a while to realise that the \r was to blame since I'd only encountered it before in a perl context. dos2unix solved the problem of course.

    In a previous job we used to run a cron job that dos2unix'ed everything in certain directories and if changes were made, e-mailed the owner/programmer that his script had been fixed.
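For the curious, a sketch of the detection half of such a sweep, not the original cron job. `list_crlf_scripts` is a hypothetical name; it prints every file under a directory whose shebang line ends in a carriage return, and the list could then be fed to dos2unix or a mailer. It assumes file names contain no newlines.

```shell
#!/bin/sh
# List files under a directory whose shebang line ends in a carriage return.
# list_crlf_scripts is a hypothetical helper; a cron job could mail its output.
# Assumption: no file name under the directory contains a newline.
list_crlf_scripts() {
    cr=$(printf '\r')
    find "$1" -type f | while IFS= read -r path; do
        case "$(head -n 1 "$path")" in
            '#!'*"$cr") printf '%s\n' "$path" ;;
        esac
    done
}
```

Only the first line of each file is inspected, so data files with CRLF endings are not reported.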


    --
    Regards,
    Helgi Briem
    hbriem AT f-prot DOT com
        Yup. That's how I ran into you.


        --
        Regards,
        Helgi Briem
        hbriem AT f-prot DOT com
Re: (OT) Fixing Line Endings
by bsdz (Friar) on Nov 30, 2006 at 23:20 UTC
Re: (OT) Fixing Line Endings
by graff (Chancellor) on Dec 01, 2006 at 03:15 UTC
    I'm wondering if there are any implications (unpleasant side effects) for the sysadmin (or any other users, for that matter) that might arise from having files in /usr/bin or /usr/local/bin whose names happen to end with a carriage-return character (even though these are just symlinks -- or I suppose they could be hard links -- to some other file with a normal unix file name).

    For instance, if someone just does "ls /usr/bin" in a normal shell window, there's a good chance they will see something that does not reflect the directory's real contents, because at least one line might have an unexpected "\r" in a non-rightmost column of the listing. This could be very confusing, disorienting, possibly even frightening, if the user doesn't know about or consider the possibility that some file names contain "\r".

    If your sysadmins and other users are willing to cope with that sort of anomaly, then there's really no problem -- unix file names can encompass any sort of bizarre content (except null bytes), without really causing any serious trouble, provided that users are aware of the situation and of the potential risks of not handling it properly.

      Don't leave the slash behind, as it likes to go to the netherland of null bytes.