http://qs321.pair.com?node_id=865794

elef has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I'm asking you a favour in troubleshooting a perl script on a mac.

The part that's broken is the following: the script tells the user to drag and drop the two input files into the console window, it parses the resulting strings to find the folder, filename and extension, checks that the files indeed exist and moves on to open the files and do its thing.
This works for me in Windows XP and 7, but in OS X Leopard, it fails to find the files even though the filepath seems to be parsed correctly. I don't have a mac so I can't troubleshoot it myself. Perhaps it's some problem with the encoding the filepath itself is in or the if (-e "file") command I used is wrong in some mysterious way... to be honest, I have no idea what's going on.

Backstory: The program is a text aligner that creates parallel corpora out of texts and their translations.
It is an open source project that will end up on sourceforge in Windows, linux and mac flavours (I'm using PAR::Packer to package the script and the modules it relies on into an executable). The script is broadly the same for all three platforms, with if/else branching to account for platform-specific issues.
The windows executable is done and it works fine, but I ran into problems with getting it to work on a mac. As I don't have a mac, I asked a friend to test it. He tells me the script says it can't find the files, but that's about as far as we got with troubleshooting. I would appreciate any help.

A short test script to check basic functionality:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

my $inputfile;
print "\nDrag and drop a file here:\n";
chomp ($inputfile = <STDIN>);
$inputfile =~ s/^ *[\"\']?([^\"\']*)[\"\']?/$1/;
print "\nFilepath with quotes and spaces stripped: >$inputfile<\n";
print "\n--------------------------------------------\n";
print "\nTest 1: no parsing, just checking if file is found:\n";
if (-e "$inputfile") {print "\nOK, file found\n";}
else {print "\nERROR: file not found\n";}
print "\n--------------------------------------------\n\nTest 2, checking if file can be opened:\n\n";
open(FILE, "<", "$inputfile") or die "Can't open file: $!";
print "OK, file opened successfully. \n\nPress enter to quit\n";
<STDIN>;

Could some kind monk run this and let me know if one or both tests pass on a mac?

The actual aligner script is almost 2K lines and it needs a couple of other bits and pieces to run, so I uploaded the full package to mediafire. It's my first perl project and I'm not a programmer... Let me know if you notice something hideous that should be improved. Everything in there is tested and working on Windows, though.
To test on mac OS X, just start LF_aligner_10_12.pl and drag and drop the two pdf files from testfiles. You will get some feedback in the console and a log will also be created in aligner/scripts. The error message it threw in testing is "ERROR! File 1 not found"
If you want to have a look at the code itself, the relevant bit starts at "# DRAG & DROP FILES (t, h, p)" around line 400.
The code looks about like this (this is just a simplified, cleaned-up sample that won't actually run; to run the script, please get the mediafire package.)
# DRAG & DROP FILES
do {
    print "\n\n-------------------------------------------------";
    print "\n\nDrag and drop file 1 here and press enter.\n";
    chomp ($file1_full = <STDIN>);
    # windows doesn't add quotes if there is no space in the path, linux adds single quotes
    # strip any leading and trailing spaces and quotes; $1=everything up to last / or \,
    # $2= everything from there up to the end except spaces and "'.
    $file1_full =~ /^ *[\"\']?(.*)[\/\\]([^\"\']*)[\"\']? *$/;
    $folder = $1;
    $file1 = $2;
    $file1 =~ /(.*)\.(.*)/;
    $f1 = $1;
    $ext = lc($2);

    print "\nDrag and drop file 2 here and press enter. (This file has to be in the same folder as file 1!)\n";
    chomp ($file2_full = <STDIN>);
    $file2_full =~ /^ *[\"\']?(.*)[\/\\]([^\"\']*)[\"\']? *$/;
    $folder2 = $1;
    $file2 = $2;
    $file2 =~ /(.*)\.(.*)/;
    $f2 = $1;
    $ext2 = lc($2);

    print LOG "\nInput files dropped in: $file1 (${file1_full}), $file2 (${file2_full})";

    unless ("$folder" eq "$folder2") {
        print "\n\n\nERROR! The two files are not in same folder. Try again!\n($folder vs ${folder2})\n";
        print LOG "\nERROR: The two files are not in same folder. $folder, $folder2";
    }
    unless ("$file1_full" ne "$file2_full") {
        print "\n\n\nERROR! You dragged in the same file twice. Try again!\n";
        print LOG "\nERROR: Same file dropped in twice";
    }
    unless ("$ext" eq "$ext2") {
        print "\n\n\nERROR! The file extensions don't match. Try again!\n($ext vs. $ext2)\n";
        print LOG "\nERROR: Extensions don't match: $ext vs. $ext2";
    }
    unless (-e "$folder/$file1") {
        print "\n\n\nERROR! File 1 not found (maybe its path or its filename contains accented letters). Try again!\n(file: $folder/$file1)\n";
        print LOG "\nERROR: File 1 not found; folder: $folder, file: $file1";
    }
    unless (-e "$folder2/$file2") {
        print "\n\n\nERROR! File 2 not found (maybe its path or its filename contains accented letters). Try again!\n(file: $folder2/$file2)\n";
        print LOG "\nERROR: File 2 not found; folder: $folder2, file: $file2";
    }
    if ($ext eq "doc") {
        print "\n\n\nERROR! Doc files are not supported. Convert to docx or txt and try again!\n";
        print LOG "\nERROR: doc file dropped in";
    }

    $alignfilename = "${f1}-${f2}";
    close LOG;
    open (LOG, ">>:encoding(UTF-8)", "$scriptpath/scripts/log.txt")
        or print "\nCan't create log file: $!\nContinuing anyway.\n";
} until (("$folder" eq "$folder2")
      && ("$file1_full" ne "$file2_full")
      && ("$ext" eq "$ext2")
      && (-e "$folder/$file1")
      && (-e "$folder/$file2")
      && ($ext ne "doc"));

Replies are listed 'Best First'.
Re: OS X troubleshooting help needed - parse filename & open file
by Your Mother (Archbishop) on Oct 17, 2010 at 16:14 UTC

    I've been hacking Perl on OS X since the Public Beta and there are no differences between it and other *nixes for file handling (except the default filesystem is case-insensitive).

    File and path handling has a lot of caveats. You should be using File::Spec, as suggested by roboticus, or Path::Class::(File|Dir). Where OS differences do exist those packages will handle it transparently.
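For anyone wondering what the module route might look like, here is a minimal sketch using the core modules File::Basename and File::Spec. The helper name split_path and the sample path are only illustrative, not taken from the aligner script.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: split a path into folder, base name and extension
# using the rules of the platform the script is running on.
sub split_path {
    my ($path) = @_;
    # fileparse returns (filename, directory, suffix); the pattern picks
    # the last ".something" as the suffix.
    my ($base, $dir, $ext) = fileparse($path, qr/\.[^.]*/);
    return ($dir, $base, $ext);
}

my ($dir, $base, $ext) = split_path('/Users/cow/Downloads/104937653.pdf');

# Rebuild the full path in a portable way instead of gluing with "/".
my $rebuilt = File::Spec->catfile($dir, $base . $ext);
print "$dir | $base | $ext | $rebuilt\n";
```

The extension comes back in its original case, so lc() can be applied for comparisons only while the untouched name is used for -e and open. Note that this does nothing about stray whitespace or quotes around the dropped string; that still has to be cleaned up first.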

      I don't see how using File::Spec would improve this code, which is essentially what seems to be broken - although I won't know for sure until somebody is kind enough to test it on a mac:
      my $inputfile;
      print "\nDrag and drop a file here:\n";
      chomp ($inputfile = <STDIN>);
      $inputfile =~ s/^ *[\"\']?([^\"\']*)[\"\']?/$1/; # strip whitespace and quotes
      if (-e "$inputfile") {print "\nOK, file found\n";}
      else {print "\nERROR: file not found\n";}

        You're quite right. It doesn't work as is on OS X 10.5. Here's something you might adjust to see the reason it's failing-

        print -e $inputfile ? "OK, file found\n" : qq{ERROR: file "$inputfile" not found\n};
        Drag and drop a file here:
        /Users/cow/Downloads/104937653.pdf
        ERROR: file "/Users/cow/Downloads/104937653.pdf " not found

        You can see the extra space that dragging and dropping in the Terminal is adding. I expect this is a quirk of the OS's GUI, I don't have anything but OS X at home to try this on.
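Quoting the string in the error message is what exposed the space. A slightly more thorough diagnostic, sketched below, escapes every character that is not printable ASCII so that spaces, tabs and carriage returns cannot hide. The helper name show_hidden is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string in which spaces,
# control characters and non-ASCII characters appear as \x{..} escapes.
sub show_hidden {
    my ($str) = @_;
    my $out = '';
    for my $ch (split //, $str) {
        # Printable ASCII other than space is kept; the rest is escaped.
        $out .= ($ch =~ /[\x21-\x7E]/) ? $ch : sprintf('\x{%02X}', ord $ch);
    }
    return $out;
}

print show_hidden("/Users/cow/Downloads/104937653.pdf "), "\n";
```

Printing the dropped path through something like this, and writing it to the log, would let a tester on another platform report exactly what the terminal delivered.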

        So, you're right, in this case using one of the modules would not have helped. A trim routine such as s/\s\z//; will fix this problem (though a filename may legitimately have leading or trailing spaces, so you'd have to be careful). The reason it took a while and so many messages to get to this answer is that your original post didn't distill the issue down to something easy to try or read. If this one had been your first post, you would have got your answer (from me or someone else) immediately.
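One cautious way to apply the trim, as a sketch only: test the string exactly as received first, and strip trailing whitespace only if that fails. That way a file whose name really ends in a space is still found. The name resolve_dropped_path is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a path that exists, or undef.
sub resolve_dropped_path {
    my ($raw) = @_;
    return $raw if -e $raw;              # the string as received exists, keep it
    (my $trimmed = $raw) =~ s/\s+\z//;   # otherwise drop all trailing whitespace
    return -e $trimmed ? $trimmed : undef;
}

my $path = resolve_dropped_path(". ");   # "." plus the stray space
print defined $path ? "found: >$path<\n" : "not found\n";
```

This uses s/\s+\z// rather than s/\s\z// so that a carriage return or several spaces are removed in one go; whether that is wanted depends on what each terminal actually appends.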

        You should still consider using a file handling package. They will ultimately save you a lot of time and catch edge cases that might matter to end users even if you think them irrelevant or karma inducing.

        Note that if your filename contains single or double quotes, you'll strip them and subsequently Perl won't find the file under the mangled name. A somewhat saner approach might be to find out how exactly filenames arrive in your program when dragged into it.

        As an example, on Windows, filenames dropped into the console window get surrounded by double quotes. So, on Windows it would make sense to strip one leading and the trailing double quote. Windows allows single quotes in filenames, so stripping them out is ill-advised.
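A rough sketch of that per-platform approach follows. It relies only on the behaviours reported in this thread: double quotes on Windows, single quotes on Linux (per the comment in the original script), and a trailing space on OS X. The helper name unquote_dropped_path is invented here, and the optional second argument exists only so the branches can be tried on any machine.

```perl
use strict;
use warnings;

# Hypothetical helper: undo what the terminal adds around a dropped path.
# $os defaults to $^O, Perl's name for the current operating system.
sub unquote_dropped_path {
    my ($path, $os) = @_;
    $os = $^O unless defined $os;

    if ($os eq 'MSWin32') {
        # Remove one surrounding pair of double quotes; leave single
        # quotes alone because they are legal in Windows filenames.
        $path =~ s/\A"(.*)"\z/$1/s;
    }
    elsif ($os eq 'linux') {
        # Assumed from the script's comment: the path arrives in single quotes.
        $path =~ s/\A'(.*)'\z/$1/s;
    }
    elsif ($os eq 'darwin') {
        # Observed in this thread: Terminal appends a space.
        $path =~ s/\s+\z//;
    }
    return $path;
}

print unquote_dropped_path('/Users/cow/Downloads/104937653.pdf ', 'darwin'), "\n";
```

Because each branch removes only a matched pair or trailing whitespace, quotes inside the filename survive, which the single catch-all regex in the question does not guarantee.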

Re: OS X troubleshooting help needed - parse filename & open file
by pobocks (Chaplain) on Oct 17, 2010 at 14:41 UTC

    I'm downloading it to test it right now - you should know, however, that I installed a case-sensitive filesystem, so my results may not be the norm.

    I'm on OSX 10.6.4

    for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";
      Thank you.
      As a first step, could you run the test script posted here?
      If that fails, we know there is something fundamentally wrong with what I'm trying to do.
      If the tests in the test script pass but the main script in the mediafire package still can't find the files, we'll have to start looking for some other error source.
      I think I have if ($ext eq "pdf") {...} somewhere in the script and that's why I lowercased the extension - I thought the case didn't matter anyway. This would be trivially easy to fix, and I don't think it could cause a problem here. If you test it with files that have a lower-case extension to begin with, then it's a non-issue, and in any case the test condition is (-e "$folder/$file1"), not (-e "$folder/${f1}.$ext") - so it shouldn't matter. The string stored in $file1 isn't lower-cased.
        Hi elef, Just ran your test script and the results are below:
        Drag and drop a file here:
        /Users/macuser/Desktop/untitled.txt

        Filepath with quotes and spaces stripped: >/Users/macuser/Desktop/untitled.txt <

        --------------------------------------------

        Test 1: no parsing, just checking if file is found:

        ERROR: file not found

        --------------------------------------------

        Test 2, checking if file can be opened:

        Can't open file: No such file or directory at /private/var/folders/ce/ceU3CVgiEbO3AiWqVsJpqU+++TI/Cleanup At Startup/untitled text-309032761.079 line 20, <STDIN> line 1.
        logout

        [Process completed]
        "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote
      As you can see, this got cleared up in the meantime. To make absolutely sure that this is indeed the issue, could you run the script in the zip package anyway?
      Just delete any leading and trailing spaces manually after dropping in the files and see if you get an error message.
      Thank you.

      Edit: this was of course addressed to pobocks who said he downloaded the package to test it. The peculiar perlmonks answer system placed it here.

        Done and works.

        for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";
Re: OS X troubleshooting help needed - parse filename & open file
by roboticus (Chancellor) on Oct 17, 2010 at 13:16 UTC

    LF:

    I notice that you're using lc to lowercase the extensions. IIRC, a mac has (or can be configured to use) a case sensitive file system...so "foo.PDF" and "foo.pdf" are two different filenames. That might be (part of?) your problem.

    I don't know if you're familiar with File::Spec or not, but it might do the trick for you. It provides a portable method to handle file names and paths. I've not used it much myself, so I can't give it an unqualified endorsement. But it may be worth your while to investigate it.

    ...roboticus

      Good point, duly noted.
      In this case, I don't think this could be the issue as the test was done on files with a lower-case extension.

      To be honest, even though I'm sure File::Spec is in principle a better solution than what I'm doing, I'm reluctant to learn how File::Spec works, rewrite the script and then start testing/troubleshooting on all platforms all over again. What I have works on Windows and Linux, I'd be more inclined to just fix the current OS X problems and call it a day... if the problems are easily fixable, that is.

        elef:

        Ah, well, then I'm fresh out of ideas. I have a buddy with a Mac, though, so I'll send him an EMail and see if he might give it a try for you. (No promises, though, as he's a Java/Ruby coder and frequently avoids computers outside of his day job...)

        I also understand what you mean about going back and reworking a bunch of code to insert a module. I'd go for a simpler fix, too, if I find one. But it's handy to know about things like that for future projects, or when your back is against the wall.

        ...roboticus

        Update: Added '/Ruby' and second paragraph.

        elef:

        He ran it and replies:

        tony-******-macbook-pro:~ tony******$ perl --version
        This is perl, v5.8.8 built for darwin-thread-multi-2level
        (with 4 registered patches, see perl -V for more detail)
        <<snip>>
        tony-******-macbook-pro:~ tony******$ perl m.perl
        Name "main::FILE" used only once: possible typo at m.perl line 27.

        Drag and drop a file here:
        /Users/tony******/Desktop/x.txt
        Invalid type 'W' in unpack at m.perl line 12, <STDIN> line 1.

        Now, regarding your concern about extra bytes... plausible: Note the space after x.txt... the square is probably my cursor.

        Notes: (1) I anonymized his post by replacing his last name with asterisks, and (2) He included a screenshot that shows a blank after x.txt, which he alludes to in the last line. So his results agree with the testing that you and louis.roca performed.

        ...roboticus

Re: OS X troubleshooting help needed - parse filename & open file
by aquarium (Curate) on Oct 18, 2010 at 00:15 UTC
    Such is the beast that for multi-platform programs, you end up creating the app as a lowest common denominator, with plenty of overhead code to prevent it falling over on variants of the operating systems. There are many variants of Unix, Linux, and Windows, and some variants' file systems work differently, causing problems. I join the chorus in highly recommending File::Spec or a similar cross-platform file handling solution, so you don't end up with workarounds for OS variants throughout your code. Then as long as you use the module properly, your app will run on many OS variants, current or future.
    Mind you, on OS X you could create an AppleScript action instead, or your app could be made into a service, which could make things nicer but would unfortunately not be portable to other OSes.
    Maybe (or maybe not) you'd be better off having a gui/web interface instead of drag'n'drop to console. Using a console in the first place assumes a certain level of user knowledge, and you might just be better off to get them to specify path instead of introducing the drag'n'drop to console thing into it.
    if you want your project to succeed, you will need to think about your application audience closely, to avoid disappointment. whether you're a programmer/software-engineer or not.
    all the best in your project. you might possibly be interested in a freshmeat project called "Sally", so i mention it.
    the hardest line to type correctly is: stty erase ^H
      Sally does look like something in the same general area as text alignment, but I'm not the one to tell how useful it could be... that stuff goes right over my head. My aligner has hunalign do the heavy lifting, I just built a frontend for it with pre- and postprocessing and a bunch of other user-friendly features.

      Maybe (or maybe not) you'd be better off having a gui/web interface instead of drag'n'drop to console. Using a console in the first place assumes a certain level of user knowledge, and you might just be better off to get them to specify path instead of introducing the drag'n'drop to console thing into it.

      Well, a gui would be nicer and less scary for the users, but it would also be a hell of a lot of work. I'm considering making a gui with tk or something similar for this later on. BTW this started as a windows batch script, which I rewrote in perl to make it cross-platform and expand the functionality. I'm definitely not going back to a single-platform solution unless it's for Windows and it allows me to easily write a more powerful program with a nice gui... probably not even then.
      Getting the user to specify a path... that's what I wanted to get away from. That requires the user to have quite a bit of computer knowledge and even then it's a royal pain to do. Most savvy people would end up moving their files to where the aligner is, and most unsavvy people would end up not being able to use the thing at all. IMO drag & drop is the best solution considering that a file browsing gui is out of the question for now.