PerlMonks  

Using system (); with Strawberry Perl

by hadrons (Novice)
on Nov 24, 2021 at 23:50 UTC

hadrons has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a newbie here and this is my first question to the group, so forgive any lapse in etiquette. I wrote a script that makes heavy use of system(). It runs fine under Cygwin; however, when I run it with Strawberry Perl, a number of the commands inside the system() calls are not recognized. For example:
    system ('grep -l "DATAmessage.*3\.0" *.xml > 3.0_files_arraydata.txt');
    system ("mv temp_3.0_files_onixarraydata.txt 3.0_files_arraydata.txt");
    system ("cat *files_arraydata.txt > data2.txt");
    system ("rm data2.txt");
    system ("sort -u data2.txt > data.txt");
It appears that the commands grep, mv, cat and rm all fail with "grep|mv|cat|rm is not recognized as an internal or external command, operable program or batch file". I also use sort, but I receive no error message for that command.

I have tried replacing the system calls with other things, such as File::Grep in place of the grep calls, but while File::Grep works, it was slower than molasses in the dead of winter in Strawberry Perl. I know many look down on the use of system(), but I find it to be very fast. Any suggestions? And thank you for reading.

Replies are listed 'Best First'.
Re: Using system (); with Strawberry Perl
by swl (Parson) on Nov 25, 2021 at 00:09 UTC

    Most of your utilities are from unix but are not available on windows. Cygwin provides these as it is a unixy platform and hence they are available when you run perl under cygwin. When you use Strawberry perl it is under windows and thus any calls via command will be sent to the windows cmd shell, which then does not know about the utilities you are calling, or your syntax.

    You could look at installing the Perl Power Tools which will give access to some of the utilities, but you still need to modify your syntax so it will work under windows.

    In the end it will probably be simpler to translate it all into perl code anyway, as the examples you show can all be done in perl pretty easily, and quickly.

Re: Using system (); with Strawberry Perl
by eyepopslikeamosquito (Archbishop) on Nov 25, 2021 at 00:14 UTC

    Sorry, but calling grep and mv and cat in this style from a driving Perl script via system (without even checking for errors) is just too weird for me. :) I've seen people use a shell script driver to call Perl scripts, but not the reverse.

    Anyways, if this came up in a code review at work, you would be told to write the whole thing in Perl. See Unix shell versus Perl for why.
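On the "without even checking for errors" point: a minimal sketch of what checking a system call could look like. The helper name run_checked is hypothetical (it is not from the thread), and the list form of system is used here, which bypasses the shell, so it would not handle redirections such as "> file".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run a command and die with a
# readable message if it could not be started or exited non-zero.
sub run_checked {
    my @cmd = @_;
    my $rc  = system(@cmd);
    return 1 if $rc == 0;

    if ( $? == -1 ) {
        die "Failed to start '@cmd': $!\n";
    }
    elsif ( $? & 127 ) {
        die sprintf "'%s' died with signal %d\n", "@cmd", $? & 127;
    }
    die sprintf "'%s' exited with status %d\n", "@cmd", $? >> 8;
}

# $^X is the path of the running perl, so this runs wherever perl itself does.
run_checked( $^X, '-e', 'exit 0' );
```

With a check like this, a missing grep or mv would stop the script at the first failing call instead of letting the later steps run on stale or missing files.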

      Actually that's similar to how I "migrated" from Bash to Perl back then ...

      Cruel code! ;-)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

      There was one "perl script" at $job several years back which was basically a bash script with every line wrapped in backticks . . . *shudder*

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

        It's awful, but it does work :-)

      For me, a mixture of Bash and Perl sometimes makes sense when writing a quick proof of concept. If the main part of the code works out, i can always go to CPAN, find a module that fits my need, read the docs and then implement it.

      But for a first try, it's often quicker to just call wget than it is to install and use LWP::UserAgent. And for a PoC, it really doesn't matter if bash calls wget and then the perl script, or if the perl script calls wget when needed. It's just throw-away code anyway.

      Case in point: Perl+PostgreSQL+GeoIP = Awesome! was my proof of concept, which then turned into GeoIP revisited. The proof of concept was ugly and was using external (command line) tools. But it quickly showed me what worked and what didn't. And it was "good enough" to post on PM to get some feedback from which i learned that i was using a deprecated GeoIP database version. Which then led to a "Pure Perl" version that was nicer, but also more time intensive to implement.

      perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'

        There is a vast gulf between your proof of concept bash script and the OP's code! I see your bash script, for example, calling:

        rm GeoIPCountryCSV.zip
        I do not see your Perl script calling:
        system ("rm GeoIPCountryCSV.zip");

        Look at the OP's code again. Have you ever seen anything like it? I haven't, which is why I asked for the backstory behind it. Sadly, it looks like we'll never know because the OP seems to have vanished (presumably forever) from the Perl Monks universe. This is a tragedy because learning the backstory would make the description of the why behind this code much more interesting when it's deservedly viewed by a wider audience.

        BTW, given your interest in using a proof of concept, you might be interested in a three-part series I wrote a few years ago: Building the Right Thing (Part I): Pretotyping

        While I understand your sentiment and have done similar myself there is an inherent need to realise that doing such is pure hackery. In this particular case hadrons is shelling out to perform things like rename, unlink and so on which is arguably more work (because of shell escapes, etc) than just running the Perl built-ins. It is also highly relevant that it is this practice of shelling out alone which has given hadrons such problems that they have needed to come here seeking advice in the first place.

        But for a first try, it's often quicker to just call wget than it is to install and use LWP::UserAgent.

        Can't say I agree. If LWP isn't already installed you may as well do so now because you are going to need it sooner or later. It's also pretty simple to use. However, if even that is too much trouble, then consider HTTP::Tiny which is in core so you already have it.
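To illustrate the HTTP::Tiny suggestion, here is a minimal sketch of a wget-style fetch. The helper name fetch_body and the 10-second timeout are assumptions made for this example; get, success, status, reason and content are part of HTTP::Tiny's documented interface.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: return the body of $url, or undef (with a warning)
# if the request did not succeed.
sub fetch_body {
    my ($url) = @_;
    my $response = HTTP::Tiny->new( timeout => 10 )->get($url);
    unless ( $response->{success} ) {
        warn "GET $url failed: $response->{status} $response->{reason}\n";
        return;
    }
    return $response->{content};
}
```

Usage would be something like my $html = fetch_body('http://example.com/'). To save straight to a file, HTTP::Tiny also has a mirror($url, $file) method. Note that https URLs need IO::Socket::SSL and Net::SSLeay to be installed.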


        🦛

      I'm aware it's poor programming, but the results work

        Except that they don't - otherwise you wouldn't be here asking for help to fix it. ;-)


        🦛

        I admire your honesty. Do you have any interest in writing solid code?

Re: Using system (); with Strawberry Perl
by kcott (Archbishop) on Nov 25, 2021 at 05:54 UTC

    G'day hadrons,

    Welcome to the Monastery.

    As already pointed out, there are incompatibilities between Unix commands, MS commands and Perl functions; e.g. rm (Unix), erase (MS), unlink (Perl).

    If speed is all-important, you may have to write separate name.sh and name.bat scripts.

    Perl can do everything in your examples without system. For instance, take a look at open which has examples of reading from and writing to files; glob which can expand wild cards like *.xml; and the core File::Copy module which has a move() function.
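As a rough sketch of how glob and File::Copy's move() fit together, the following moves every file matching a wildcard into a directory. The helper name move_matching is made up for this example, and it assumes the pattern contains no spaces (glob splits its argument on whitespace).

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper: move every plain file matching $pattern into
# $dest_dir and return the list of files that were moved.
sub move_matching {
    my ( $pattern, $dest_dir ) = @_;
    my @moved;
    for my $file ( glob $pattern ) {
        next unless -f $file;    # skip directories and anything else
        move( $file, $dest_dir )
            or die "Cannot move '$file' to '$dest_dir': $!\n";
        push @moved, $file;
    }
    return @moved;
}
```

Something like move_matching('*.xml', 'processed') would then behave the same under Cygwin and Strawberry Perl, since neither the wildcard expansion nor the move depends on an external shell.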

    Your description of "File::Grep" is not the way to go here (even if it does have some basis in fact). You've provided no code, no context, and no Benchmark! Instead, you should provide those things and ask us how you might improve speed efficiency. I've never used File::Grep so I just looked it up: it's 16 years old; a better option may have been produced since it was written.

    — Ken

      I did get better results from File::Grep when I changed from fgrep to just grep

Re: Using system (); with Strawberry Perl
by jwkrahn (Abbot) on Nov 25, 2021 at 07:11 UTC

    Hello hadrons, I haven't used Windows in over 20 years but these may work.

    (Obviously I can't test these because ... no Windows)

    system ('grep -l "DATAmessage.*3\.0" *.xml > 3.0_files_arraydata.txt');

        open my $FH, '>', '3.0_files_arraydata.txt' or die "Cannot open '3.0_files_arraydata.txt' because: $!";
        {
            local @ARGV = <*.xml> or die "Cannot find any '*.xml' files\n";
            while ( <> ) {
                if ( /DATAmessage.*3\.0/ ) {
                    print $FH "$ARGV\n";
                    close ARGV;
                }
            }
        }

    system ("mv temp_3.0_files_onixarraydata.txt 3.0_files_arraydata.txt");

        rename 'temp_3.0_files_onixarraydata.txt', '3.0_files_arraydata.txt' or die "Cannot move 'temp_3.0_files_onixarraydata.txt' because: $!";

    system ("cat *files_arraydata.txt > data2.txt");

        open my $FH, '>', 'data2.txt' or die "Cannot open 'data2.txt' because: $!";
        {
            local @ARGV = <*files_arraydata.txt> or die "Cannot find any '*files_arraydata.txt' files\n";
            print $FH $_ while <>;
        }

    system ("rm data2.txt");

        unlink 'data2.txt' or die "Cannot delete 'data2.txt' because: $!";

    system ("sort -u data2.txt > data.txt");

        open my $IN, '<', 'data2.txt' or die "Cannot open 'data2.txt' because: $!";
        open my $OUT, '>', 'data.txt' or die "Cannot open 'data.txt' because: $!";
        {
            my %unique;
            print $OUT sort grep { ! $unique{ $_ }++ } <$IN>;
        }

      Every line worked - thank you so much. How do I indicate that this is the answer?

        That's not the way Perl Monks works, as indicated by the quote below.

        Most languages are like stackoverflow: I have a question, I want the best answer. Perl is like PerlMonks: I have a doubt, I want to read an interesting discussion about it that is likely to go on a tangent. q-:

        -- tye in Re: What is PerlMonks? (why Perl)

Re: Using system (); with Strawberry Perl
by soonix (Canon) on Nov 25, 2021 at 08:17 UTC
    Actually, what astonishes me:
    • First, you create a file "3.0_files_arraydata.txt", and in the next line you clobber it by moving another file over it
    • then, you create "data2.txt", remove it, and only afterwards try to sort it...
    😲

      Given the original post says:

      ... there are a number of commands within the system() that it doesn't recognize. For example: ...
      it seems the entire script was not posted, just some example commands that were not found.

      I'd like to view the entire script and learn the backstory as to how it came to be written in this unusual style.

Re: Using system (); with Strawberry Perl
by Marshall (Canon) on Nov 25, 2021 at 10:09 UTC
    jwkrahn has the right idea. Instead of using Perl to call an O/S specific function, write Perl code that runs on multiple platforms: Windows, Cygwin or Unix.

    Perhaps some code like this code for your first system call with grep?
    I did not create the test files necessary to actually prove that this works on my Windows system, but this is plausible.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use autodie;

    #system ('grep -l "DATAmessage.*3\.0" *.xml > 3.0_files_arraydata.txt');
    open my $OUT, '>', '3.0_files_arraydata.txt';
    foreach my $filename (<*.xml>) {
        open my $in, '<', $filename;
        print $OUT "$filename\n" if grep{/DATAmessage.*3\.0/}<$in>;
    }
    The above code will be slower than Unix grep -l because this code looks at every line of the input file and reports the number of lines that matched, and if >0, that fact causes the filename to be printed. grep -l stops at the first matching line and reports the file name. Speed depends upon how big your files are. A couple more lines of Perl code can emulate the exact grep -l functionality (stop reading the file when the first match is found). I have no idea what File::Grep is and why you would need it. Perl regex is very fast. Literally decades of tweaking have gone into the regex engine.

    Anyway, try the above out and see how it goes.

      A couple more lines of Perl code can emulate the exact grep -l functionality (stop reading the file when the first match is found).

      If you want the speed that grep -l would give you then have a look at List::Util::any instead. It has an XS version and is in Core.


      🦛

      File::Grep was slow because I used fgrep instead of just grep ... I'll test out your code later in the day when I have enough test files
        This code print $OUT "$filename\n" if grep{/DATAmessage.*3\.0/}<$in>; is slow because it continues to read the file even after it has found the first occurrence of the regex match (it actually counts the number of occurrences in the file). To use hippo's idea: add use List::Util qw(any); at the top. And change code to print $OUT "$filename\n" if any{/DATAmessage.*3\.0/}<$in>;.

        the "any" routine is written in C. The Perl equivalent is like this:

        while (<$in>) {
            if (/DATAmessage.*3\.0/) {
                print $OUT "$filename\n";
                last; #no need to look anymore!
            }
        }
        If whatever you are looking for usually appears near the beginning of the file, performance gain will be substantial.
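One caveat worth checking before relying on any for early exit: <$in> in list context reads every line before any sees the first one, so any saves regex matches but not file reads. The sketch below shows the difference using an in-memory filehandle with made-up data; eof is used only to reveal how far each variant read.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Made-up data: five lines, with the match on line 2.
my $data = join '', map { "line $_\n" } 1 .. 5;

# Variant 1: any {...} <$fh>. The <$fh> is evaluated in list context first,
# so the whole handle is read before any() tests a single line.
open my $in1, '<', \$data or die "Cannot open in-memory file: $!";
my $found1  = any { /line 2/ } <$in1>;
my $at_eof1 = eof($in1) ? 1 : 0;    # true: the handle was read to the end

# Variant 2: while/last reads line by line and stops at the first match.
open my $in2, '<', \$data or die "Cannot open in-memory file: $!";
my $found2 = 0;
while ( my $line = <$in2> ) {
    if ( $line =~ /line 2/ ) { $found2 = 1; last; }
}
my $at_eof2 = eof($in2) ? 1 : 0;    # false: lines 3..5 were never read

print "any:   found=", ( $found1 ? 1 : 0 ), " eof=$at_eof1\n";
print "while: found=$found2 eof=$at_eof2\n";
```

For large files where the match is near the top, the while/last form is the one that behaves like grep -l.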

        update:
        Another place to use a List::Util function:

        {
            my %unique;
            print $OUT sort grep { ! $unique{ $_ }++ } <$IN>;
        }
        ##### again use List::Util to speed up Perl implementation... ####
        use List::Util qw(any uniq);
        # explicit sort block: a bare "sort uniq ..." would parse uniq as the comparison sub
        print $OUT sort { $a cmp $b } uniq <$IN>;
        I suppose that depending upon the data, it could be that reversing the order, i.e., sorting and then filtering out uniq lines would be faster? Don't know. But if speed is needed, I would also benchmark that approach. Also, instead of building a hash table, try: "print line unless its a repeat of previous line". Results probably depend upon what typical data actually looks like. For example:
        my $prev = "";
        foreach (sort <$IN>) {
            print $OUT $_ unless $_ eq $prev;
            $prev = $_;
        }
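For anyone who wants to benchmark the two orderings, here is a minimal sketch using the core Benchmark module. The sub names and the random test data are invented for the example; real results will depend on how many duplicates the actual data contains.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Approach A: filter duplicates through a hash, then sort what is left.
sub uniq_then_sort {
    my %seen;
    my @sorted = sort grep { !$seen{$_}++ } @_;
    return @sorted;
}

# Approach B: sort everything, then drop lines equal to their predecessor.
sub sort_then_uniq {
    my @out;
    for my $line ( sort @_ ) {
        push @out, $line unless @out && $out[-1] eq $line;
    }
    return @out;
}

# Made-up test data: 20,000 lines drawn from 500 distinct values.
my @lines = map { "record " . int( rand 500 ) . "\n" } 1 .. 20_000;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    uniq_then_sort => sub { my @r = uniq_then_sort(@lines) },
    sort_then_uniq => sub { my @r = sort_then_uniq(@lines) },
} );
```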
Re: Using system (); with Strawberry Perl
by bliako (Monsignor) on Nov 25, 2021 at 12:29 UTC

    A quick hack if you have cygwin installed on your computer: run each shell command (grep, mv, etc.) through cygwin's shell, like system("c:/cygwin/bin/bash.exe -c 'rm data2.txt'"); Of course you may be able to call mv.exe and rm.exe directly by using their full paths, with something like this: system("c:/cygwin/bin/rm.exe data2.txt");. I am not familiar enough with cygwin to know if that's possible. If the above works then you can create your own system() that prepends the full path to any system command.

    sub mysystem {
        my ($cmd, $args) = @_;
        my $finalcmd = 'c:/cygwin/bin/'.$cmd.'.exe'.' '.$args;
        # or ... 'c:/cygwin/bin/bash.exe -c "'.$cmd.' '.$args.'"';
        return system($finalcmd);
    }
    mysystem("ls", "-al data2.txt > abc")==0 or die "system command failed";

    Thankfully I do not have any windows computer available within the borders of my estate and so all the above are untested. Therefore it is best to avoid use of rm during testing!

    EDIT: check also a more elegant rewrite of the above: https://stackoverflow.com/a/55010946 . And also for the bit below, this is an informative post: The problem of "the" default shell

    BUT! the advice of other Monks is much more sound than using the hack above: rewrite everything in Perl. Actually jwkrahn++ has done the work for you! Apropos File::Grep being slow, show us the codez.

    bw, bliako

      The issues I had with File::Grep involve using fgrep. When I changed to just grep it worked with greater performance. I had thought that fgrep was faster for some reason.
      And jwkrahn++ did a great job

Re: Using system (); with Strawberry Perl -- UnxUtils and gnuwin32
by Discipulus (Canon) on Nov 26, 2021 at 08:37 UTC
    Hello hadrons and welcome to the monastery and to the wonderful world of perl!

    I'm a bit late but I'll add my own solution: put these programs in your PATH. In fact you can use gnuwin32 or UnxUtils, as I have done for years.

    They are really useful and make the poor cmd.exe experience a bit easier to survive. That said, a plain perl solution is by far better in terms of portability. In your PATH order matters:

    C> path
    PATH=
    C:\EX_D\ulisseDUE\perl5.26.64bit\perl\site\bin;   # strawberry perl portable
    C:\EX_D\ulisseDUE\perl5.26.64bit\perl\bin;        # strawberry perl portable
    C:\EX_D\ulisseDUE\perl5.26.64bit\c\bin;           # strawberry perl portable
    C:\EX_D\ulisseDUE\bin\UnxUtils\usr\local\wbin;    # <----------------------- UnxUtils
    C:\WINDOWS;                                       # OS
    C:\WINDOWS\system32;                              # OS

    C> ls C:\EX_D\ulisseDUE\bin\UnxUtils\usr\local\wbin | grep -E "grep|mv|cat|rm"
    agrep.exe
    cat.exe
    egrep.exe
    fgrep.exe
    grep.exe
    mv.exe
    mvdir.exe
    rm.exe
    rman.exe
    rmdir.exe
    zcat.exe
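Because PATH order decides which program a bare command name resolves to, it can help to ask Perl what it would find. The sketch below is an assumption-laden illustration: find_on_path is a hypothetical helper, and the fallback PATHEXT list is a guess at common defaults. Windows ships its own sort.exe, which would explain why the original post saw no "not recognized" error for sort.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first match for $cmd on
# PATH, or undef. On Windows the extensions listed in PATHEXT are tried too.
sub find_on_path {
    my ($cmd) = @_;
    my @ext = ('');
    push @ext, split /;/, $ENV{PATHEXT} || '.COM;.EXE;.BAT;.CMD'
        if $^O eq 'MSWin32';

    for my $dir ( File::Spec->path ) {
        for my $ext (@ext) {
            my $candidate = File::Spec->catfile( $dir, $cmd . $ext );
            # -x is not meaningful for this purpose on Windows, so only
            # require it elsewhere.
            return $candidate
                if -f $candidate && ( $^O eq 'MSWin32' || -x _ );
        }
    }
    return;
}

# Report where each command used in the original script would come from.
for my $cmd (qw(grep mv cat rm sort make)) {
    my $path = find_on_path($cmd);
    printf "%-5s %s\n", $cmd, defined $path ? $path : '(not found)';
}
```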

    Just one big caveat: be sure to have the right make (or dmake, gmake..) in front in your path:

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Node Type: perlquestion [id://11139091]
Front-paged by Corion