http://qs321.pair.com?node_id=11118732

nysus has asked for the wisdom of the Perl Monks concerning the following question:

I want to extract the value from a string that (I assume) could look like a mix of any of the following formats:

--rsync-path = 'blah blah'                  # might be spaces before/after equal sign
--rsync-path=/usr/bin/rsync                 # no quotes around value (assuming this is allowed by rsync)
--rsync-path="blah blah \"blah"             # double quotes, with possible escaped quotes
--rsync-path='blah blah \'blah'             # single quotes, with possible escaped quotes
--rsync-path='blah blah' --another-option   # additional options might follow
--another-option --rsync-path='blah blah'   # additional options might precede (and follow)
# any other tricky alternatives I'm forgetting?

So basically, I want to simulate how bash extracts the value but with perl.

I could probably create some regexes for this, but I'm quite sure some obscure scenario would be left out, not to mention the possibility of badly malformed user input. So is there any module out there that might make extracting this value more of a no-brainer?

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;

Replies are listed 'Best First'.
Re: Perl: Extracting the value from --rsync-path=PROGRAM key/value pair
by davido (Cardinal) on Jun 30, 2020 at 19:05 UTC

    Probably best to leave the command-line parsing to the pros. Getopt::Long can handle almost all of these formats except for the first one, which I don't think is actually legal. But we can make it work:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Getopt::Long qw(GetOptionsFromString :config gnu_compat :config pass_through);

    foreach my $string (
        q{--rsync-path = 'blah blah'},                  # might be spaces before/after equal sign
        q{--rsync-path=/usr/bin/rsync},                 # no quotes around value (assuming this allowed by rsync)
        q{--rsync-path="blah blah \"blah"},             # double quotes, with possible escaped quotes
        q{--rsync-path='blah blah \'blah'},             # single quotes, with possible escaped quotes
        q{--rsync-path='blah blah' --another-option},   # additional options might follow
        q{--another-option --rsync-path='blah blah'},   # additional options precede
    ) {
        # This cleans up the whitespace around = in the first example.
        my $tidy_string = $string =~ s/(--rsync-path)\s*=\s*/$1=/r;
        my $rsync_path;
        my ($ret, $args) = GetOptionsFromString(
            $tidy_string,
            'rsync-path=s' => \$rsync_path,
        );
        printf "%-48s: rsync_path => %-32s\n", "($string)" => "($rsync_path)";
    }

    The output:

    (--rsync-path = 'blah blah')                    : rsync_path => (blah blah)
    (--rsync-path=/usr/bin/rsync)                   : rsync_path => (/usr/bin/rsync)
    (--rsync-path="blah blah \"blah")               : rsync_path => (blah blah "blah)
    (--rsync-path='blah blah \'blah')               : rsync_path => (blah blah \'blah)
    (--rsync-path='blah blah' --another-option)     : rsync_path => (blah blah)
    (--another-option --rsync-path='blah blah')     : rsync_path => (blah blah)

    I think that's correct handling for each of your examples.
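For anyone wondering where the quote handling in that output comes from: the Getopt::Long documentation says GetOptionsFromString splits its string with Text::ParseWords::shellwords. The sketch below calls shellwords directly on three of the example strings so the tokens are visible. The helper name show_tokens is made up for illustration and is not part of either module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Hypothetical helper (not part of any module): wrap each token that
# shellwords() returns in brackets so the token boundaries are visible.
sub show_tokens {
    my ($string) = @_;
    my @tokens = shellwords($string);
    return join ' ', map { "[$_]" } @tokens;
}

for my $string (
    q{--rsync-path = 'blah blah'},          # spaces around = give three tokens
    q{--rsync-path="blah blah \"blah"},     # backslash removed inside double quotes
    q{--rsync-path='blah blah \'blah'},     # backslash kept inside single quotes
) {
    print show_tokens($string), "\n";
}

# Prints:
# [--rsync-path] [=] [blah blah]
# [--rsync-path=blah blah "blah]
# [--rsync-path=blah blah \'blah]
```

The first line shows why the whitespace around = has to be tidied up before parsing: otherwise the option, the equal sign, and the value arrive as three separate arguments. The third line is also a place where this differs from a real POSIX shell, where a backslash inside single quotes is literal and cannot escape the closing quote, so treat this as "close to bash" rather than identical.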

    I used the GetOptionsFromString subroutine here, but if you're just parsing directly from the command line, GetOptions would be adequate. The gnu_compat config option provides better handling for =, and pass_through returns the unhandled options without noisy warnings.
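A minimal sketch of the plain GetOptions variant, assuming the arguments are already split the way a shell would deliver them in @ARGV. The helper name extract_rsync_path and the sample arguments are invented for illustration; only GetOptions and the two config options come from Getopt::Long.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(:config gnu_compat pass_through);

# Hypothetical helper: run GetOptions over a copy of the given arguments
# and return the --rsync-path value plus whatever was left unhandled.
sub extract_rsync_path {
    my @args = @_;
    local @ARGV = @args;          # GetOptions reads and modifies @ARGV
    my $rsync_path;
    GetOptions('rsync-path=s' => \$rsync_path);
    return ($rsync_path, [@ARGV]);
}

# Sample arguments standing in for a real command line (assumption).
my ($path, $rest) = extract_rsync_path(
    '--another-option', '--rsync-path=/usr/bin/rsync', '-v',
);
print "rsync_path: $path\n";
print "left over:  @$rest\n";
```

Because of pass_through, the options the script does not declare (--another-option and -v in this sample) stay in the leftover list instead of triggering "Unknown option" warnings, so they can be handed on to rsync untouched.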


    Dave

      Awesome, thanks. I was wondering if Getopt could be useful in this situation. I would have had no idea about those options you used, though. Very nice.


        I would have had no idea about those options ...

        I think it's called Reading The Fine Manual. :)


        Give a man a fish:  <%-{-{-{-<

        Not a bad thought, but it adds maintenance overhead to your script. This may present portability issues if rsync has different options somewhere - or more likely, a newer version of rsync adds or deprecates an option.
Re: Perl: Extracting the value from --rsync-path=PROGRAM key/value pair
by Anonymous Monk on Jul 03, 2020 at 16:28 UTC

    rsync uses popt(3) rather than getopt(3). rsync explains why at https://github.com/WayneD/rsync/tree/master/popt

    1) popt is fully reentrant
    2) popt can parse arbitrary argv[] style arrays while getopt(3) makes this quite difficult
    3) popt allows users to alias command line arguments
    4) popt provides convenience functions for parsing strings into argv[] style arrays

    popt is a library from Red Hat that's only common on Linux. If you just want to do it like rsync does, getopt is pretty close.