PerlMonks  

How to substitute all tabs only in a specific field

by xuo (Acolyte)
on May 25, 2020 at 13:33 UTC

xuo has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm asking again for your help.
I know the title seems quite obvious but it is something I was not able to do.
Here is my input file :
a b « x1 x2 »
c d « x2 »
e f « x3 x4 x5 »

And I want it to become :
a b « x1,x2 »
c d « x2 »
e f « x3,x4,x5 »

In other words, I want to substitute all <tabs/space> in the field that starts and ends with <double quotes> by <commas>.
The number of fields between quotes can be of any value (practically it will never be greater than 6 or 7 but I'd like something generic).
And I want to do this (if possible) on a unix command line ie :
perl -pe 's/.../g' < input_file > output_file.

Do you know how to achieve this ?

Regards.

Xuo.

Re: How to substitute all tabs only in a specific field
by hippo (Bishop) on May 25, 2020 at 13:43 UTC
    In other words, I want to substitute all <tabs/space> in the field that starts and ends with <double quotes> by <commas>.

    That isn't what you've shown in your desired output where there are still whitespaces in that field. Since we don't know what you want, it's difficult to explain the method of getting there.

Re: How to substitute all tabs only in a specific field
by haukex (Archbishop) on May 25, 2020 at 14:29 UTC
    In other words, I want to substitute all <tabs/space> in the field that starts and ends with <double quotes> by <commas>.

    This sounds like something you might want to do with Text::CSV, but if your input file format is always as simple as you showed, it's possible with Regexp::Common as well.

    And I want to do this (if possible) on a unix command line ie : perl -pe 's/.../g' < input_file > output_file.

    If you must...

    $ cat in.txt
    a b « x1 x2 »
    c d « x2 »
    e f « x3 x4 x5 »
    g h « x1 \« x2 \» x3 »
    $ perl -wMstrict -MRegexp::Common=delimited -CSD -ple \
      's{$RE{delimited}{-keep}{-delim=>qq{\N{U+AB}}}{-cdelim =>qq{\N{U+BB}}}}{
          my ($x,$y,$z)=($2,$3,$4); $y=~s/(?<!^)\s+(?!$)/,/g; $x.$y.$z }eg' in.txt
    a b « x1,x2 »
    c d « x2 »
    e f « x3,x4,x5 »
    g h « x1,\«,x2,\»,x3 »

Re: How to substitute all tabs only in a specific field
by Fletch (Bishop) on May 25, 2020 at 13:41 UTC

    Presuming you don't need to handle escapes (e.g. the closing quote character never will appear inside) you (probably) can get away with something simple along these lines (writing as a script for clarity; reducing it to perl -lnE left as an exercise):

    use utf8;
    use feature 'say';   # say() is not enabled by default in a script
    while( <> ) {
        chomp;
        my( $leader, $fields ) = m{\A (.*?) « \s* (.*?) \s* » \z}x;
        my @items = split( /\s+/, $fields );
        say qq{$leader « }, join( q{,}, @items ), q{ »};
    }

    Edit: Tweaked the output, as you do have a couple spaces still in your output as hippo points out. One presumes you mean to canonicalise to pad with a single space around the joined items, but . . .

    The cake is a lie.

Re: How to substitute all tabs only in a specific field
by kcott (Archbishop) on May 26, 2020 at 02:39 UTC

    G'day Xuo,

    There are a lot of problems with your post:

    • title has "tabs" but description has "<tabs/space>"
    • data has guillemets but description has "<double quotes>"
    • you're not substituting whitespace just after the opening quote and just before the closing quote (++hippo already commented on this)
    • you've failed to use <code> tags so we can't see the number of whitespace characters: HTML converts strings of spaces and tabs into a single space

    Please read "How do I post a question effectively?" carefully.

    So, with a lot of guesswork, I believe the following technique, succinctly and efficiently, does what you want:

    $ perl -pE 's/^[^"]+"\s+(.+)\s+"/$1 =~ y{ \t}{,}rs/e'
    a b " x1 x2 "
    x1,x2
    c d " x3 "
    x3
    e f " x4 x5 x6 x7 "
    x4,x5,x6,x7
    x y z " 1space 2spaces spacetabspace end "
    1space,2spaces,spacetabspace,end

    The 'r' modifier was introduced in Perl v5.14: see "perl5140delta: Non-destructive substitution". Attempting to modify $1 directly results in a "Modification of a read-only value" fatal error. If you have an earlier version of Perl, you'll need to use an interim variable; perhaps something like this:

    $ perl -pe 's/^[^"]+"\s+(.+)\s+"/($x = $1) =~ y{ \t}{,}s; $x/e'
    a b " x1 x2 "
    x1,x2
    c d " x3 "
    x3
    e f " x4 x5 x6 x7 "
    x4,x5,x6,x7
    x y z " 1space 2spaces spacetabspace end "
    1space,2spaces,spacetabspace,end

    — Ken

Re: How to substitute all tabs only in a specific field
by xuo (Acolyte) on May 26, 2020 at 07:39 UTC
    Hi,

    Thank you for your answers.
    You are right, my example was not clear enough. I should have been more careful about it.
    Here it is, written in a better way (I hope) :
    a b "x1 x2"
    c d "x2"
    e f "x3 x4 x5"
    And I want it to become :
    a b "x1,x2"
    c d "x2"
    e f "x3,x4,x5"
    All "spaces" are blanks or tabs. Only tabs in the fields between quotes should be replaced by commas.
    All blanks should be kept as blanks (inside or outside the quotes).
    I was not able to add "tabs" in the example to get a more generic one but for the moment I can live with "blanks" only.

    I couldn't make it work from the code from Haukex because I don't have Regexp::Common, nor the one from Fletch.
    The one from Kcott "almost" worked. The double-quote fields are correct, the ones before the quotes are missing :).
    I continue working on it.

    Regards.

    Xuo.

      Sorry, but your specifications are still unclear.

      Only tabs in the fields between quotes should be replaced by commas.

      But your example input doesn't have any tabs. You should copy and paste an actual sample of an input file into the <code> tags.

      You also haven't specified whether the quotes can be escaped or not, i.e. whether a b "x1    \"    x2" is valid and should result in a b "x1,\",x2".

      I couldn't make it work from the code from Haukex because I don't have Regexp::Common

      Yes, even you can use CPAN, and also in the worst case the regular expressions generated from Regexp::Common can be printed on a machine that has it installed, and then used on one that doesn't.

      Depending on your actual input file format, maybe your solution can be as simple as:

      $ cat in.txt
      a b "x1 x2"
      c d "x2"
      e f "x3 x4 x5"
      $ perl -ple 's/(?<=")([^"]*)(?=")/(my$x=$1)=~s#\t#,#g;$x/ge' in.txt
      a b "x1,x2"
      c d "x2"
      e f "x3,x4,x5"
      $ perl -ple 's/(?<=")([^"]*)(?=")/$1=~s#\t#,#gr/ge' in.txt
      a b "x1,x2"
      c d "x2"
      e f "x3,x4,x5"

      The second example only works on Perl 5.14+ due to the /r modifier.

      "The one from Kcott "almost" worked. The double-quote fields are correct, the ones before the quotes are missing :)."

      Yes, I forgot to capture the first part of the strings. Fixing that, and then making changes for your altered input and updated spec:

      $ perl -pE 's/^([^"]+")([^"]+)/$1 . $2 =~ y{\t}{,}r/e'
      a b "x1 x2"
      a b "x1,x2"
      c d "x2"
      c d "x2"
      e f "x3 x4 x5"
      e f "x3,x4,x5"

      "I was not able to add "tabs" in the example ..."

      Surely you must mean something else. I pressed the key labelled "TAB" on my keyboard to add them to my input. Admittedly, it may not be easy to see the difference between a tab and a space, but the output gives it away:

      $ perl -pE 's/^([^"]+")([^"]+)/$1 . $2 =~ y{\t}{,}r/e'
      a b "x1 x2"
      a b "x1 x2"

      — Ken

Re: How to substitute all tabs only in a specific field
by xuo (Acolyte) on May 26, 2020 at 11:33 UTC
    Hi,

    I could make it work with :
    perl -pe 's/^([^"]+)"(.+)"/($x = $2) =~ y{\t}{,}s;"$1\"$x\""/e'
    Here is another try for the input file :
    a b c d "x1 x2"
    c d e "x2"
    e f h "x3 x4 x5 x6"
    The expected result :
    a b c d "x1,x2"
    c d e "x2"
    e f h "x3,x4 x5,x6"
    Thanks again to all of you for your useful help.
    Regards.
    Xuo.
