http://qs321.pair.com?node_id=11127142

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have many large documents with tabs in them. They are not all the same length. I want to do something like this:

    tr/\t/\s\s\s\s/ ## replace \t with 4 spaces

There is a thing called Text::Tabs -- but that appears to work only for tables. My goal is to do:

    @text = split /\s/, $_;
    @text[0] = $string
    join /\s/, @text

Any ideas?

Replies are listed 'Best First'.
Re: How to replace \t with \s
by Tux (Canon) on Jan 20, 2021 at 14:17 UTC
    EXPAND(1)                    User Commands                    EXPAND(1)

    NAME
           expand - convert tabs to spaces

    SYNOPSIS
           expand [OPTION]... [FILE]...

    DESCRIPTION
           Convert tabs in each FILE to spaces, writing to standard output.

           With no FILE, or when FILE is -, read standard input.

           Mandatory arguments to long options are mandatory for short
           options too.

           -i, --initial
                  do not convert tabs after non blanks

           -t, --tabs=N
                  have tabs N characters apart, not 8

           -t, --tabs=LIST
                  use comma separated list of tab positions. The last
                  specified position can be prefixed with '/' to specify
                  a tab size to use after the last explicitly specified
                  tab stop. Also a prefix of '+' can be used to align
                  remaining tab stops relative to the last specified tab
                  stop instead of the first column

           --help display this help and exit

           --version
                  output version information and exit

    Enjoy, Have FUN! H.Merijn

      In Windows 8 expand is a decompression utility. I was able to find the one you reference in the bash shell provided by Git for Windows.
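
      Where the coreutils expand is not available, roughly the same tab-stop-aware behaviour can be sketched in plain Perl. The helper name expand_tabs and its default width of 4 are assumptions made for illustration, not part of any module. The substitution is the idiom given in perlfaq4, and it assumes one line at a time and single-width characters.

```perl
use strict;
use warnings;

# Hypothetical helper: expand tabs up to the next tab stop, like expand -t N.
sub expand_tabs {
    my ($line, $width) = @_;
    $width //= 4;    # assumed default; expand(1) itself defaults to 8
    # Replace one run of tabs at a time, padding out to the next tab stop.
    1 while $line =~ s/\t+/' ' x (length($&) * $width - length($`) % $width)/e;
    return $line;
}

print expand_tabs("a\tbbb\tcccccc\td"), "\n";    # a   bbb cccccc  d
```

      Unlike a fixed "four spaces per tab" replacement, this keeps columns aligned, which may or may not be what the question is after.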

Re: How to replace \t with \s
by Don Coyote (Hermit) on Jan 20, 2021 at 15:56 UTC

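    tr replaces a list of characters with another list of characters, one for one: each character in the search list maps to the character at the same position in the replacement list. So without modifiers this will not do what you think; it will only replace \t with the first character of the replacement list.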

    my $word = q{thing};
    $word =~ tr/i/oooo/;
    print "$word\n";
    __END__
    thong
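
    Applied to tabs, the same one-to-one mapping looks like the sketch below. The sample string is invented for illustration.

```perl
use strict;
use warnings;

my $line = "a\tb\tc";            # invented sample data

# tr/// maps character to character, so each tab becomes exactly one space.
(my $copy = $line) =~ tr/\t/ /;

# tr/// returns the number of characters it matched; with an empty
# replacement list and no /d it counts without changing the string.
my $tabs = ($line =~ tr/\t//);

print "$copy\n";                 # a b c
print "tabs: $tabs\n";           # tabs: 2
```

    So tr is handy for counting tabs or swapping them for a single character, but not for turning one tab into four spaces.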

    Use the substitution operator s///. This will replace a match with a replacement string.

    my $word = q{Perl is a thing I like to do.};
    $word =~ s/i/oooo/;
    print "$word\n";
    __END__
    Perl oooos a thing I like to do.

    Or globally on the line.

    $word =~ s/i/oooo/g;
    print "$word\n";
    __END__
    Perl oooos a thoooong I looooke to do.
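
    For the question as asked, the global substitution would look something like this sketch. The helper name tabs_to_spaces and its optional width argument are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every tab with a fixed number of spaces.
sub tabs_to_spaces {
    my ($text, $n) = @_;
    $n //= 4;                     # four spaces, as in the question
    my $spaces = ' ' x $n;
    $text =~ s/\t/$spaces/g;      # /g replaces every tab, not just the first
    return $text;
}

print tabs_to_spaces("one\ttwo\t\tthree"), "\n";
```

    On the command line the same idea is roughly perl -pe 's/\t/    /g' in.txt > out.txt, where in.txt and out.txt are placeholder names.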

    Splitting on a single character that occurs in runs produces empty fields between the consecutive separators. Joining those fields with the same character essentially reconstructs the line as it was before the split. Which is fine if that is what you want.

    my @text = split /o/, $word;
    print join qq{o}, @text;
    __END__
    Perl oooos a thoooong I looooke to do.

    This can be seen more easily by joining with a linefeed character, \n.

    my @text = split /o/, $word;
    print join qq{\n}, @text;
    __END__
    Perl 



    s a th



    ng I l



    ke t
     d
    .

    Also note that join takes a string, not a pattern, as its first argument. So you cannot use a regex escape for general whitespace there to stand in for one particular whitespace character.

    \t is the escape for a tab character, and it interpolates in double-quoted context.

    \s is a regex character-class escape that matches any whitespace character.

    \s does not interpolate inside double-quoted context: how would perl know which whitespace character you wanted as the replacement?

    split, by contrast, takes a pattern (the match operator m//) as its first argument. Moreover, passing the literal single-space string ' ' to split is a special case that changes its behaviour: leading whitespace is skipped and the string is split on runs of whitespace.
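
    A small sketch of how the first argument changes what split does. The sample line is invented, with leading and doubled spaces plus one tab.

```perl
use strict;
use warnings;

my $line = "  alpha  beta\tgamma";   # invented sample data

my @awk_mode  = split ' ',  $line;   # special case: skip leading whitespace, split on runs
my @one_space = split / /,  $line;   # pattern: every single space is a separator
my @tabs_only = split /\t/, $line;   # pattern: only tabs are separators

print scalar(@awk_mode),  "\n";      # 3
print scalar(@one_space), "\n";      # 5 (includes empty fields)
print scalar(@tabs_only), "\n";      # 2
```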

    It is not clear whether the split/join snippet in your goal is your attempt to solve the tab problem, or a later step that you are preparing the documents for. Depending on which it is, you may be able to make use of a capturing split, or split directly on the tabs instead of on whitespace in the first pass.

    my $line = q{Perl is a thing I like to do.};
    print split(/i/, $line), "\n";
    print split(/(i)/, $line), "\n";
    __END__
    Perl s a thng I lke to do.
    Perl is a thing I like to do.

    print join(q{o}, split /i/, $line), "\n";
    print join(q{o}, split /(i)/, $line), "\n";
    __END__
    Perl os a thong I loke to do.
    Perl oios a thoiong I loioke to do.

    And splitting only on runs of two or more consecutive characters.

    my $line = q{Perl is a thiiing I like to do.};
    print join q{ooo}, split /i{2,}/, $line;
    __END__
    Perl is a thooong I like to do.

    This also has the advantage of not actually mangling your documents in place. There could be stray tab characters anywhere in all those files, so set up some small test lines to ensure the operator behaves as expected, and make backups. As per usual.
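
    One way to keep the originals untouched is to write the converted text to a separate file. This is a minimal sketch: the helper name convert_file and the file names in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out with every tab replaced by four
# spaces. The input file is only read, never modified.
sub convert_file {
    my ($in, $out) = @_;
    open my $src, '<', $in  or die "Cannot read $in: $!";
    open my $dst, '>', $out or die "Cannot write $out: $!";
    while (my $line = <$src>) {
        $line =~ s/\t/    /g;
        print {$dst} $line;
    }
    close $src;
    close $dst or die "Cannot close $out: $!";
    return;
}

# Example call with made-up file names:
# convert_file('report.txt', 'report.spaces.txt');
```

    Comparing a few converted files against their originals before touching the rest is a cheap check.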

    edit: Clarified split's usage of a string or match operator as first argument, and that \s does not interpolate.


    Dooon Coooyoooteee

Re: How to replace \t with \s
by hippo (Bishop) on Jan 20, 2021 at 14:31 UTC
    I want to do something like this: tr/\t/\s\s\s\s/ ## replace \t with 4 spaces
    $ hexdump -C foo
    00000000  61 09 62 62 62 09 63 63  63 63 63 63 09 64 0a     |a.bbb.cccccc.d.|
    0000000f
    $ perl -ple '$_ = join "    ", split /\t+/' foo > bar
    $ hexdump -C bar
    00000000  61 20 20 20 20 62 62 62  20 20 20 20 63 63 63 63  |a    bbb    cccc|
    00000010  63 63 20 20 20 20 64 0a                           |cc    d.|
    00000018
    $

    Edit: restricted now just to replacing tabs as this is presumably what was originally meant.
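
    One thing to keep in mind with the split/join form: /\t+/ treats a run of tabs as a single separator, and split drops trailing empty fields, so doubled or trailing tabs do not each become four spaces. A small sketch with an invented sample line shows the difference from a per-tab substitution.

```perl
use strict;
use warnings;

my $line = "a\t\tb\t";          # invented sample: doubled tab, trailing tab

# Run of tabs collapses to one separator; the trailing empty field is dropped.
my $via_split = join '    ', split /\t+/, $line;

# Every tab is replaced individually.
(my $via_subst = $line) =~ s/\t/    /g;

printf "[%s]\n", $via_split;    # [a    b]
printf "[%s]\n", $via_subst;    # [a        b    ]
```

    Which result is wanted depends on whether the tabs are field separators or indentation.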


    🦛

Re: How to replace \t with \s
by Lotus1 (Vicar) on Jan 27, 2021 at 23:54 UTC

    The example given in the Text::Tabs documentation shows that it works on lines of text. It also works on arrays.

    #!perl
    # unexpand -a
    use Text::Tabs;

    while (<>) {
        print unexpand $_;
    }
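
    For the direction the question asks about (tabs to spaces), the same module provides expand. A minimal sketch, assuming tab stops every 4 columns are wanted; the sample lines are invented.

```perl
use strict;
use warnings;
use Text::Tabs;          # exports expand, unexpand and $tabstop

$tabstop = 4;            # tab stops every 4 columns instead of the default 8

my @lines    = ("a\tb", "abc\td");     # invented sample lines
my @expanded = expand(@lines);         # list in, list out

print "$_\n" for @expanded;
```

    Note that expand pads to the next tab stop rather than inserting a fixed four spaces, so "abc\td" gets a single space here.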
Re: How to replace \t with \s
by betmatt (Scribe) on Jan 20, 2021 at 14:32 UTC
    Please tell us in English exactly what you want to do. First describe the nature of the file. Then describe what you want to do to the data in that file. Then describe the nature of the output that you want. Then give us an example of the code that you have tried to do this routine. Comment, with the code, what each line of the code does. Make that code as simple as possible and don't be afraid to break that code up onto many lines so that each line is as simple as possible. I'll then take a look at it.