PerlMonks  

Re: Split tab-separated file into separate files, based on column name

by Eily (Monsignor)
on Aug 26, 2020 at 12:13 UTC ( [id://11121096] )


in reply to Split tab-separated file into separate files, based on column name

If you want only the barest functionality, a one-liner might work, but you'll need to switch to a longer script for pretty much any kind of control you may want to have over the result:

perl -lanE 'for (0..$#F) { `echo $F[$_] >> file$_` }'
You can read perlrun to understand what the options do (and to change the way each line is split, because -a splits on whitespace by default, not tabs). It will fail if the input is not simple enough (if there are quotes, dashes, or semicolons in the data), since every field is passed through the shell. And because >> appends, you'll start to get extra output data in the files if you call it several times in a row.
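For comparison, here is a minimal sketch of the "longer script" route. It never goes through the shell, so quotes and semicolons in the data are harmless. It assumes the first line holds the column names, that those names are unique, and that they are safe to use as file names. The sub name split_columns() and the ".txt" suffix are made up for illustration and are not part of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split tab-separated input into one file per column, named after the header.
# split_columns() and the "$dir/$name.txt" naming are illustrative choices.
sub split_columns {
    my ($in, $dir) = @_;

    my $header = <$in>;
    return 0 unless defined $header;        # empty input: nothing to do
    chomp $header;
    my @names = split /\t/, $header, -1;

    # Open one write handle per column up front.
    my @out;
    for my $name (@names) {
        my $path = "$dir/$name.txt";
        open my $fh, '>', $path or die "Cannot open $path: $!";
        push @out, $fh;
    }

    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line, -1; # -1 keeps trailing empty fields
        for my $i (0 .. $#names) {
            my $value = $fields[$i] // '';  # short rows give empty values
            print { $out[$i] } "$value\n";
        }
    }

    for my $fh (@out) {
        close $fh or die "close failed: $!";
    }
    return scalar @names;
}

# Usage: perl split_columns.pl OUTDIR < input.tsv
split_columns(\*STDIN, $ARGV[0]) if @ARGV;
```

Unlike the one-liner, this opens each file with '>' and therefore overwrites old output instead of appending to it on every run.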

All that being said, you asked for the clever way. The clever way is to keep the solution that you understand, if you ever have to fix it.

Edit s/perlun/perlrun/. Thanks AnomalousMonk

Re^2: Split tab-separated file into separate files, based on column name (open on demand)
by LanX (Saint) on Aug 26, 2020 at 14:50 UTC
    Kudos.

    I didn't think it was possible; the trick is to shell out, so that opening and writing the file use a shorter syntax.

    This might be considered dirty in a real Perl script but should be acceptable in a one-liner. And interestingly it should also work on Windows.

    Point is Perl has no means to print_and_open_if_necessary()

    So the next step is to ask myself if the semantics could be cleanly replicated in Perl...

    IMHO a tied hash %FH would be most elegant

    print $FH{">>$name"} $value
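One way the tied-hash idea could be sketched, purely as an illustration: the package name AutoFH is hypothetical (not a CPAN module), and the key convention of a mode followed by a file name is an assumption taken from the line above. FETCH opens the file the first time a key is seen and caches the handle afterwards.

```perl
use strict;
use warnings;

# Hypothetical package name; a sketch of the tied-hash idea, not a CPAN module.
package AutoFH;

sub TIEHASH {
    my ($class) = @_;
    return bless { fh => {} }, $class;
}

# The key is a mode plus a file name, e.g. ">>out.txt" or ">out.txt".
# The file is opened on first access and the handle is cached.
sub FETCH {
    my ($self, $key) = @_;
    return $self->{fh}{$key} //= do {
        my ($mode, $name) = $key =~ /^(>>?)(.+)$/s
            or die "Key must look like '>file' or '>>file': $key";
        open my $fh, $mode, $name or die "Cannot open $name: $!";
        $fh;
    };
}

# Emptying the hash closes every cached handle, flushing buffered output.
sub CLEAR {
    my ($self) = @_;
    close $_ for values %{ $self->{fh} };
    $self->{fh} = {};
}

sub DESTROY { $_[0]->CLEAR }

package main;
use File::Temp qw(tempdir);

tie my %FH, 'AutoFH';
my $dir = tempdir(CLEANUP => 1);    # scratch directory for the demo
for my $value (qw(one two)) {
    print { $FH{">>$dir/demo.txt"} } "$value\n";
}
%FH = ();                           # calls CLEAR, which closes the handles
```

Note that the handle has to be wrapped in a block, print { $FH{...} } $value, because print only accepts a bareword, a simple scalar variable, or a block in the filehandle position.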

    I haven't searched CPAN for similar solutions yet, because I'm not sure how.

    Comments welcome ...

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Point is Perl has no means to print_and_open_if_necessary()

      Sometimes Perl is not the best tool for the job. Awk does have that feature and here is an Awk program that does what our questioner asks:

      #!/usr/bin/awk -f
      BEGIN { FS = "\t" }
      FNR == 1 {
          split("", Fields)    # clear fields array
          for (i = 1; i <= NF; i++)
              Fields[i] = $i
          next
      }
      {
          for (i = 1; i <= NF; i++)
              print $i > Fields[i]
      }

      Save it in a file and mark it executable; tested with GNU Awk. Feed it input on stdin or list the files you want it to read on the command line.

      If you want to add prefixes or suffixes to the output file names, add them to the print statement, like so: print $i > ("out."Fields[i]".txt"); the parentheses ensure that the invisible concatenation operator will be parsed correctly.

        Since this is currently the top node of the past 24 hours, I'll comment.

        Sometimes Perl is not the best tool for the job. Awk ...

        I strongly disagree. Perl is a replacement for awk and sed and can do everything they can, and much, much more. tobyink pointed out IO::All - and while this module may not be in the core, note that CPAN is one of Perl's greatest strengths.

        If you're familiar enough with awk to whip up this script that's fine, and it's certainly interesting to see how it's done in other languages (though this isn't AwkMonks), but consider that the OP may not be very familiar with Perl to begin with, and throwing yet another new language into the mix is unlikely to be the most efficient approach in the long run.

        Hi jcb,

        excellent post, thank you! I did write a Perl script after all, but I suspect that your way is much faster!
        Thanks to all that offered their advice, much appreciated :)
        > Sometimes Perl is not the best tool for the job

        Well, the OP asked for a one-liner, but you have now provided a script.

        I have trouble seeing why a Perl script would be worse than an Awk script. (?)

        Cheers Rolf

      This might be considered dirty in a real Perl script but should be acceptable in a one-liner.
      100% agree with that sentence (which says a lot, since the sentence is "this might be").

      You could use operator overloading to replicate that feature. "Value" > file("path"); or "Value" >> file("path") where file returns an object that overloads > and >>

      Or you could do something closer to C++:

      fstream("path") << 120 << " in hexadecimal is " << ctrl::hex << 120;
      fstream("logs", "a") << ctrl::autoline << "I'm adding this line to the logs" << "and also this line";

        So it got me curious, and I did a quick-and-dirty test implementation of scalar > file() and fstream() << scalar. But I get the "useless use of ... in void context" warnings.

        So my tangential question: Is there a way to "export" the no warnings 'void' from inside the streaming package, rather than requiring it in ::main? It would be best if it could just turn off the warnings for the streaming objects, but leave the warnings on for non-overloaded uses of comparison and bitshift. I tried putting the no-warnings inside the overloaded functions, to try to keep the scope limited, but that's not the right place to prevent the warning. (Yes, I understand this isn't necessarily good practice, or "nice" to the external user. This is just for my own curiosity, and not something I'd put in practical code.)

        #!/usr/bin/perl -l
        use warnings;
        use strict;

        # [id://11121105]
        # as suggested, have file() return an object which overloads > and >>, so you can overwrite and append
        #   "value" > file('to_overwrite'); "value" >> file('to_append');
        # or have fstream() return an object which overloads just <<, for a very C++-esque
        #   fstream("path") << 120 << " in hexadecimal is " << ctrl::hex << 120;

        END { print "="x10 };

        {
            package fout;
            use autodie;
            use overload
                '>' => \&overwrite,
                '""' => sub { \${$_[0]} },
                ;
            sub file($) {
                print __PACKAGE__, "::file($_[0])";
                return bless \$_[0], __PACKAGE__;
            }
            sub overwrite {
                # 'value' > file($f) ==> overwrite(\$f, 'value', 1)
                my ($self, $value, $swap) = @_;
                if($swap) {
                    open my $fh, '>', $$self;
                    print {$fh} $value;
                } else {
                    warn "not sure how to clobber scalar: `file($$self) > $value`";
                }
                return $self;
            }
        }
        *file = \&fout::file;
        #no warnings 'void';
        #file('zzz') > "value";
        #fout::overwrite(file('xxx'), 'manual', 1);   # this wrote the file correctly
        #'blah > file(yyy)' > file('yyy');             # will this?
        #print __PACKAGE__, "::";

        {
            package fstream;
            use autodie;
            use overload
                '<<' => \&append,
                ;
            sub fstream($) {
                print __PACKAGE__, "::fstream($_[0])";
                return bless \$_[0], __PACKAGE__;
            }
            sub fclobber($) {
                print __PACKAGE__, "::clobber($_[0])";
                open my $fh, '>', $_[0];
                return bless \$_[0], __PACKAGE__;
            }
            sub append {
                my ($self, $value, $swap) = @_;
                if(!$swap) {
                    open my $fh, '>>', $$self;
                    print {$fh} $value;
                } else {
                    warn "not sure how to clobber scalar: `$value << file($$self)`";
                }
                return $self;
            }
        }
        *fstream = \&fstream::fstream;
        *fclobber = \&fstream::fclobber;
        'value' << fstream('szsz');
        fstream('sss') << "first";
        fstream('sss') << "second" << "third";
        fclobber('clb') << 'one' << 'and another';
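On the tangential question, one commonly used approach is to call warnings->unimport('void') from the package's import method. import runs while the caller is still being compiled, so the pragma change lands in the caller's lexical scope. The sketch below uses a hypothetical package name, StreamQuiet, and a small helper, void_warnings(), that exists only for this demonstration. As far as I can tell this silences the 'void' category for the whole importing scope, not just for overloaded objects, because the warning is issued at compile time, before the operand types are known.

```perl
use strict;
use warnings;

# Hypothetical package; a sketch of switching off 'void' warnings for the caller.
package StreamQuiet;

# import() runs at the caller's compile time (via `use` or a BEGIN block),
# so warnings->unimport() here changes the warning bits of the scope that is
# currently being compiled, i.e. the caller's lexical scope.
sub import { warnings->unimport('void') }

package main;

# Demo helper: compile a snippet and count its "void context" warnings.
sub void_warnings {
    my ($code) = @_;
    my @seen;
    local $SIG{__WARN__} = sub { push @seen, @_ };
    eval "use warnings; $code; 1" or die $@;
    return grep { /void context/ } @seen;   # a count in scalar context
}

my $plain = void_warnings('my $x = 1; $x > 2');
my $quiet = void_warnings('BEGIN { StreamQuiet->import } my $x = 1; $x > 2');
print "without import: $plain warning(s), with import: $quiet warning(s)\n";
```

With the package in its own StreamQuiet.pm file, a plain "use StreamQuiet;" would take the place of the BEGIN block, and the effect would end with the enclosing block or file like any other lexical pragma.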
        > You could use operator overloading

        I don't think it's a good idea to overload two very different operators like > "greater-than" and >> "shift".

        That's begging for inconsistency problems (syntax, precedence, you name it ...).

        Cheers Rolf
