Re: Split tab-separated file into separate files, based on column name
by Tux (Canon) on Aug 26, 2020 at 12:33 UTC
|
OK, I'll bite. A one-liner it is:
$ cat test.tsv
id name position
1 Nick boss
2 George CEO
3 Christina CTO
$ perl -MText::CSV_XS=csv -E'my$aoh=csv(in=>"test.tsv",bom=>1,sep=>"\t");' \
-E'for$h(keys%{$aoh->[0]}){say$h;open$fh,">","$h.txt";say$fh $_ for$h,map{$_->{$h}}@$aoh}'
id
position
name
$ cat id.txt
id
1
2
3
$ cat name.txt
name
Nick
George
Christina
$ cat position.txt
position
boss
CEO
CTO
update: added a -E to split the line for readability
Enjoy, Have FUN! H.Merijn
Re: Split tab-separated file into separate files, based on column name
by Eily (Monsignor) on Aug 26, 2020 at 12:13 UTC
|
If you want only the very bare functionality, a one-liner might work, but you'll need to switch to a longer script for pretty much any kind of control you may want to have over the result:
perl -lanE 'for (0..$#F) { `echo $F[$_] >> file$_` }'
You can read perlrun to understand what the options do (and to change the way the line is split, because it's split on whitespace by default, not tabs). It will fail if the input is not simple enough (if there are quotes, dashes, or semicolons in the data). And you'll start to get extra output data if you call it several times in a row, because >> appends to the existing files.
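As a rough sketch of what such a longer script could look like, assuming tab-separated input and output files named file0, file1, ... as in the one-liner (the sub name split_columns and its prefix argument are made up for illustration). It avoids the shell entirely, so quotes and semicolons in the data are harmless, and it truncates each file on first use, so repeated runs do not pile up output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write column N of tab-separated input to the file "$prefix$N".
# split_columns and $prefix are illustrative names, not from the thread.
sub split_columns {
    my ($in_fh, $prefix) = @_;
    my @out;                                   # one lazily opened handle per column
    while (my $line = <$in_fh>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        for my $i (0 .. $#fields) {
            unless ($out[$i]) {
                # '>' truncates, so a second run starts from empty files
                open $out[$i], '>', "$prefix$i"
                    or die "Cannot open $prefix$i: $!";
            }
            print { $out[$i] } "$fields[$i]\n";
        }
    }
    close $_ or die "close failed: $!" for grep { defined } @out;
    return scalar @out;                        # number of files written
}

# Usage: perl split_columns.pl test.tsv
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    split_columns($in, 'file');
}
```

Unlike the one-liner, this keeps the handles open for the whole run instead of spawning one shell per field.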
All that being said, you asked for the clever way. The clever way is to keep the solution that you understand, if you ever have to fix it.
Edit: s/perlun/perlrun/. Thanks AnomalousMonk.
|
Kudos.
I didn't think it was possible; the trick is to shell out the opening and writing to a shorter syntax.
This might be considered dirty in a real Perl script but should be acceptable in a one-liner.
And interestingly it should also work on Windows.
The point is that Perl has no means to print_and_open_if_necessary().
So the next step is to ask myself if the semantics could be cleanly replicated in Perl...
IMHO a tied hash %FH would be most elegant:
print { $FH{">>$name"} } $value
I didn't try to search CPAN for similar solutions yet, because I'm not sure how.
Comments welcome...
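One way the tied-hash idea could be sketched, purely as an illustration: the package name AutoFH is invented here, and the key format (">file" or ">>file") is an assumption taken from the example above. Fetching a key opens the file on first use and caches the handle for later prints:

```perl
package AutoFH;                     # hypothetical name, not a CPAN module
use strict;
use warnings;
use Carp qw(croak);
use Tie::Hash ();                   # provides Tie::StdHash
our @ISA = ('Tie::StdHash');

# Fetching ">name" or ">>name" opens the file on first access
# and returns the cached handle afterwards.
sub FETCH {
    my ($self, $key) = @_;
    return $self->{$key} //= do {
        my ($mode, $path) = $key =~ /^(>>?)(.+)$/s
            or croak "Key must look like '>file' or '>>file': $key";
        open my $fh, $mode, $path or croak "Cannot open $path: $!";
        $fh;
    };
}

package main;

# Usage: perl autofh.pl test.tsv
# Appends each field to a file named after its column head.
# Assumes every row has at least as many fields as the header.
if (@ARGV) {
    tie my %FH, 'AutoFH';
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my $head = <$in>);
    my @names = split /\t/, $head;
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;
        print { $FH{">>$names[$_]"} } "$fields[$_]\n" for 0 .. $#names;
    }
}
```

The block form print { ... } is needed because Perl only accepts a bare scalar variable or a block in the filehandle slot. The handles are closed when the hash is untied or goes out of scope.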
|
#!/usr/bin/awk -f
BEGIN { FS = "\t" }
FNR == 1 {
split("", Fields) # clear fields array
for (i = 1; i <= NF; i++) Fields[i] = $i
next
}
{ for (i = 1; i <= NF; i++) print $i > Fields[i] }
Save it in a file and mark it executable; tested with GNU Awk. Feed it input on stdin or list the files you want it to read on the command line.
If you want to add prefixes or suffixes to the output file names, add them to the print statement, like so: print $i > ("out."Fields[i]".txt"); the parentheses ensure that the invisible concatenation operator will be parsed correctly.
|
> This might be considered dirty in a real Perl script but should be acceptable in a one-liner.
100% agree with that sentence (which says a lot, since the sentence is "this might be").
You could use operator overloading to replicate that feature. "Value" > file("path"); or "Value" >> file("path") where file returns an object that overloads > and >>
Or you could do something closer to C++:
fstream("path") << 120 << " in hexadecimal is " << ctrl::hex << 120;
fstream("logs", "a") << ctrl::autoline << "I'm adding this line to the logs" << "and also this line";
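A minimal sketch of the overloading variant, assuming a hypothetical FileSink class and file() helper (neither exists in core Perl or, as far as I know, on CPAN under these names). Each operation opens the file, writes one line and closes it again. Perl warns about "useless use" of > in void context because it cannot see the overload at compile time, so the result is used with "or die":

```perl
package FileSink;                   # hypothetical class name
use strict;
use warnings;
use overload
    '>'  => sub { $_[0]->_write('>',  $_[1]) },   # truncate, then write
    '>>' => sub { $_[0]->_write('>>', $_[1]) };   # append

sub new {
    my ($class, $path) = @_;
    return bless { path => $path }, $class;
}

# The overload handlers always receive the object first, even when it is
# the right-hand operand, so the value to write is the second argument.
sub _write {
    my ($self, $mode, $value) = @_;
    open my $fh, $mode, $self->{path}
        or die "Cannot open $self->{path}: $!";
    print {$fh} "$value\n";
    close $fh or die "Cannot close $self->{path}: $!";
    return 1;                       # true on success
}

package main;

# file() is the helper from the post; it just wraps the constructor.
sub file { return FileSink->new(@_) }

# Usage: perl filesink.pl out.txt
if (@ARGV) {
    ("Value"      >  file($ARGV[0])) or die;
    ("More value" >> file($ARGV[0])) or die;
}
```

Opening and closing on every write is simple but slow for large inputs; caching handles would be the next refinement.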
|
Re: Split tab-separated file into separate files, based on column name
by tybalt89 (Monsignor) on Aug 26, 2020 at 13:29 UTC
|
#!/usr/bin/perl
use strict; #https://perlmonks.org/?node_id=11121090
use warnings;
my @handles = map { open my $fh, '>', "tmp.$_" or die; $fh }
split /\t|\n/, <DATA>;
while( <DATA> )
{
my @data = split /\t|\n/;
print { $handles[$_] } $data[$_], "\n" for 0 .. $#handles;
}
close $_ or die for @handles;
__DATA__
id name position
1 Nick boss
2 George CEO
3 Christina CTO
Re: Split tab-separated file into separate files, based on column name
by LanX (Sage) on Aug 26, 2020 at 11:36 UTC
|
> so there must be a more clever way :)
You want a one liner and I doubt it'll be very readable.
The clever way is to split the head line and to open files for each entry and to hold the filehandles in an array.
Now you can print each field by column position after splitting the remaining lines.
That's a dozen code lines at most. ..
Re: Split tab-separated file into separate files, based on column name
by Corion (Patriarch) on Aug 26, 2020 at 11:52 UTC
|
|
Will "part" split the file vertically? Because, in my example, the desired output would be:
* FILE "id" with values
1
2
3
* File "name" with values
Nick
George
Christina
* File "position" with values
boss
CTO
CEO
Re: Split tab-separated file into separate files, based on column name
by LanX (Sage) on Aug 27, 2020 at 16:19 UTC
|
Here is a pure Perl one-liner.
Please note that
- the files are named after the column heads
- I use Windows quoting rules
D:\tmp>del id,name,position
D:\tmp>perl -lanE "if (@FH) {print $_ shift @F for @FH} else {open $FH[$x++], '>', $_ for @F}" data.txt
D:\tmp>type data.txt, id,name,position
data.txt
id name position
1 Nick boss
2 George CEO
3 Christina CTO
id
1
2
3
name
Nick
George
Christina
position
boss
CEO
CTO
UPDATE
eliminated bug
|
D:\tmp>type 1,George,CEO
1
George
CEO
D:\tmp>
Strange behavior... (Update: see solution here)
I didn't expect this, but Perl seems to silently refuse to re-open an already open file handle,
so if you don't mind having the column head included you can go even shorter:
D:\tmp>del id,name,position
D:\tmp>perl -lanE "open $FH[$x++], '>', $_ for @F;print $_ shift @F for @FH" data.txt
D:\tmp>type id,name,position
id
id
1
2
3
name
name
Nick
George
Christina
position
position
boss
CEO
CTO
D:\tmp>