PerlMonks  

reformatting tab delimited file

by garyboyd (Acolyte)
on Sep 05, 2011 at 14:54 UTC ( [id://924257]=perlquestion )

garyboyd has asked for the wisdom of the Perl Monks concerning the following question:

Hi, could someone give me pointers/pseudocode for converting a tab-delimited file into a different format?

My data looks like this:

Name_1	TT	XL_927799.1
Name_1	PA	PA_392
Name_1	AT	ZX_003039195.1
Name_2	TT	XL_931313.1
Name_2	AT	ZX_003043016.1
Name_3	TT	XL_929616.1
Name_3	PA	PA_5040
Name_3	PA	PA_6336
Name_4	TT	XL_928294.1
Name_4	PA	PA_917

And I want to get it into this format:

PA              TT          AT
PA_392          XL_927799.1 ZX_003039195.1
                XL_931313.1 ZX_003043016.1
PA_5040,PA_6336 XL_929616.1
PA_917          XL_928294.1

The fields in each row are separated by tabs. If a name has more than one entry for a column (e.g. Name_3 has two PA entries), both entries should appear in that field, separated by a comma.

If there is no entry, the field should be left blank, or I suppose it could say "No corresponding entry".

Thanks

Replies are listed 'Best First'.
Re: reformatting tab delimited file
by davido (Cardinal) on Sep 05, 2011 at 15:00 UTC

    Create a hash such as %categories. Iterate over the lines of your list. For each line, split on whitespace, then push @{$categories{$second_column}}, $third_column;

    I'm assuming you know how to open a file and read from it. A while loop will be helpful in iterating over each line. Don't forget to chomp.

    Output should just be a matter of obtaining the lists held under each hash key and printing them side by side. Use another loop with some logic to print a placeholder instead of an item for a given column when that column runs out of entries while others still have some.


    Dave
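
The approach outlined above can be sketched roughly as follows. This is only an illustration: the helper names group_by_code and side_by_side, the '-' placeholder, and the three sample lines are assumptions made here, not part of the reply. Note that grouping by the second column alone lines the lists up by position, not by name, so the rows will not stay aligned per name the way the requested output does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group third-column values by the second-column code,
# building a hash of arrays as suggested (%categories).
sub group_by_code {
    my @lines = @_;
    my %categories;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        my (undef, $code, $value) = split ' ', $line;   # split on whitespace
        push @{ $categories{$code} }, $value;
    }
    return \%categories;
}

# Hypothetical helper: return the lists side by side as tab-joined rows,
# using a placeholder once a column has run out of entries.
sub side_by_side {
    my ($categories, $placeholder, @codes) = @_;

    # The longest list decides how many rows are needed.
    my $rows = 0;
    for my $code (@codes) {
        my $n = @{ $categories->{$code} || [] };
        $rows = $n if $n > $rows;
    }

    my @out = (join "\t", @codes);    # header row
    for my $i (0 .. $rows - 1) {
        my @cells;
        for my $code (@codes) {
            my $list = $categories->{$code} || [];
            push @cells, defined $list->[$i] ? $list->[$i] : $placeholder;
        }
        push @out, join "\t", @cells;
    }
    return @out;
}

# Small demonstration with three of the sample lines.
my @sample = (
    "Name_1 TT XL_927799.1\n",
    "Name_1 PA PA_392\n",
    "Name_3 PA PA_5040\n",
);
print "$_\n" for side_by_side(group_by_code(@sample), '-', qw(PA TT AT));
```

To keep one output row per name, the hash would need a further level keyed on the first column.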

Re: reformatting tab delimited file
by Cristoforo (Curate) on Sep 05, 2011 at 23:24 UTC
    Text::Table will align your output. Here is a sample program. I also used Sort::Naturally so that names with trailing digits sort correctly, i.e. when the number is more than one digit long.

    Update: in the while loop, changed from splitting on spaces to splitting on tabs, because that's how the fields are separated.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Table;
    use Sort::Naturally;

    my %data;
    my @col2 = qw/ PA TT AT /;    # output columns, in display order

    # Build a hash of hashes of arrays: name -> code -> list of IDs.
    while (<DATA>) {
        chomp;
        my ($name, $col2, $col3) = split /\t/;
        push @{ $data{$name}{$col2} }, $col3;
    }

    my $tb = Text::Table->new( map {title => $_}, @col2);

    # One table row per name, in natural sort order.
    for my $name (nsort keys %data) {
        my @tmp;
        local $" = ',';    # interpolated arrays are joined with commas
        for my $col2 (@col2) {
            push @tmp, $data{$name}{$col2} ? "@{ $data{$name}{$col2} }" : "";
        }
        $tb->load(\@tmp);
    }

    print $tb;

    __DATA__
    Name_1	TT	XL_927799.1
    Name_1	PA	PA_392
    Name_1	AT	ZX_003039195.1
    Name_2	TT	XL_931313.1
    Name_2	AT	ZX_003043016.1
    Name_3	TT	XL_929616.1
    Name_3	PA	PA_5040
    Name_3	PA	PA_6336
    Name_4	TT	XL_928294.1
    Name_4	PA	PA_917

    This prints:

    PA              TT          AT
    PA_392          XL_927799.1 ZX_003039195.1
                    XL_931313.1 ZX_003043016.1
    PA_5040,PA_6336 XL_929616.1
    PA_917          XL_928294.1

      Thanks for everybody's suggestions, the solution provided by Cristoforo works brilliantly!
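
For anyone who cannot install Text::Table or Sort::Naturally, the same name -> code -> list structure can be sketched with core Perl only. This is a rough variant under stated assumptions, not the posted solution: pivot is a hypothetical helper name, the output is plain tab-delimited (not aligned), the sort is a simple "text prefix, then trailing number" ordering standing in for nsort, and the placeholder text comes from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes tab-delimited "name, code, value" lines and
# returns a header row plus one tab-delimited row per name.
sub pivot {
    my ($lines, $codes, $placeholder) = @_;

    # name -> code -> list of values
    my %data;
    for my $line (@$lines) {
        (my $copy = $line) =~ s/\r?\n\z//;    # strip any line ending
        next unless length $copy;
        my ($name, $code, $value) = split /\t/, $copy;
        push @{ $data{$name}{$code} }, $value;
    }

    # Order names by text prefix, then by trailing number, so that
    # Name_2 comes before Name_10 (a simplified stand-in for nsort).
    my @keyed;
    for my $name (keys %data) {
        my ($prefix, $num) = $name =~ /^(.*?)(\d*)\z/;
        push @keyed, [ $name, $prefix, length($num) ? $num : 0 ];
    }
    my @names = map { $_->[0] }
                sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] } @keyed;

    my @rows = (join "\t", @$codes);    # header row
    for my $name (@names) {
        my @cells;
        for my $code (@$codes) {
            my $values = $data{$name}{$code};
            push @cells, $values ? join(',', @$values) : $placeholder;
        }
        push @rows, join "\t", @cells;
    }
    return @rows;
}

# Small demonstration using part of the sample data.
my @input = (
    "Name_2\tTT\tXL_931313.1",
    "Name_2\tAT\tZX_003043016.1",
    "Name_3\tTT\tXL_929616.1",
    "Name_3\tPA\tPA_5040",
    "Name_3\tPA\tPA_6336",
);
print "$_\n" for pivot(\@input, [qw(PA TT AT)], 'No corresponding entry');
```

Passing an empty string as the placeholder leaves missing fields blank, matching the behaviour of the Text::Table version.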

Re: reformatting tab delimited file
by Anonymous Monk on Sep 05, 2011 at 15:00 UTC
