
Re: open input files once

by rnewsham (Curate)
on Jul 14, 2020 at 06:49 UTC (#11119284)

in reply to open input files once

If you don't want to read the files twice, you need to think about your data structure and how you turn it into the desired output. Something like this should work: read the data into a temporary data structure with the file as a secondary key, then process it to fill in the blanks at output time.

#!/usr/bin/perl
use strict;
use warnings;

my @files = glob("file*.tab");
my %data;

for my $file ( @files ) {
    open my $fh, '<', $file or die $!;
    while ( <$fh> ) {
        chomp;
        my @columns = split /\t/;
        die "_ERROR_ not 9 columns [ $_ ]\n" if @columns != 9;
        my $key = join( "\t", @columns[0..3] );
        $data{$key}->{$file} = $columns[6];
    }
    close $fh;
}

print join("\t", 'chr', 'fivep', 'threep', 'strand', @files), "\n";

for my $key ( keys %data ) {
    my ( $chr, $fivep, $threep, $strand ) = split /\t/, $key;
    next if ( $strand =~ /^0$/ );
    my @output;
    for my $file ( @files ) {
        push @output, $data{$key}->{$file} // 0;
    }
    print join("\t", $chr, $fivep, $threep, "-", @output ), "\n";
}

I also cleaned up the code a little and added some whitespace around things as I find it makes it easier to read.
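
To make the shape of that temporary structure concrete, here is a minimal sketch that builds the same hash of hashes from in-memory rows instead of files. The file names and row values are made up for illustration, and the rows are shortened to five fields (the real files have nine columns).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample rows standing in for two small input files.
# Each row is [ chr, fivep, threep, strand, value ].
my %rows_for = (
    'file1.tab' => [ [ 'chr1', 100, 200, 1, 5 ], [ 'chr1', 300, 400, 1, 7 ] ],
    'file2.tab' => [ [ 'chr1', 100, 200, 1, 9 ] ],
);
my @files = sort keys %rows_for;

# First level: the joined key columns. Second level: the file name.
my %data;
for my $file ( @files ) {
    for my $row ( @{ $rows_for{$file} } ) {
        my $key = join "\t", @{$row}[0..3];
        $data{$key}{$file} = $row->[4];
    }
}

# At output time, a file that never saw a key contributes 0.
my @lines;
for my $key ( sort keys %data ) {
    push @lines, join "\t", $key, map { $data{$key}{$_} // 0 } @files;
}
print "$_\n" for @lines;
```

The second key only appears in file1.tab, so its file2.tab column is filled with 0 when the line is built, without any second pass over the input.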

Replies are listed 'Best First'.
Re^2: open input files once
by Tux (Canon) on Jul 14, 2020 at 10:25 UTC

    Whitespace to make things easier to read is *completely* in the eye of the beholder. Most often, I find that *removing* extraneous, redundant, annoying empty lines improves readability and maintainability. Also, cleaning up is only useful if the result matches the style of the surrounding project/script/module(s). E.g. your cleaned-up code would not fit my standards and my code would not fit yours.

    And if you clean up, why not go further?

    foreach my $key (grep { !m/\t0$/ } keys %data) {
        my ($chr, $fivep, $threep, $strand) = split m/\t/ => $key;
        say join "\t" => $chr, $fivep, $threep, "-", map { $data{$key}{$_} // 0 } @files;
    }

    Fewer lines, more to the point, and no need for empty space as each line is clear on its own.

    Personally, I'd use Text::CSV_XS with sep => "\t" like this:

    use Text::CSV_XS qw( csv );

    my @key   = qw( chr fivep threep strand );
    my @files = qw( );
    my %c7;
    foreach my $file (@files) {
        csv (in => $file, out => undef, sep => "\t", strict => 1,
            on_in => sub { $c7{join ":" => @{$_[1]}[0..3]}{$file} = $_[1][6] }
            );
    }
    say join "\t" => @key, @files;
    foreach my $key (sort keys %c7) {
        say join "\t" => (split m/:/ => $key), map { $c7{$key}{$_} // 0 } @files;
    }

    Enjoy, Have FUN! H.Merijn

      Oh I agree, everyone has their own preferences. My comment was mainly to point out why I changed other things, as I was finding their code a little unreadable in my pre-caffeine state.

Re^2: open input files once
by v15 (Sexton) on Jul 14, 2020 at 18:10 UTC

    Can you explain this line of code:

    push @output, $data{$key}->{$file} // 0

    What is // doing in the above code? Also, is there a way I can get sorted output where column 1 is sorted, then column 2 and then column 3?

      You should be more specific in your definition of "sorted". Look at my solution elsewhere in this thread. If you mean that the 2nd, 3rd and 4th columns should be sorted numerically, that isn't very hard either: use pack/unpack.

      my @key   = qw( chr fivep threep strand );
      my @files = qw( );
      my %c7;
      foreach my $file (@files) {
          csv (
              in    => $file,
              out   => undef,
              sep   => "\t",
              on_in => sub { $c7{pack "A10l>l>l>", @{$_[1]}[0..3]}{$file} = $_[1][6] }
              );
      }
      say join "\t" => @key, @files;
      foreach my $key (sort keys %c7) {
          say join "\t" => (unpack "A10l>l>l>" => $key), map { $c7{$key}{$_} // 0 } @files;
      }
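
A small sketch of why the packed key helps, using made-up records rather than real input. It assumes all numeric fields are non-negative and that the first field fits in 10 characters (the A10 template truncates longer names). Under those assumptions, a plain string sort on the packed keys gives numeric order for the integer columns.

```perl
use strict;
use warnings;

# Hypothetical keys: [ chr, fivep, threep, strand ], all numbers non-negative.
my @records = (
    [ 'chr1', 1000, 1200, 1 ],
    [ 'chr1',  200,  900, 1 ],
    [ 'chr1',  200,  300, 1 ],
);

# A plain string sort on tab-joined keys compares character by character,
# so "1000" lands before "200".
my @by_string = sort map { join "\t", @$_ } @records;

# Packing gives fixed-width fields: a 10-byte space-padded name followed by
# three big-endian 32-bit integers. For non-negative values, byte order
# then matches numeric order.
my @by_packed = map  { join "\t", unpack "A10l>l>l>", $_ }
                sort
                map  { pack "A10l>l>l>", @$_ } @records;

print "string sort:\n";
print "  $_\n" for @by_string;
print "packed sort:\n";
print "  $_\n" for @by_packed;
```

The first column is still compared as a string, so names such as chr10 and chr2 would need separate handling if a natural chromosome order is wanted.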

      Enjoy, Have FUN! H.Merijn

      What is // doing in the above code?

      // in this case is the Logical Defined Or (as opposed to the empty regular expression, for example in "$foo =~ //;"). The expression $data{$key}{$file} // 0 is the same as defined($data{$key}{$file}) ? $data{$key}{$file} : 0
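
As a quick illustration of how that differs from the more familiar ||, here is a short sketch with a made-up hash. The key names and the 'default' fallback are only for demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample data: two defined-but-false values and one true value.
my %value = ( zero => 0, empty => '', five => 5 );
my @names = qw( zero empty five missing );

# // falls back only when the left side is undef.
my @defined_or = map { $value{$_} // 'default' } @names;

# || falls back whenever the left side is false, which includes 0 and ''.
my @logical_or = map { $value{$_} || 'default' } @names;

print join(",", @defined_or), "\n";   # 0,,5,default
print join(",", @logical_or), "\n";   # default,default,5,default
```

The // operator is available from Perl 5.10 onwards.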
