Perl to run 2 files and print the third with loop

EBK has asked for the wisdom of the Perl Monks concerning the following question:

Let me explain more clearly now. A few days ago I asked how to join two files using a condition and a loop. I partially solved the problem, but my requirements have changed to the following situation.

File A has 9 or more lines, and the second column is the key.

    life_1,23032018,a_0300,true,3
    life_1,23032018,a_0200,true,4
    life_1,23032018,a_0100,true,3
    life_1,23032018,c_0300,true,21
    life_1,23032018,c_0200,true,21
    life_1,23032018,c_0100,true,25
    life_1,23032018,d_0300,true,23
    life_1,23032018,d_0200,true,21
    life_1,23032018,d_0100,true,24

File B has 800 or more lines, and the second column is the key.

    201810021569661,23032018
    201810021569678,23032018
    201810021569685,23032018
    201810021569708,23032018
    201810021569715,23032018
    201810021569722,23032018
    201810021569739,23032018
    201810021569746,23032018
    201810021569753,23032018
    201810021569760,23032018

I am reading File A in Perl and using its 5th column as the number of lines to take from File B, writing the result to a third file. When the loop for one File A row finishes, I need to keep my position in File B and continue the next loop from that line, not restart from the first line of File B. With my script, the second-column values are repeated on each loop. For example, the value 201810021569661 appears 3 times in my result:

    Loop tot->3
    life_1,201810021569661,a_0300
    life_1,201810021569678,a_0300
    life_1,201810021569685,a_0300
    Loop tot->4
    life_1,201810021569661,a_0200
    life_1,201810021569678,a_0200
    life_1,201810021569685,a_0200
    life_1,201810021569739,a_0200
    Loop tot->3
    life_1,201810021569661,a_0100
    life_1,201810021569678,a_0100
    life_1,201810021569685,a_0100

This is the result I want. Note that the second column does not repeat; the value 201810021569661 now appears only once.

    Loop tot->3
    life_1,201810021569661,a_0300
    life_1,201810021569678,a_0300
    life_1,201810021569685,a_0300
    Loop tot->4
    life_1,201810021569708,a_0200
    life_1,201810021569715,a_0200
    life_1,201810021569722,a_0200
    life_1,201810021569739,a_0200
    Loop tot->3
    life_1,201810021569746,a_0100
    life_1,201810021569753,a_0100
    life_1,201810021569760,a_0100

With the help of all of you, I wrote the following, but it restarts File B on each loop.

    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);

    my $filea = "FileA";
    my $fileb = "FileB";

    open ( FA, '<', $filea) || die ( "File $filea Not Found!" );
    open ( FB, '<', $fileb) || die ( "File $fileb Not Found!" );

    my %ts;
    while ( <FB> ) {
        chomp;
        my($ids, $timestamp) = split ",";
        push @{ $ts{$timestamp} }, $ids;
    }
    
    while ( <FA> ) {
        chomp;
        my($life,$timestamp,$cls,$bool,$tot) = split ",";
        print STDERR "Loop tot-> $tot"; scalar <STDIN>;
            for my $ids ( @{ $ts{$timestamp} } ) {
                    last if $tot-- < 1;
                    print join(",",$life, $ids, $cls )."\n";
            }
     }

Replies are listed 'Best First'.
Re: Perl to run 2 files and print the third with loop by jimpudar (Pilgrim) on Apr 01, 2018 at 07:19 UTC
Hi EBK,

You are quite close to the solution. Instead of looping over the array, try looping over the range from 1 to $tot. Code looks like this:

    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);

    my $filea = "FileA";
    my $fileb = "FileB";

    open ( FA, '<', $filea) || die ( "File $filea Not Found!" );
    open ( FB, '<', $fileb) || die ( "File $fileb Not Found!" );

    my %ts;
    while ( <FB> ) {
        chomp;
        my($ids, $timestamp) = split ",";
        push @{ $ts{$timestamp} }, $ids;
    }

    while ( <FA> ) {
        chomp;
        my($life,$timestamp,$cls,$bool,$tot) = split ",";
        print STDERR "Loop tot-> $tot"; scalar <STDIN>;
        foreach ( 1 .. $tot ) {
            my $id = shift @{ $ts{$timestamp} };
            print join(",",$life, $id, $cls )."\n";
        }
    }

Using the shift operator ensures that no id is repeated. Hope this helps!

Jim
Re^2: Perl to run 2 files and print the third with loop by Borodin (Sexton) on Apr 01, 2018 at 16:43 UTC
There's really no point in copying File B to a one-element hash; in fact, it can be read line by line.
Re^3: Perl to run 2 files and print the third with loop by jimpudar (Pilgrim) on Apr 01, 2018 at 23:51 UTC
Yeah, I was thinking the same thing, but I figured the smaller change would be easier to digest. However, it wouldn't be a one-element hash, as I assume there will be more timestamps and we only have the beginning of the data.

EBK, if your File B ends up being much larger (say a few tens of GB), you would want to put the first loop inside the second one so that you can read it in line by line and not waste RAM by putting the data in the hash. However, for a file as small as 800 lines I don't think it is worth the effort.

Also, as someone else pointed out, if the totals are larger than the actual number of lines of data, this solution will produce an undef $id and lots of warnings. I am assuming that the two files are consistent with each other.

Jim
Re^3: Perl to run 2 files and print the third with loop by jimpudar (Pilgrim) on Apr 02, 2018 at 06:35 UTC
I was thinking about this a bit, and based on the data EBK has given us, it seems that the timestamps really don't matter at all. Perhaps seeing the entirety of File B would clear this up.

That being said, here is a simpler (or perhaps more cryptic) solution which doesn't rely on storing all of File B in memory:

    #!/usr/bin/perl -wlaF,
    BEGIN { open FH, '<', 'FileB' }
    print "Loop tot-> $F[4]";
    for ( 1 .. $F[4] ) {
        @G = split ',', <FH>;
        print join ',', $F[0], $G[0], $F[2];
    }

If you save this script as 'run.pl', you can call it like so:

    chmod 740 run.pl
    ./run.pl <FileA

The -a option turns on autosplit mode and the -F, option sets the field delimiter to a comma.

EBK, I'm curious to know whether this will work for you. Please let me know!

Best,

Jim
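For readers less familiar with the command-line switches, here is a longhand sketch of the same line-by-line idea. On Perl 5.20 and later, -F implies -a and -a implies -n, so the one-liner's body runs once per File A line; the explicit `while` loop below plays that role. The sub name `pair_rows` and the in-memory sample files are assumptions made for illustration, and like the script above it ignores the timestamp column. It also stops taking ids once File B runs out rather than printing undef.

```perl
use strict;
use warnings;

# Pair each File A row with the next $tot ids read from the File B handle.
# Both arguments are open filehandles; the output lines are returned as a list.
# (pair_rows is a hypothetical helper name, not something from the thread.)
sub pair_rows {
    my ($fa, $fb) = @_;
    my @out;
    while (my $row = <$fa>) {
        chomp $row;
        my ($life, $timestamp, $cls, $bool, $tot) = split /,/, $row;
        for (1 .. $tot) {
            my $line = <$fb>;            # File B keeps its position between rows
            last unless defined $line;   # File B ran out of lines
            chomp $line;
            my ($id) = split /,/, $line;
            push @out, join(',', $life, $id, $cls);
        }
    }
    return @out;
}

# In-memory stand-ins for FileA and FileB, using sample rows from the question.
my $file_a = "life_1,23032018,a_0300,true,3\nlife_1,23032018,a_0200,true,4\n";
my $file_b = join '', map { "$_,23032018\n" } qw(
    201810021569661 201810021569678 201810021569685 201810021569708
    201810021569715 201810021569722 201810021569739
);
open my $fa, '<', \$file_a or die "Cannot open in-memory File A: $!";
open my $fb, '<', \$file_b or die "Cannot open in-memory File B: $!";
print "$_\n" for pair_rows($fa, $fb);
```

With real files, the two `open` calls would take the file names instead of scalar references. The sample run prints seven lines, matching the a_0300 and a_0200 rows of the desired output, without the "Loop tot->" markers.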
Re^4: Perl to run 2 files and print the third with loop by EBK (Sexton) on Apr 02, 2018 at 17:29 UTC
Re^2: Perl to run 2 files and print the third with loop by EBK (Sexton) on Apr 01, 2018 at 21:45 UTC
It worked. It is exactly what I wanted. How does the shift operator work?
Re^3: Perl to run 2 files and print the third with loop by AnomalousMonk (Archbishop) on Apr 01, 2018 at 21:48 UTC
See shift, or run `perldoc -f shift` from your command line. (Update: All Perl built-ins like `shift` are documented in perlfunc.)

Give a man a fish: `<%-{-{-{-<`
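As a small illustration of why `shift` prevents repeats: it removes the first element of an array and returns it, so the next call sees the element that used to be second. On an empty array it returns undef. The sketch below uses a hypothetical helper, `take_front`, and four sample ids from File B; neither the helper name nor the guard against an empty array comes from the replies above.

```perl
use strict;
use warnings;

# Take up to $count ids off the front of @$queue, one shift at a time.
# (take_front is a hypothetical helper name used only for this sketch.)
sub take_front {
    my ($queue, $count) = @_;
    my @taken;
    for (1 .. $count) {
        my $id = shift @$queue;      # removes and returns element 0
        last unless defined $id;     # shift returns undef on an empty array
        push @taken, $id;
    }
    return @taken;
}

my @ids = qw(201810021569661 201810021569678 201810021569685 201810021569708);

my @first  = take_front(\@ids, 3);   # consumes the first three ids
my @second = take_front(\@ids, 4);   # only one id is left to take

print "first: @first\n";
print "second: @second\n";
print "left: ", scalar(@ids), "\n";
```

The second call asks for four ids but gets only 201810021569708, because the first call already consumed the other three, and the array is empty afterwards.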
Re^4: Perl to run 2 files and print the third with loop by kcott (Archbishop) on Apr 02, 2018 at 04:52 UTC
Re: Perl to run 2 files and print the third with loop by mr_ron (Chaplain) on Apr 01, 2018 at 18:29 UTC
Like jimpudar, I got the OP code to work with small changes. I am just a little worried that jimpudar's approach does not check for the possibility that `@{ $ts{$timestamp} }` might become empty, which the code below checks:

    while ( <FA> ) {
        ...
        while ( $tot-- and defined(my $id = shift @{ $ts{$timestamp} }) ){
            print join(",",$life, $id, $cls )."\n";
        }
    }

Ron
Re: Perl to run 2 files and print the third with loop by haukex (Archbishop) on Apr 01, 2018 at 10:15 UTC
Crossposted to StackOverflow. Crossposting is acceptable, but it is considered polite to inform about it so that efforts are not duplicated.
Re: Perl to run 2 files and print the third with loop by Anonymous Monk on Apr 01, 2018 at 21:56 UTC
It looks rather to me like you are doing a database join the hard way. Why not just shove all of these files into an SQLite database (file), then do whatever queries you like?