PerlMonks  

Perl to run 2 files and print the third with loop

by EBK (Sexton)
on Apr 01, 2018 at 04:46 UTC ( [id://1212093]=perlquestion )

EBK has asked for the wisdom of the Perl Monks concerning the following question:

Let me explain more clearly now. A few days ago I asked how to join two files using a conditional and a loop. I managed to solve the problem partially, but I need to change the requirements to the following situation.

File A has 9 or more lines, and the second column is the key.
life_1,23032018,a_0300,true,3
life_1,23032018,a_0200,true,4
life_1,23032018,a_0100,true,3
life_1,23032018,c_0300,true,21
life_1,23032018,c_0200,true,21
life_1,23032018,c_0100,true,25
life_1,23032018,d_0300,true,23
life_1,23032018,d_0200,true,21
life_1,23032018,d_0100,true,24
File B has 800 or more lines, and the second column is the key.
201810021569661,23032018
201810021569678,23032018
201810021569685,23032018
201810021569708,23032018
201810021569715,23032018
201810021569722,23032018
201810021569739,23032018
201810021569746,23032018
201810021569753,23032018
201810021569760,23032018
I am reading File A in Perl and using its 5th column as the number of lines to take from File B, writing the result to a third file. When the for loop for one line of File A finishes, I need to stop at the current line of File B and have the next loop continue from that point, rather than restart from the first line of File B. With my script, the second-column values are repeated in each loop. For example, the value 201810021569661 appears 3 times in my result:
Loop tot->3
life_1,201810021569661,a_0300
life_1,201810021569678,a_0300
life_1,201810021569685,a_0300
Loop tot->4
life_1,201810021569661,a_0200
life_1,201810021569678,a_0200
life_1,201810021569685,a_0200
life_1,201810021569739,a_0200
Loop tot->3
life_1,201810021569661,a_0100
life_1,201810021569678,a_0100
life_1,201810021569685,a_0100
This is the result I want. Here the second column does not repeat; the value 201810021569661 now appears only once.
Loop tot->3
life_1,201810021569661,a_0300
life_1,201810021569678,a_0300
life_1,201810021569685,a_0300
Loop tot->4
life_1,201810021569708,a_0200
life_1,201810021569715,a_0200
life_1,201810021569722,a_0200
life_1,201810021569739,a_0200
Loop tot->3
life_1,201810021569746,a_0100
life_1,201810021569753,a_0100
life_1,201810021569760,a_0100
With everyone's help I wrote the script below, but it restarts File B on each loop.
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $filea = "FileA";
my $fileb = "FileB";
open ( FA, '<', $filea) || die ( "File $filea Not Found!" );
open ( FB, '<', $fileb) || die ( "File $fileb Not Found!" );

# Group the ids from File B by timestamp.
my %ts;
while ( <FB> ) {
    chomp;
    my($ids, $timestamp) = split ",";
    push @{ $ts{$timestamp} }, $ids;
}

while ( <FA> ) {
    chomp;
    my($life,$timestamp,$cls,$bool,$tot) = split ",";
    print STDERR "Loop tot-> $tot";
    scalar <STDIN>;
    # This loop always starts at the first id again.
    for my $ids ( @{ $ts{$timestamp} } ) {
        last if $tot-- < 1;
        print join(",",$life, $ids, $cls )."\n";
    }
}

Replies are listed 'Best First'.
Re: Perl to run 2 files and print the third with loop
by jimpudar (Pilgrim) on Apr 01, 2018 at 07:19 UTC

    Hi EBK,

    You are quite close to the solution. Instead of looping over the array, try looping over the range from 1 to $tot.

    Code looks like this:

    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);

    my $filea = "FileA";
    my $fileb = "FileB";
    open ( FA, '<', $filea) || die ( "File $filea Not Found!" );
    open ( FB, '<', $fileb) || die ( "File $fileb Not Found!" );

    my %ts;
    while ( <FB> ) {
        chomp;
        my($ids, $timestamp) = split ",";
        push @{ $ts{$timestamp} }, $ids;
    }

    while ( <FA> ) {
        chomp;
        my($life,$timestamp,$cls,$bool,$tot) = split ",";
        print STDERR "Loop tot-> $tot";
        scalar <STDIN>;
        foreach ( 1 .. $tot ) {
            my $id = shift @{ $ts{$timestamp} };
            print join(",",$life, $id, $cls )."\n";
        }
    }

    Using the shift operator ensures that no id is repeated.

    Hope this helps!

    Jim

      There's really no point in copying File B to a one-element hash; in fact, it can be read line by line.

        Yeah, I was thinking the same thing but I figured the smaller change would be easier to digest. However, it wouldn't be a one element hash, as I assume there will be more timestamps and we only have the beginning of the data.

        EBK, if your File B ends up being much larger (say a few tens of GB), you would want to put the first loop inside the second one so that you can read it line by line and not waste RAM by putting the data in the hash. However, for a file as small as 800 lines, I don't think it is worth the effort.

        Also, as someone else pointed out, if the totals are larger than the actual number of lines of data, this solution will produce an undef $id and lots of warnings. I am assuming that the two files are consistent with each other.

        Jim

        Was thinking about this a bit, and it seems based on the data we have been given by EBK, that the timestamps really don't matter at all. Perhaps if we could see the entirety of File B, this would be cleared up.

        That being said, here is a simpler (or perhaps more cryptic) solution which doesn't rely on storing all of File B in memory:

        #!/usr/bin/perl -wlaF,
        BEGIN { open FH, '<', 'FileB' }
        print "Loop tot-> $F[4]";
        for ( 1 .. $F[4] ) {
            @G = split ',', <FH>;
            print join ',', $F[0], $G[0], $F[2];
        }

        If you save this script as 'run.pl', you can call it like so:

        chmod 740 run.pl
        ./run.pl <FileA

        The -a option turns on autosplit mode and the -F, option sets the field delimiter to a comma.
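
        For readers who find the switches hard to follow, here is a rough longhand sketch of what the one-liner does. It is only an illustration: the sample data is copied from the question, in-memory filehandles stand in for FileA and FileB, and the check for File B running out is an addition that the one-liner does not have. On Perl 5.20 and later, -a implies -n, which is the outer loop shown here.

        ```perl
        use strict;
        use warnings;

        # Sample data copied from the question; real code would open FileA and FileB.
        my $file_a = join "\n",
            'life_1,23032018,a_0300,true,3',
            'life_1,23032018,a_0200,true,4',
            'life_1,23032018,a_0100,true,3';
        my $file_b = join "\n", map { "$_,23032018" } qw(
            201810021569661 201810021569678 201810021569685 201810021569708
            201810021569715 201810021569722 201810021569739 201810021569746
            201810021569753 201810021569760
        );

        # In-memory filehandles stand in for the two files (an assumption for the demo).
        open my $fa, '<', \$file_a or die "Cannot open File A data: $!";
        open my $fb, '<', \$file_b or die "Cannot open File B data: $!";

        my @out;    # collected output lines, printed at the end
        while ( my $line = <$fa> ) {          # the loop that -n provides
            chomp $line;                      # -l removes the line ending on input
            my @F = split /,/, $line;         # -a with -F, splits each line on commas
            push @out, "Loop tot-> $F[4]";
            for ( 1 .. $F[4] ) {
                my $b_line = <$fb>;           # next unread line of File B
                last unless defined $b_line;  # added guard: stop if File B runs out
                my ($id) = split /,/, $b_line;
                push @out, join ',', $F[0], $id, $F[2];
            }
        }
        print "$_\n" for @out;
        ```

        Because the File B handle is opened once and never rewound, each pass of the inner loop picks up where the previous one stopped, which is the behaviour asked for in the question.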

        EBK, I'm curious to know whether this will work for you. Please let me know!

        Best,

        Jim

      It worked. It is exactly what I wanted. How does the shift operator work?

        See shift, or run perldoc -f shift from your command line. (Update: All Perl built-ins like shift are documented in perlfunc.)
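
        As a small illustration of what the documentation describes, here is a sketch using a few ids from File B as made-up sample input. The variable names are only for the demo.

        ```perl
        use strict;
        use warnings;

        # Sample data for the demo: the first four ids from File B.
        my @ids = qw(201810021569661 201810021569678 201810021569685 201810021569708);

        # shift removes the first element of the array and returns it.
        my $first = shift @ids;     # @ids now holds three ids

        # Each further call continues where the previous one stopped.
        my $second = shift @ids;    # @ids now holds two ids

        print "taken: $first, $second\n";
        print "left:  @ids\n";

        # On an empty array, shift returns undef instead of dying.
        my @empty = ();
        my $none  = shift @empty;
        print "shift on an empty array returned undef\n" unless defined $none;
        ```

        Since the array gets shorter with every call, an id that has been handed out once cannot be handed out again.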


        Give a man a fish:  <%-{-{-{-<

Re: Perl to run 2 files and print the third with loop
by mr_ron (Chaplain) on Apr 01, 2018 at 18:29 UTC

    Like jimpudar, I got the OP's code to work with small changes. I am just a little worried that jimpudar's approach does not check for the possibility that @{ $ts{$timestamp} } might become empty, which the code below does check:

    while ( <FA> ) {
        ...
        while ( $tot-- and defined(my $id = shift @{ $ts{$timestamp} }) ){
            print join(",",$life, $id, $cls )."\n";
        }
    }
    Ron
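
    If you would rather be told when File B runs short than have the loop stop silently, one possible variation is sketched below. The helper take_ids and the sample values are hypothetical and not part of the code in this thread.

    ```perl
    use strict;
    use warnings;

    # take_ids is a hypothetical helper for this sketch.
    # It removes up to $tot ids from the front of @$pool and returns them.
    sub take_ids {
        my ( $tot, $pool ) = @_;
        my @taken;
        while ( $tot-- > 0 and @$pool ) {
            push @taken, shift @$pool;
        }
        return @taken;
    }

    my @pool = qw(201810021569661 201810021569678);    # only two ids available
    my $want = 3;                                      # File A asks for three
    my @got  = take_ids( $want, \@pool );

    # Report the shortfall on STDERR, then print what was available.
    warn "wanted $want ids but only got " . scalar(@got) . "\n" if @got < $want;
    print join( ",", "life_1", $_, "a_0300" ), "\n" for @got;
    ```

    Testing @$pool before calling shift avoids ever producing an undef id, so no "uninitialized value" warnings appear in the output.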
Re: Perl to run 2 files and print the third with loop
by haukex (Archbishop) on Apr 01, 2018 at 10:15 UTC

    Crossposted to StackOverflow. Crossposting is acceptable, but it is considered polite to inform about it so that efforts are not duplicated.

Re: Perl to run 2 files and print the third with loop
by Anonymous Monk on Apr 01, 2018 at 21:56 UTC
    It looks rather to me like you are doing a database join the hard way. Why not just shove all of these files into an SQLite database (file), then do whatever queries you like?
