PerlMonks  

Re: Memory utilization and hashes

by Laurent_R (Canon)
on Jan 17, 2018 at 21:27 UTC


in reply to Memory utilization and hashes

I don't understand your code.
    while (<>) {
        $l = $_;
        chomp $;                    # you probably want to chomp $l, or possibly $_ (but you no longer use $_), but not $
        @vals = split /;/, $l;      # you split your line into @vals, but no longer use that variable. Besides,
                                    # declaring @vals with my would be good practice
        if ($l =~ /Query/) {        # you could use something like: if $vals[0] eq "Query"
            %pairs{$l[1]}{$l[2]} = $l[3];       # where are $l[1], $l[2] and $l[3] coming from? Also, %pairs{...} is probably a syntax error.
        } elsif {$l =~ /Answer/) {  # again, you could use: if $vals[0] eq "Answer". Also, "elsif {..." is a syntax error.
            %pairs{$l[1}{$l[2]} = $l[3];        # again, where are $l[1], $l[2] and $l[3] coming from? Also a syntax error.
            $json = encode_json $pairs{$l[1]};  # given the previous code, I doubt that you really want to encode $pairs{$l[1]}
            print $json."\n";       # is your intent to print to the screen?
            delete $pairs{$l[1]};   # not sure it's needed, since you just reuse the same variable in the next iteration
        }
    }
Also, I don't understand what's going on when you have two queries or two answers in a row, as in your data example.

With the code you're showing, the hash should not grow significantly, even without the call to delete. (Update: but this is no longer true with the updated code posted below.)

Re^2: Memory utilization and hashes
by bfdi533 (Friar) on Jan 17, 2018 at 21:32 UTC

    Sorry for the typos in the code; fixing them.

    My actual data ranges from several hundred MB to several hundred GB, so that data set is just a sample of the sort of thing I am processing.

    Two queries or two answers in a row is exactly what my real-world data contains. Specifically, there can be anywhere from 1 to n answers for each query, and queries and answers can be interleaved in any order; the only guarantee is that an answer will follow (sometime later) the query it goes with.

    Maximum number of rows in the files to process: 31,291,204; average number of lines per file: 8,707,186.

      Just to keep track:
          my $l;          # all three of these variables should probably be declared within the
          my @vals;       # while loop. Only %pairs probably needs to be declared before the while
          my $json;
          while (<>) {
              $l = $_;
              chomp $l;
              @vals = split /;/, $l;
              if ($vals[0] =~ /Query/) {
                  $pairs{$vals[1]}{$vals[2]} = $vals[3];   # %pairs isn't declared anywhere
              } elsif {$vals[0] =~ /Answer/) {             # syntax error: elsif { should be elsif (
                  $pairs{$vals[1}{$vals[2]} = $vals[3];
                  $json = encode_json $pairs{$vals[1]};    # what do you think is the content of $pairs{$vals[1]}? Probably not what you want to encode.
                  print $json."\n";
                  delete $pairs{$vals[1]};
              }
          }
      This will still not compile.

      Do yourself a favor. Use the following pragmas:

          use strict;
          use warnings;
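For comparison, here is a minimal sketch of the same loop with the syntax errors removed, so that it compiles under `use strict; use warnings;`. It is not the original poster's code: it assumes input lines of the form `Type;id;field;value` (inferred from the `split` and the `$vals[...]` indices above), uses the core `JSON::PP` module in place of whatever JSON module was loaded, compares the record type with `eq` instead of a regex, and wraps the loop body in a hypothetical `handle_line` sub so it can be exercised without reading a file. The logic, including the `delete` after the first answer, is left unchanged.

```perl
use strict;
use warnings;
use JSON::PP;    # core since Perl 5.14; stands in for whichever JSON module the original loads

my %pairs;                                  # id => { field => value }
my $encoder = JSON::PP->new->canonical(1);  # sorted keys, so the output is stable

# Handle one line of the assumed form "Type;id;field;value".
# Returns a JSON string after an Answer line, otherwise nothing.
sub handle_line {
    my ($line) = @_;
    chomp $line;
    my @vals = split /;/, $line, 4;
    return unless @vals == 4;               # skip lines that do not have four fields

    my ($type, $id, $field, $value) = @vals;
    if ($type eq 'Query') {
        $pairs{$id}{$field} = $value;
    }
    elsif ($type eq 'Answer') {
        $pairs{$id}{$field} = $value;
        my $json = $encoder->encode($pairs{$id});
        delete $pairs{$id};                 # as in the original: drop the entry after the first answer
        return $json;
    }
    return;
}

# Driver equivalent to the original while (<>) loop:
#   while (my $line = <>) {
#       my $json = handle_line($line);
#       print "$json\n" if defined $json;
#   }
```

Because the entry is deleted after the first answer, a second `Answer` line for the same id produces a record containing only the answer field, which is the multiple-answers problem discussed in this thread.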
      You wrote: "specifically there can be anywhere from 1 to n answers for each query"
      Then you can't delete your hash entries as you go, because when a second answer comes for a given query, you no longer have the information from the query available.
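A minimal sketch of the alternative, under the same assumed `Type;id;field;value` layout and with a hypothetical `pair_line` sub: keep each query's fields in `%queries` and, for every answer, encode a copy of those fields plus the answer field, instead of writing into (and then deleting) the shared entry. Later answers for the same id therefore still see the query.

```perl
use strict;
use warnings;
use JSON::PP;

my %queries;                                # id => { field => value } from Query lines
my $encoder = JSON::PP->new->canonical(1);  # sorted keys, so the output is stable

# Handle one line of the assumed form "Type;id;field;value".
# Query lines are stored; every Answer line is merged with a copy of its
# query's fields and returned as JSON. The query entry is NOT deleted,
# so a second, third, ... answer for the same id still finds it.
sub pair_line {
    my ($line) = @_;
    chomp $line;
    my ($type, $id, $field, $value) = split /;/, $line, 4;
    return unless defined $value;           # skip lines without four fields

    if ($type eq 'Query') {
        $queries{$id}{$field} = $value;
        return;
    }
    if ($type eq 'Answer') {
        my %record = %{ $queries{$id} // {} };  # copy; the stored query stays untouched
        $record{$field} = $value;
        return $encoder->encode(\%record);
    }
    return;
}
```

The cost is that `%queries` is never pruned, so it grows with the number of distinct query ids in the input, which is the memory concern of the original question.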

        See my last post at the end. I check for a repeat and when I detect it, I delete it and start over.

        Update:

        Hmmm, thinking about what you said and what I just said ...

        Maybe deleting the entry and starting over (though needed to get rid of the last answers) does not really solve my memory issue at all. You might actually be onto something there, but I do not know how to fix it if that is the case ...
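One possible way to bound memory when the last answer for a query cannot be detected is to evict entries that have been idle for too long. This sketch is an assumption, not something proposed in the thread: it assumes all answers for a query arrive within `$MAX_IDLE` input lines (a hypothetical tuning parameter), keeps the `Type;id;field;value` layout assumed above, and also resets an entry when a new Query arrives for an id that has already been answered, which is one reading of the repeat check described above. Answers that arrive after their query has been evicted are silently dropped, so the window has to be generous enough for the data.

```perl
use strict;
use warnings;
use JSON::PP;

our $MAX_IDLE = 1_000_000;   # hypothetical window: lines an id may stay idle before eviction
my %queries;                 # id => { fields => {...}, answered => 0|1, last_seen => line number }
my $line_no = 0;
my $encoder = JSON::PP->new->canonical(1);

# Handle one line of the assumed form "Type;id;field;value".
# Returns the JSON records produced by this line (zero or one).
sub pair_line_bounded {
    my ($line) = @_;
    chomp $line;
    $line_no++;
    my @out;
    my ($type, $id, $field, $value) = split /;/, $line, 4;

    if (defined $value) {
        if ($type eq 'Query') {
            # A Query for an id that already has answers starts a new exchange.
            delete $queries{$id} if $queries{$id} && $queries{$id}{answered};
            $queries{$id}{fields}{$field} = $value;
            $queries{$id}{last_seen} = $line_no;
        }
        elsif ($type eq 'Answer' && $queries{$id}) {
            my $q = $queries{$id};
            $q->{answered}  = 1;
            $q->{last_seen} = $line_no;
            my %record = (%{ $q->{fields} }, $field => $value);  # copy plus answer field
            push @out, $encoder->encode(\%record);
        }
        # Answers whose query was never seen (or was evicted) are dropped.
    }

    # Every $MAX_IDLE lines, evict entries idle for more than $MAX_IDLE lines,
    # so memory is bounded by the ids active in roughly the last 2 * $MAX_IDLE lines.
    if ($line_no % $MAX_IDLE == 0) {
        for my $k (keys %queries) {
            delete $queries{$k} if $line_no - $queries{$k}{last_seen} > $MAX_IDLE;
        }
    }
    return @out;
}
```

A driver loop would be `while (my $line = <>) { print "$_\n" for pair_line_bounded($line) }`. The design trades completeness for a memory ceiling: a smaller window uses less memory but risks dropping late answers.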

      Even with the fixes you made to the original post, you still have several syntax errors.
