Re: Reduce CPU utilization time in reading file using perl

by BrowserUk (Patriarch)
on Sep 27, 2013 at 16:26 UTC [id://1056018]


in reply to Reduce CPU utilization time in reading file using perl

Using Tie::File on such a huge file -- or any file over a few (single-digit) megabytes -- is stupid. It will use huge amounts of CPU and be very slow.
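For reference, the pattern being warned against probably looks something like this (a guess at the kind of Tie::File code in question, not code from the thread):

    use strict;
    use warnings;
    use Tie::File;

    ## Tie::File presents the file as an array, but every element access
    ## goes through the tie layer and its line-offset bookkeeping, which
    ## burns CPU and memory on files of this size.
    tie my @lines, 'Tie::File', 'testfile.dat'
        or die "Cannot tie testfile.dat: $!\n";

    my %hash;
    for my $line ( @lines ) {
        my( $type, $No, $date ) = split /\|/, $line;
        $hash{ $No . $date } = "$type\@$No\@$date";
    }
    untie @lines;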

You can build your hash much (much, much) more quickly this way:

    open BIGFILE, '<', "testfile.dat" or die "Can't open file: $!\n";
    my %hash;
    while( <BIGFILE> ) {
        chomp;
        ## split the pipe-delimited record into its three fields
        my( $type, $No, $date ) = split /\|/;
        ## key on number+date; keep the whole record as the value
        $hash{ $No . $date } = $type . "@" . $No . "@" . $date;
    }
    close BIGFILE;
    ## do something with the hash.

It will use far less CPU and memory, and complete in less than half the time.

However, it is really doubtful that you will be able to build a hash from a file that size without running out of memory (a rough way to estimate the hash's footprint is sketched after this list) unless:

  • there are huge numbers of duplicate records in that file;
  • you have a machine with huge amounts of memory; or
  • you have a huge swap partition (preferably sited on an SSD).
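
To see why, you can populate a hash with a sample of records shaped like the real data and measure the per-record overhead, then scale up to the real record count. A minimal sketch, assuming the CPAN module Devel::Size is installed (it is not core):

    use strict;
    use warnings;
    use Devel::Size qw( total_size );

    ## Build a sample hash of 100,000 synthetic records shaped like the
    ## real data (type|number|date), then measure its total footprint.
    my %hash;
    for my $i ( 1 .. 100_000 ) {
        my( $type, $No, $date ) = ( 'T', sprintf( '%08d', $i ), '20130927' );
        $hash{ $No . $date } = "$type\@$No\@$date";
    }

    my $bytes = total_size( \%hash );
    printf "%d keys use roughly %.1f MB (about %.0f bytes per record)\n",
        scalar keys %hash, $bytes / 2**20, $bytes / keys %hash;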

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Reduce CPU utilization time in reading file using perl
by madtoperl (Hermit) on Sep 28, 2013 at 13:18 UTC
    Hi BrowserUk,
    Thanks a lot for your inputs. I have tried your option as well; still the CPU usage is 100%. Is it possible to load only one line at a time into memory from the huge file and store it into the hash, or to store it into the hash without reading the whole file? I worry that may not be possible, but I still thought of getting your suggestion.
    Thanks
    madtoperl
      I have tried your option as well; still the CPU usage is 100%.

      That is because you are using more memory for the hash than you have installed; thus, parts of the memory holding the hash are being swapped or paged to disk as the file is read. Because hashing scatters keys across the hash's memory pseudo-randomly, there is no locality of access: pages of memory are constantly written to disk and then re-read, over and over, and that is what is driving up your CPU usage.

      Is it possible to load only one line at a time into memory from the huge file and store it into the hash, or to store it into the hash without reading the whole file?

      That is what my code does. It reads one line, installs it into the hash, then reads the next. It is the size of the hash that is the problem, not the line-by-line processing of the file.

      I worry that may not be possible, but I still thought of getting your suggestion.

      There are various ways of providing access to huge amounts of data without requiring that it all be held in memory concurrently. Which of those methods/mechanisms is appropriate for your purpose depends entirely upon what you need to do with that data.
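
      One such mechanism is a disk-backed hash. A minimal sketch, assuming the DB_File module is available (it requires Berkeley DB, and the file names here are illustrative): tie the hash to a database file so the records live on disk and memory use stays flat.

          use strict;
          use warnings;
          use DB_File;
          use Fcntl;

          ## Tie %hash to an on-disk Berkeley DB file; inserts and lookups
          ## go to disk, so the hash can grow far beyond available RAM.
          tie my %hash, 'DB_File', 'bigfile.db', O_RDWR|O_CREAT, 0644, $DB_HASH
              or die "Cannot tie hash: $!\n";

          open my $fh, '<', 'testfile.dat' or die "Can't open file: $!\n";
          while( <$fh> ) {
              chomp;
              my( $type, $No, $date ) = split /\|/;
              $hash{ $No . $date } = "$type\@$No\@$date";
          }
          close $fh;
          untie %hash;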

      So, before advising further, you need to answer the question: Why are you attempting to load all the data into a hash?


        Hi BrowserUk,
        Thanks a lot for the help. I have two huge files and need to compare them column by column, writing each difference (and the row and column where the mismatch occurs) to a third file. The lines of both files are delimited with |. Could you please suggest the better option for this? Right now, I am loading the two files' data into two separate hashes, comparing them, and writing the differences to the third file. It would be good if you can suggest something other than loading the file content into a database and fetching it.
        Thanks,
        madtoperl
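
        For reference, a minimal sketch of that comparison done line by line, so that neither file is held in memory. It assumes both files hold the same records in the same order (if not, a sort or merge step is needed first), and the file names are placeholders:

            use strict;
            use warnings;

            open my $fh_a, '<', 'file_a.dat' or die "file_a.dat: $!\n";
            open my $fh_b, '<', 'file_b.dat' or die "file_b.dat: $!\n";
            open my $out,  '>', 'diff.dat'   or die "diff.dat: $!\n";

            my $line = 0;
            while( defined( my $rec_a = <$fh_a> )
               and defined( my $rec_b = <$fh_b> ) ) {
                ++$line;
                chomp( $rec_a, $rec_b );
                my @a = split /\|/, $rec_a;
                my @b = split /\|/, $rec_b;
                ## compare column by column; report line, column and values
                for my $col ( 0 .. $#a ) {
                    next if defined $b[$col] and $a[$col] eq $b[$col];
                    printf $out "line %d, column %d: '%s' vs '%s'\n",
                        $line, $col + 1, $a[$col], $b[$col] // '';
                }
            }
            close $out;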
