
Re: Reduce CPU utilization time in reading file using perl

by Laurent_R (Canon)
on Sep 27, 2013 at 17:05 UTC ( [id://1056025] )

in reply to Reduce CPU utilization time in reading file using perl

I agree with BrowserUK that loading such a huge file into memory before you even start reading the first line is a no-go. You really want to iterate over the file line by line to reduce memory usage. Besides, it is unlikely that you would be able to fit 5 million megabytes into a hash anyway.
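Line-by-line iteration can be sketched as below; the file name is hypothetical, and the demo writes its own small input so it is self-contained:

```perl
use strict;
use warnings;

# Write a small demo file so the example is self-contained.
my $file = 'demo_lines.txt';    # hypothetical file name
open my $out, '>', $file or die "Cannot write $file: $!";
print $out "line $_\n" for 1 .. 5;
close $out;

# Read it back one line at a time: memory use is bounded by the
# longest line, not by the file size.
open my $fh, '<', $file or die "Cannot open $file: $!";
my $count = 0;
while (my $line = <$fh>) {
    chomp $line;
    $count++;    # real per-line processing would go here
}
close $fh;
unlink $file;
print "Processed $count lines\n";
```

The key point is that `while (my $line = <$fh>)` holds only one line in memory at a time, whereas `my @lines = <$fh>` would slurp the whole file at once.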

On your CPU usage question, how many CPUs/cores do you have?


Re^2: Reduce CPU utilization time in reading file using perl
by madtoperl (Hermit) on Sep 28, 2013 at 13:21 UTC
    Hi Laurent R,
    Thanks a lot. I have only one CPU. Why does the CPU usage say 23% in one place and 100% in another? Is my script using the whole 100% of the CPU while running, or only 23% of the CPU's time? Could you please clarify.

      Hi, my question was: how many CPUs/cores (not just CPUs) do you have? Even if you have only one (e.g. Intel) CPU with, say, five cores, your process might very well take roughly 100% of one core's processing power while still leaving the four other cores almost completely idle. Since your program neither forks subprocesses nor uses threads, it can basically use only one core (the system itself might delegate a small fraction of its own work to another core, but this is likely to be very limited). So you might very well be using 100% of one core's processing power, but only 20 to 25% of the CPU's total processing power.
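      If you are on Linux, one quick way to count your logical cores from Perl is to read /proc/cpuinfo (a Linux-only assumption; on BSD/macOS you would use something like `sysctl -n hw.ncpu` instead):

```perl
use strict;
use warnings;

# Count logical cores on Linux by counting "processor" entries
# in /proc/cpuinfo. Linux-only assumption; $cores stays 0 if the
# file does not exist.
my $cores = 0;
if (open my $fh, '<', '/proc/cpuinfo') {
    while (<$fh>) {
        $cores++ if /^processor\s*:/;
    }
    close $fh;
}
print "Logical cores: $cores\n";
```

      With 4 cores, a single-threaded script pegging one core would show up as roughly 25% of total CPU capacity, which matches the 23% you are seeing.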

        Hi Laurent R
        Thanks a lot for your clarification. My CPU has 4 cores, so it looks like only one core is used for this process, and that is why it shows around 23% of the CPU's total capacity. I need to read two large files, find the differences based on a column, and store the differences in a third file. Can I use threads here to read these huge files and store them into a hash? Or is there any other, better way to handle this? Please suggest.
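One approach that avoids both threads and a giant in-memory copy is to keep only the key column of the first file in a hash, then stream the second file line by line. A minimal sketch, assuming tab-separated files with the key in the first column (file names, separator, and key column are all assumptions for illustration):

```perl
use strict;
use warnings;

# Build two small tab-separated demo files (key in column 0).
open my $out, '>', 'file_a.txt' or die $!;
print $out "k1\tfoo\nk2\tbar\nk3\tbaz\n";
close $out;
open $out, '>', 'file_b.txt' or die $!;
print $out "k2\tbar\nk4\tqux\n";
close $out;

# Pass 1: store only the key column of the first file in a hash
# (much smaller than storing whole lines).
my %seen;
open my $fh, '<', 'file_a.txt' or die $!;
while (my $line = <$fh>) {
    chomp $line;
    my ($key) = split /\t/, $line;
    $seen{$key} = 1;
}
close $fh;

# Pass 2: stream the second file and write out the lines whose
# key does not appear in the first file.
open $fh, '<', 'file_b.txt' or die $!;
open my $diff, '>', 'diff.txt' or die $!;
while (my $line = <$fh>) {
    my ($key) = split /\t/, $line;
    print $diff $line unless exists $seen{$key};
}
close $fh;
close $diff;
```

Threads rarely help with this kind of job: it is dominated by disk I/O rather than CPU, and a single hash of keys (rather than whole lines) will often fit in memory even when the files themselves are very large.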