http://qs321.pair.com?node_id=1193599

Preetham has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have the program below to compare two files and log the result in a new file. It's taking more than 10 minutes to log the data to the AuditNew.txt file. The br.1 file has around 3 lakh lines and MSMLogs.txt has around 7,000 lines.

Is there any way to get the result logged faster in AuditNew.txt?

Note: The print statement below is displayed on the console about once every second, once for each line. That suggests the processing is fast but writing the data to the text file is being delayed.

    print("\nnumber of occurance is $count\n");

    #!/usr/bin/perl
    use 5.010;
    use strict;
    use warnings;

    #Open MSM Log in read Mode
    open(my $MSMLog, '<','MSMLogs.txt');
    #Create Audit txt file in write mode
    open(my $Audit, '>','br.1');
    print("Task Started.........\n");
    #iterate each word to identify the logs
    while (my $row = <$MSMLog>) {
        chomp $row;
        getCount($row);
    }

    sub getCount {
        #Open MobileService.log file in read mode
        open(my $MobileServiceLog, '<','one.txt');
        my @StaticLog = @_;
        my $count = 0;
        #print ("\nvalue in MSM Static is ------ $StaticLog[0]\n");
        while (my $row = <$MobileServiceLog>) {
            my @actualWord = split /;/, $row;
            my $MobileService = $actualWord[7];
            #print ("value in MobileService is ------ $MobileService\n");
            if ($MobileService =~ /$StaticLog[0]/) {
                $count += 1;
            }
        }
        $_=1
        print ($Audit "\n$StaticLog[0] occurance is ------- \t$count\n");
        print("\nnumber of occurance is $count\n");
        close $MobileServiceLog;
    }
    print ($Audit "Task Completed");
    print ("Task Completed");
    close $MSMLog;
    close $Audit;

Replies are listed 'Best First'.
Re: Delay in Writing the data to Text file
by kennethk (Abbot) on Jun 26, 2017 at 17:11 UTC
    This is largely consistent with haukex's response, but more verbose. Note that your posted script does not pass strict as written, and so does not run as posted. It's considered poor form to post code that doesn't compile; it's far better to post ugly code that behaves like what is on your system.

    As soon as your challenge is "my code is too slow," your first thought should be to profile your code. I'm a fan of Devel::NYTProf, but there are alternatives. If you ran a profiler, you'd likely see that the while loop in getCount takes up most of your time, for reasons haukex pointed out. In order to speed things up, you want to only read in each file once. There are two basic options for doing this:

    1. Preprocess/Store all the relevant data from $MobileServiceLog in memory, and loop over $MSMLog
    2. Preprocess/Store all the relevant data from $MSMLog in memory, and loop over $MobileServiceLog
    You'd generally pick one or the other based upon which one takes more memory. Assuming both files are small, the minimal way of making that happen might be
        print("\nnumber of occurance is $count\n");

        #!/usr/bin/perl
        use 5.010;
        use strict;
        use warnings;

        #Open MSM Log in read Mode
        open(my $MSMLog, '<','MSMLogs.txt');
        #Create Audit txt file in write mode
        open(my $Audit, '>','br.1');
        print("Task Started.........\n");
        #iterate each word to identify the logs
        while (my $row = <$MSMLog>) {
            chomp $row;
            getCount($row);
        }

        {
            my @rows;
            sub getCount {
                #Open MobileService.log file in read mode
                if (not @rows) {
                    open(my $MobileServiceLog, '<','one.txt');
                    @rows = <$MobileServiceLog>;
                    close $MobileServiceLog;
                }
                my @StaticLog = @_;
                my $count = 0;
                #print ("\nvalue in MSM Static is ------ $StaticLog[0]\n");
                for my $row (@rows) {
                    my @actualWord = split /;/, $row;
                    my $MobileService = $actualWord[7];
                    #print ("value in MobileService is ------ $MobileService\n");
                    if ($MobileService =~ /$StaticLog[0]/) {
                        $count += 1;
                    }
                }
                $_=1
                print ($Audit "\n$StaticLog[0] occurance is ------- \t$count\n");
                print("\nnumber of occurance is $count\n");
            }
        }
        print ($Audit "Task Completed");
        print ("Task Completed");
        close $MSMLog;
        close $Audit;
    where I've used a block to keep @rows reasonably scoped. Note that this still has many inherited flaws, such as not passing strict and not testing your opens for success. There are also a number of other potential optimizations, particularly depending on what you intend by if ($MobileService =~ /$StaticLog[0]/) {, such as using index or hashes.
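
    As a rough illustration of the index idea, here is a minimal sketch that assumes the value from MSMLogs.txt is meant as a literal substring of field 8 rather than as a regex. The sub names extract_field8 and count_substring and the sample rows are hypothetical, not part of the posted script. Splitting each row only once and matching with index keeps the inner loop cheap and sidesteps regex metacharacters.

    ```perl
    use strict;
    use warnings;

    # Pull the 8th semicolon-separated field out of each line, once.
    # Lines with fewer than 8 fields are skipped.
    sub extract_field8 {
        my ($lines) = @_;
        my @fields;
        for my $line (@$lines) {
            chomp(my $copy = $line);
            my @parts = split /;/, $copy;
            push @fields, $parts[7] if defined $parts[7];
        }
        return \@fields;
    }

    # Count the fields that contain $needle as a literal substring.
    sub count_substring {
        my ($needle, $fields) = @_;
        my $count = 0;
        for my $field (@$fields) {
            $count++ if index($field, $needle) >= 0;
        }
        return $count;
    }

    # Hypothetical sample rows standing in for the contents of one.txt.
    my @rows = (
        "a;b;c;d;e;f;g;GET /login;x\n",
        "a;b;c;d;e;f;g;GET /logout;x\n",
        "a;b;c;d;e;f;g;POST /login;x\n",
    );
    my $fields = extract_field8(\@rows);

    # Hypothetical search terms standing in for the lines of MSMLogs.txt.
    for my $needle ('/login', 'GET') {
        my $count = count_substring($needle, $fields);
        print "$needle occurrence is ------- \t$count\n";
    }
    ```

    This still scans every stored field once per search term, so it only trims the cost of each comparison. If exact equality is what is wanted, a hash lookup avoids the inner scan entirely.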

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Delay in Writing the data to Text file
by haukex (Archbishop) on Jun 26, 2017 at 16:40 UTC

    Unfortunately your post is hard to read, please use <code> tags to format it, see How do I post a question effectively? You also don't seem to have provided any sample input or expected output, please see Short, Self-Contained, Correct Example.

    From what I can understand, once per line of input from MSMLogs.txt, you're opening the file one.txt and scanning the entire file. This is probably where the bad performance is coming from. You might consider reading one.txt once into memory (if it's not too big), perhaps storing it in a hash to make access to it faster.
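
    A minimal sketch of the hash approach, assuming that an exact match on field 8 is what is wanted and that the counts fit in memory. The sub name count_field8 and the sample data are hypothetical. An in-memory filehandle stands in for one.txt so the example runs on its own; in the real script you would open 'one.txt' instead, checking the open the same way.

    ```perl
    use 5.010;
    use strict;
    use warnings;

    # Read a log filehandle once and count how often each value of the
    # 8th semicolon-separated field occurs. Returns a hash reference.
    sub count_field8 {
        my ($fh) = @_;
        my %count;
        while (my $line = <$fh>) {
            chomp $line;
            my @field = split /;/, $line;
            next unless defined $field[7];    # skip lines with too few fields
            $count{ $field[7] }++;
        }
        return \%count;
    }

    # Hypothetical sample data standing in for one.txt.
    my $sample = join '', map { "a;b;c;d;e;f;g;$_;z\n" } qw(LoginOK LoginFail LoginOK);
    open(my $log, '<', \$sample) or die "Cannot open sample data: $!";
    my $count = count_field8($log);
    close $log;

    # Hypothetical keys standing in for the lines of MSMLogs.txt.
    for my $key (qw(LoginOK Timeout)) {
        printf "%s occurrence is ------- \t%d\n", $key, $count->{$key} // 0;
    }
    ```

    With this layout each file is read exactly once, and every line of MSMLogs.txt costs a single hash lookup instead of a full scan of the other file.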

Re: Delay in Writing the data to Text file
by holli (Abbot) on Jun 26, 2017 at 17:09 UTC
    You are reading MSMLogs.txt every time you read ONE line in br.1. So given 7K lines in that file and 3K lines in the other, that makes 21,000,000 read operations plus 3K times opening and closing the file. Of course that is slow.

    Put the contents of br.1 into an array and loop over that.


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: Delay in Writing the data to Text file
by Anonymous Monk on Jun 26, 2017 at 17:01 UTC
    $MobileService =~ /$StaticLog[0]/
    Don't do this if you really mean $MobileService eq $StaticLog[0]
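
    To make the difference concrete, here is a small sketch comparing the three ways of testing a value. The sub name compare_matches and the sample strings are hypothetical. An interpolated pattern treats "." as "any character" and matches anywhere in the string, \Q...\E makes the pattern literal but still matches substrings, and eq requires the whole string to be identical.

    ```perl
    use strict;
    use warnings;

    # Return three flags for $value against $wanted:
    # (interpolated regex match, quoted literal regex match, string equality)
    sub compare_matches {
        my ($value, $wanted) = @_;
        my $regex  = $value =~ /$wanted/     ? 1 : 0;
        my $quoted = $value =~ /\Q$wanted\E/ ? 1 : 0;
        my $equal  = $value eq $wanted       ? 1 : 0;
        return ($regex, $quoted, $equal);
    }

    my $wanted = 'v1.2';
    for my $value ('v1.2', 'v1x2', 'prefix-v1.2-suffix') {
        my ($regex, $quoted, $equal) = compare_matches($value, $wanted);
        print "$value: regex=$regex quoted=$quoted eq=$equal\n";
    }
    # v1.2: regex=1 quoted=1 eq=1
    # v1x2: regex=1 quoted=0 eq=0
    # prefix-v1.2-suffix: regex=1 quoted=1 eq=0
    ```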
Re: Delay in Writing the data to Text file
by AnomalousMonk (Archbishop) on Jun 26, 2017 at 22:33 UTC
    ... around 3 lakh ...

    N.B.: The unit of a lakh == 100,000.


    Give a man a fish:  <%-{-{-{-<