note
marioroy
<p><b>Update:</b> Added regex example.</p>
<p>Hello all :) I was curious too, so I tried trimming lines off the left side of the string, and also incorporated the memfh example by [davido].</p>
<p>In this demonstration, split runs about 2x faster with Perl 5.22.x and later releases than with earlier releases such as 5.18.2. That is the case on macOS.</p>
<code>
use strict;
use warnings;
use Time::HiRes 'time';
my $huge_string = "aaa bbb\nccc ddd\neee fff\nggg hhh\niii jjj\nkkk lll\nmmm nnn\n";
# double the string 17 times: 7 lines * 2**17 = 917,504 lines
$huge_string .= $huge_string for 1..17;
# memfh
{
my $string = $huge_string;
my $start = time;
open my $memfh, '<', \$string or die "open failed: $!";
my @lines = <$memfh>;
close $memfh;
printf "duration memfh: %0.3f seconds\n", time - $start;
printf "%d lines\n\n", scalar(@lines);
}
# regex
{
my $string = $huge_string;
my $start = time;
my @lines;
while ( $string =~ /([^\n]+\n)/mg ) {
my $line = $1; # save $1 to not lose the value
push @lines, $line;
}
printf "duration regex: %0.3f seconds\n", time - $start;
printf "%d lines\n\n", scalar(@lines);
}
# split
{
my $string = $huge_string;
my $start = time;
# note: split consumes the "\n", so these lines lack the trailing newline
my @lines = split(/\n/, $string);
printf "duration split: %0.3f seconds\n", time - $start;
printf "%d lines\n\n", scalar(@lines);
}
# trim
{
my $string = $huge_string;
my $start = time;
my @lines;
# 4-arg substr removes the first line (through its "\n") from $string and returns it
while ( my $line = substr($string, 0, index($string, "\n") + 1, '') ) {
push @lines, $line;
}
printf "duration trim : %0.3f seconds\n", time - $start;
printf "%d lines\n\n", scalar(@lines);
}
</code>
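<p>One difference between the variants above: memfh, regex, and trim keep the trailing newline on each line, while split drops it. A minimal sketch of a split that keeps the newline, using a zero-width lookbehind so the delimiter is not consumed. The short <c>$string</c> here is only a stand-in for <c>$huge_string</c>, and this variant was not part of the timings in this post.</p>

```perl
use strict;
use warnings;

# Small stand-in for $huge_string from the benchmark script.
my $string = "aaa bbb\nccc ddd\neee fff\n";

# Splitting on /\n/ consumes the delimiter, so each line loses its newline.
my @chomped = split /\n/, $string;

# A zero-width lookbehind splits right after each newline without consuming
# it, so each element keeps its "\n" like the memfh, regex, and trim variants.
my @kept = split /(?<=\n)/, $string;

printf "%d chomped, %d kept\n", scalar(@chomped), scalar(@kept);
print "kept lines end with a newline\n" if $kept[0] =~ /\n\z/;
```

<p>Whether the lookbehind costs or saves time relative to plain split would need to be measured; I have not timed it here.</p>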
<p>Output - Perl 5.28.2</p>
<code>
duration memfh: 0.384 seconds
917504 lines
duration regex: 0.387 seconds
917504 lines
duration split: 0.067 seconds
917504 lines
duration trim : 0.201 seconds
917504 lines
</code>
<p>Another machine - Perl 5.26.1</p>
<code>
duration memfh: 0.477 seconds
917504 lines
duration regex: 0.445 seconds
917504 lines
duration split: 0.065 seconds
917504 lines
duration trim : 0.259 seconds
917504 lines
</code>
<p>Same machine - Perl 5.18.2</p>
<code>
duration memfh: 0.530 seconds
917504 lines
duration regex: 0.490 seconds
917504 lines
duration split: 0.130 seconds
917504 lines
duration trim : 0.261 seconds
917504 lines
</code>
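<p>The script times each block once per run. A rough sketch, assuming the core Benchmark module is acceptable here, that repeats two of the non-destructive variants and prints a comparison table. The smaller string and the <c>%variants</c> hash are my own choices for illustration; the trim variant is left out because it empties its input and would need a fresh copy on every iteration.</p>

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Smaller input than the script above so each iteration stays quick.
my $huge_string = "aaa bbb\nccc ddd\neee fff\nggg hhh\n";
$huge_string .= $huge_string for 1..10;    # 4 lines * 2**10 = 4,096 lines

# Each variant only reads $huge_string and returns its line count.
my %variants = (
    memfh => sub {
        open my $memfh, '<', \$huge_string or die "open failed: $!";
        my @lines = <$memfh>;
        close $memfh;
        return scalar @lines;
    },
    split => sub {
        my @lines = split /\n/, $huge_string;
        return scalar @lines;
    },
);

# A negative count means: run each variant for at least that many CPU seconds.
cmpthese(-1, \%variants);
```

<p>The rates it prints will depend on the Perl version and machine, just like the numbers above.</p>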
<p>Regards, Mario</p>
11108396