http://qs321.pair.com?node_id=11108401


in reply to Fastest split possible

Where did the string come from in the first place?

If the line endings are of a predictable format (always \n, or always \r\n), you can use index to find the next one. This adds code complexity, and it moves work out of perl's underlying C implementation and into your own Perl code, but it avoids invoking the regex engine. Which one wins depends on how you code it, and really on that tradeoff between doing more work in Perl versus doing more work in perl.
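As a rough sketch of the index approach, assuming the separator is a fixed string that is known in advance (split_on_index is a made-up helper name, not something from this thread):

```perl
use strict;
use warnings;

# Hypothetical helper: split $string on a fixed separator using
# index/substr, so the regex engine is never invoked.
sub split_on_index {
    my ($string, $sep) = @_;
    die "separator must not be empty" if $sep eq '';

    my @lines;
    my $pos     = 0;
    my $sep_len = length $sep;

    # index returns -1 once there is no further separator.
    while ((my $next = index($string, $sep, $pos)) >= 0) {
        push @lines, substr($string, $pos, $next - $pos);
        $pos = $next + $sep_len;
    }

    # Keep trailing text that has no terminating separator.
    push @lines, substr($string, $pos) if $pos < length $string;

    return @lines;
}

my @lines = split_on_index("four score\nand seven\nyears ago\n", "\n");
print "$_\n" for @lines;
```

Note that this is not a drop-in replacement for split: split discards trailing empty fields, while this sketch keeps empty lines wherever they occur. Whether it is any faster is exactly the open question above.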

On the other hand, if you're reading this big string in as a file, your record separator $/ and <$fh> should handle it.
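A minimal sketch of that, assuming the data lives in a file; read_records is a hypothetical helper, and the temp file only exists so the example runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a file into a list of records, letting
# perl do the splitting by way of the record separator $/.
sub read_records {
    my ($path, $sep) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/ = $sep;              # restored when the sub returns
    chomp(my @records = <$fh>);   # chomp strips whatever $/ holds
    close $fh;
    return @records;
}

# Demo data written to a temp file (stand-in for the real input).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "four score\nand seven\nyears ago\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my @records = read_records($tmp_path, "\n");
print "$_\n" for @records;
```

For a really large file you would probably loop with while (my $line = <$fh>) rather than slurp everything into an array.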

That leads to one additional approach:

open my $memfh, '<', \$string;
chomp(my @array = <$memfh>);

Again, I'm not sure that will be faster, but an in-memory filehandle to a scalar is a nice little trick. It definitely puts more of the work into perl and less into Perl.

Update: I've since done my own benchmarks. The ones shown elsewhere in this thread are indicative of the results I was getting. The gist is that it's pretty hard to beat split. Keep in mind, though, that if you're splitting a huge string into an array it's possible you're blowing up memory, and this would happen whether you use split or some other solution. Back to my solution of opening an in-memory handle: it seems to be considerably slower than split. It may be neat to use the technique to get an easy iterator for a string, but it's not going to be faster than split.
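For anyone who wants to repeat the comparison, here is a sketch using the core Benchmark module. This is not the benchmark I actually ran; the input size and line content are arbitrary assumptions, so adjust them to resemble your real data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input: the line count and content are assumptions.
my $string = join '', map { "line number $_\n" } 1 .. 50_000;

my %candidates = (
    split => sub {
        my @array = split /\n/, $string;
        return \@array;
    },
    memfh => sub {
        open my $memfh, '<', \$string
            or die "opening in-memory handle: $!";
        chomp(my @array = <$memfh>);
        return \@array;
    },
);

# A negative count means "run each candidate for at least that many
# CPU seconds".
cmpthese(-1, \%candidates);
```

Both candidates build the full array, so the memory concern applies equally to each of them.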


Dave

Re^2: Fastest split possible
by AnomalousMonk (Archbishop) on Nov 06, 2019 at 20:37 UTC

    That's simpler and probably faster than what I had in mind here. Combined with an idea from Discipulus:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $consonants = qr{ [^aeiouAEIOU]+ }xms;
     ;;
     my $s = qq{four score\nand seven\nyears ago\nour fathers\n};
     print qq{>$s<};
     ;;
     open my $fh, '<', \$s or die qq{opening in-memory: $!};
     ;;
     process() while <$fh>;
     ;;
     sub process {
       chomp;
       s{ ($consonants) }{\U$1}xmsg;
       print qq{>$_<};
       }
     "
    >four score
    and seven
    years ago
    our fathers
    <
    >FouR SCoRe<
    >aND SeVeN<
    >YeaRS aGo<
    >ouR FaTHeRS<


    Give a man a fish:  <%-{-{-{-<