
This may not be representative, but a simple benchmark suggests that a regexp match could be much, much faster than split here:


use Benchmark ();

our @data;
my $line = '1.000000     '
. '    100.273    121.54     98.169    121.58'
. '    100.273    121.54     98.169    121.58'
. '    100.273    121.54     98.169    121.58'
. '    100.273    121.54     98.169    121.58';

Benchmark::cmpthese(0, {
   split        => sub { @data = split(/\s+/, $line) },
   fixed_length => sub { @data = $line =~ /^.{8} {6}(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$/ },
   var_length   => sub { @data = $line =~ /^.{8}\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/ },
});

__END__
                 Rate        split   var_length fixed_length
split         63116/s           --         -30%         -87%
var_length    90310/s          43%           --         -81%
fixed_length 482454/s         664%         434%           --
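
One thing worth checking before reading too much into the numbers: as transcribed here, $line holds sixteen values after the leading field, while both patterns capture twelve and are anchored with $. If the transcription is accurate, neither regexp can match, and the benchmark may be timing failed matches rather than successful extraction. A quick sanity check, as a sketch only (the %pattern hash and the x-built pattern strings are my own shorthand for the two regexps above, not part of the original benchmark):

```perl
use strict;
use warnings;

# Sample line copied from the benchmark above (whitespace as transcribed).
my $line = '1.000000     '
    . ('    100.273    121.54     98.169    121.58' x 4);

# Shorthand reconstruction of the two patterns: twelve captures each.
my $fixed = '^.{8} {6}' . ('(.{10})' x 12) . '$';
my $var   = '^.{8}'     . ('\s+(\S+)' x 12) . '$';
my %pattern = (
    fixed_length => qr/$fixed/,
    var_length   => qr/$var/,
);

# Report whether each pattern matches and how many fields it yields.
for my $name (sort keys %pattern) {
    my @fields = $line =~ $pattern{$name};
    printf "%-12s %s\n", $name,
        @fields ? scalar(@fields) . ' fields' : 'NO MATCH';
}

# split never "fails"; it just returns however many fields it finds.
my @split = split /\s+/, $line;
printf "%-12s %d fields\n", 'split', scalar @split;
```

If a pattern reports NO MATCH against real input, its rate says little about extraction speed; the capture count (or the sample line) would need adjusting before rerunning cmpthese.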

Of course, the fixed_length approach assumes that you reassemble the line yourself, since a plain join with a separator would not preserve the field widths.
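
To illustrate the point about field widths, here is a minimal sketch under assumed conditions: a hypothetical record with an 8-character label, 6 spaces, and three 10-character columns (shorter than the benchmark line, and not taken from the original data). The captures carry their own padding, so they are concatenated with an empty separator, and a changed value is re-padded with sprintf:

```perl
use strict;
use warnings;

# Hypothetical record: 8-char label, 6 spaces, then three 10-char columns.
my $line = '1.000000' . (' ' x 6)
         . '   100.273' . '    121.54' . '    98.169';

my @data = $line =~ /^.{8} {6}(.{10})(.{10})(.{10})$/
    or die "line does not fit the fixed-width layout\n";

# Change one value, re-padding it to the column width with sprintf.
$data[1] = sprintf '%10.2f', $data[1] * 2;

# join(' ', @data) would insert extra separators and shift the columns;
# the captures keep their own padding, so join on the empty string instead.
my $rebuilt = substr($line, 0, 14) . join('', @data);

printf "%d -> %d characters\n", length($line), length($rebuilt);
```

The column width (10) and the '%10.2f' format are assumptions for this sketch; real data would need formats matching its own layout.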

In reply to Re: Speed of Split by ikegami
in thread Speed of Split by Lexicon
