I think it's time for a benchmark here:
Using perl 5.8.0, on Linux (Mandrake 9.0) on a rather fast machine (Athlon dual-processor 1.8):
#!/bin/perl -w
use strict;
use Benchmark( 'cmpthese');
use Encode;
use Text::Iconv;
use Unicode::Map8;
use Unicode::String qw(utf8);
use utf8;
my $enc= 'latin1';
my $convert_iconv = Text::Iconv->new( 'utf8', $enc);
my $convert_unicode = Unicode::Map8->new ($enc);
my $text= <DATA>;
chomp $text;
# let's just check the output!
print "Encode : ", encode("iso-8859-1", $text), "\n";
print "Text::Iconv : ", $convert_iconv->convert( $text), "\n";
print "Unicode::Map8 : ", $convert_unicode->to8 (utf8($text)->ucs2), "\n";
print "regexp : ", latin1( $text), "\n";
# now benchmark
cmpthese( 500000, {
    'Encode'        => sub { encode("iso-8859-1", $text); },
    'Text::Iconv'   => sub { $convert_iconv->convert( $text); },
    'Unicode::Map8' => sub { $convert_unicode->to8 (utf8($text)->ucs2); },
    'regexp'        => sub { latin1( $text); },
});
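# fold each 2-byte UTF-8 sequence (lead byte 0xC0-0xC3) into one latin1 char:
# low 2 bits of the lead byte, then low 6 bits of the continuation byte
sub latin1
  { my $text=shift;
    $text=~s{([\xc0-\xc3])(.)}{ my $hi = ord($1);
                                my $lo = ord($2);
                                chr((($hi & 0x03) <<6) | ($lo & 0x3F))
                              }ge;
    return $text;
  }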
__DATA__
texte soupçonné d'être plein de caractères accentués
Results:
Encode : texte soupçonné d'être plein de caractères accentués
Text::Iconv : texte soupçonné d'être plein de caractères accentués
Unicode::Map8 : texte soupçonné d'être plein de caractères accentués
regexp : texte soupçonné d'être plein de caractères accentués
Benchmark: timing 500000 iterations of Encode, Text::Iconv, Unicode::Map8, regexp...
       Encode:  6 wallclock secs ( 4.91 usr +  0.02 sys =  4.93 CPU) @ 101419.88/s (n=500000)
  Text::Iconv:  2 wallclock secs ( 2.20 usr +  0.00 sys =  2.20 CPU) @ 227272.73/s (n=500000)
Unicode::Map8:  7 wallclock secs ( 7.66 usr +  0.00 sys =  7.66 CPU) @ 65274.15/s (n=500000)
       regexp:  6 wallclock secs ( 5.65 usr +  0.01 sys =  5.66 CPU) @ 88339.22/s (n=500000)
                  Rate Unicode::Map8        regexp        Encode   Text::Iconv
Unicode::Map8  65274/s            --          -26%          -36%          -71%
regexp         88339/s           35%            --          -13%          -61%
Encode        101420/s           55%           15%            --          -55%
Text::Iconv   227273/s          248%          157%          124%            --
Note: I am not an expert in using Benchmark, so please let me know if my test is flawed.
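For anyone wondering what the bit-twiddling in latin1() actually does, here is a minimal sketch that runs the same substitution on a short sample and compares it with Encode. It assumes the input is a plain string of UTF-8 bytes (not a decoded character string); the sample string "caf\xc3\xa9" and the variable names are made up for illustration and are not part of the benchmark above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Same substitution as latin1() in the benchmark: keep the low 2 bits of
# the lead byte (0xC0-0xC3) and the low 6 bits of the byte that follows.
sub latin1 {
    my $text = shift;
    $text =~ s{([\xc0-\xc3])(.)}{
        chr( ((ord($1) & 0x03) << 6) | (ord($2) & 0x3F) )
    }ge;
    return $text;
}

# Hypothetical sample: "café", where "é" (U+00E9) is the UTF-8 bytes C3 A9.
#   0xC3 & 0x03 = 0x03, shifted left by 6 = 0xC0
#   0xA9 & 0x3F = 0x29
#   0xC0 | 0x29 = 0xE9, the latin1 code for "é"
my $utf8_bytes = "caf\xc3\xa9";

my $via_regexp = latin1($utf8_bytes);
my $via_encode = encode('iso-8859-1', decode('UTF-8', $utf8_bytes));

print "regexp: ", unpack('H*', $via_regexp), "\n";   # 636166e9
print "Encode: ", unpack('H*', $via_encode), "\n";   # 636166e9
print "same\n" if $via_regexp eq $via_encode;
```

Note that the regexp only handles two-byte sequences, i.e. code points up to U+00FF, which is all that latin1 can represent anyway. Encode is given decoded characters in this sketch, which may not be exactly how the benchmark feeds it.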