PerlMonks  

Benchmark comparison of Text::xSV and Text::CSV_XS

by jZed (Prior)
on Dec 09, 2004 at 02:52 UTC ( [id://413409]=perlmeditation )

An esteemed monk (who shall remain nameless) posted recently that he had benchmarked Text::xSV and Text::CSV_XS and found them to be comparable in speed. He did not include his benchmark tests. While I have great respect for Tilly's Text::xSV, I found it hard to believe that a pure-Perl module would be as fast as an XS module at CSV tasks. Running the benchmarks below, I find that Text::CSV_XS is approximately three times faster than Text::xSV at both writing and reading CSV files (about 2.6 times the rate for writing and 3.7 times for reading). The results on my 800 MHz Pentium III running Debian sarge were as follows:
  Compare file creation with Text::xSV and Text::CSV_XS
        Rate  xSV  CSV
  xSV 11.6/s   -- -62%
  CSV 30.3/s 161%   --

  Compare file reading with Text::xSV and Text::CSV_XS
        Rate  xSV  CSV
  xSV 3.49/s   -- -73%
  CSV 12.8/s 266%   --
But my benchmark fu is not very strong. I would appreciate it if others could look at the benchmark and suggest any improvements, or, if I've made some horrible mistake, to please point it out.
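
As a quick check on how the rate columns relate to the "three times faster" summary, here is a minimal sketch that turns the rates printed above into speed ratios. It uses only the numbers from the tables; the helper name speedup is hypothetical and is not part of Benchmark.

```perl
#!perl -w
use strict;

# Rates (iterations per second) copied from the results above.
my %write = ( xSV => 11.6, CSV => 30.3 );
my %read  = ( xSV => 3.49, CSV => 12.8 );

# speedup() is a hypothetical helper, not a Benchmark function:
# how many times faster the second rate is than the first.
sub speedup {
    my ($slow, $fast) = @_;
    return $fast / $slow;
}

printf "write: CSV_XS runs at %.1fx the rate of xSV\n", speedup(@write{qw(xSV CSV)});
printf "read:  CSV_XS runs at %.1fx the rate of xSV\n", speedup(@read{qw(xSV CSV)});
```

This prints 2.6x for writing and 3.7x for reading, which matches the 161% and 266% columns (a rate that is 161% higher is about 2.6 times the original rate).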

Disclaimer: I'm currently the maintainer of Text::CSV_XS, but it was written by Jochen Wiedmann and all credit goes to him. And again, this isn't a disparagement of the excellent Text::xSV. It has a nicer interface, and the features of the two modules only partly overlap: each does things the other doesn't. Both can use alternate record separators and can handle embedded newlines and commas.
#!perl -w
use strict;
use Text::CSV_XS;
use Text::xSV;
use IO::File;
use Benchmark qw(:all);

my($cols,$data,$rows) = ( ['Name','City','Num'], [], 999 );
for my $num(0..$rows) {
    push @$data, ["myself\nme","Portland,Oregon",$num];
}

print "\nCompare file creation with Text::xSV and Text::CSV_XS\n";
cmpthese( 50, {
    xSV => sub { create_xSV('test.xSV',$cols,$data) } ,
    CSV => sub { create_CSV('test.CSV',$cols,$data) }
} );

create_xSV('test.xSV',$cols,$data,'keep');
create_CSV('test.CSV',$cols,$data,'keep');

print "\nCompare file reading with Text::xSV and Text::CSV_XS\n";
cmpthese( 50, {
    xSV => sub { read_xSV('test.xSV') } ,
    CSV => sub { read_CSV('test.CSV') }
} );

sub create_xSV {
    my($fname,$cols,$data,$keep) = @_;
    my $fh  = IO::File->new(">$fname") or die $!;
    my $csv = Text::xSV->new( fh=>$fh, header=>$cols);
    $csv->print_header();
    $csv->print_row(@$_) for @$data;
    $fh->close;
    if (!$keep) { unlink $fname or die $!; }
}

sub create_CSV {
    my($fname,$cols,$data,$keep) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1 });
    my $fh  = IO::File->new(">$fname") or die $!;
    $csv->print($fh,$cols);
    $fh->print("\n");
    for (@$data){
        $csv->print($fh,$_);
        $fh->print("\n");
    }
    $fh->close;
    if (!$keep) { unlink $fname or die $!; }
}

sub read_xSV {
    my $fname = shift;
    my $fh  = IO::File->new("$fname") or die $!;
    my $csv = Text::xSV->new( fh=>$fh, header=>$cols);
    my $count;
    $csv->read_header();
    while ($csv->get_row()) {
        my @row = $csv->extract(qw(Name City Num));
        die 'Bad Read' unless "@row" eq "@{$data->[$count++]}";
    }
    die "Bad number of rows '$count'" unless $count == $rows+1;
    $fh->close;
}

sub read_CSV {
    my $fname = shift;
    my $fh  = IO::File->new("$fname") or die $!;
    my $csv = Text::CSV_XS->new({binary=>1});
    my $header = $csv->getline($fh);
    my $count=0;
    while ( my $columns = $csv->getline($fh) ) {
        last if !defined $columns->[0];
        die 'Bad Read' unless "@$columns" eq "@{$data->[$count++]}";
    }
    die "Bad number of rows '$count'" unless $count == $rows+1;
    $fh->close;
}
__END__

Replies are listed 'Best First'.
Re: Benchmark comparison of Text::xSV and Text::CSV_XS
by tilly (Archbishop) on Dec 09, 2004 at 04:51 UTC
    I have not benchmarked Text::xSV, but not only is it in pure Perl, but it was not written with an eye to efficiency. This is pointed out in the bugs section, as is the fact that its performance will get drastically worse if you ever use $`, $& or $'.

    Given that, I'd be very pleased if it was only 3 times slower than the XS version. However I doubt that I perform that well.

    First of all to improve the performance, your read_xSV would be more efficient if you used the fact that get_row() returns the row in array context, so you don't have to call extract. I encourage using the extract method because working "by name" tends to be less buggy than working "by position". OTOH since I'm encouraging people to use extract, you've written it in the way that most people would hopefully write that.

    And your benchmark should be rewritten with the error-checks commented out. I guarantee that those checks are taking a non-trivial fraction of the overall time. If you remove them, then I'd bet that the C version does a lot better still than the pure Perl version.

    UPDATE: Removed a redundant "is encouraged". Thanks, radiantmatrix.

Re: Benchmark comparison of Text::xSV and Text::CSV_XS
by hossman (Prior) on Dec 09, 2004 at 08:12 UTC

    The IO involved in opening/closing/reading/writing those files is probably where you are spending most of your time in those benchmarks. Since you're doing the same amount of IO in both tests, it shouldn't affect which one is faster, but it does affect the ratio of speed between the two (just like adding a sleep(1) to both makes them both equally sucky).
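
    To make the ratio argument concrete, here is a small arithmetic sketch. The costs are invented time units, not measurements from either module, and observed_ratio is a hypothetical helper written only for this illustration.

```perl
#!perl -w
use strict;

# Hypothetical per-iteration parse costs in arbitrary time units (not measured):
# the slow parser takes 4 units, the fast one takes 1 unit.
my ($slow_parse, $fast_parse) = (4, 1);

# observed_ratio() is an illustrative helper: the speed ratio a benchmark
# would report when both candidates also pay the same fixed I/O cost.
sub observed_ratio {
    my ($slow, $fast, $io) = @_;
    return ($slow + $io) / ($fast + $io);
}

for my $io (0, 1, 2, 8) {
    printf "io=%d  observed ratio=%.2f\n",
        $io, observed_ratio($slow_parse, $fast_parse, $io);
}
```

    With no shared I/O cost the ratio is 4.00; with 2 units of shared I/O it drops to 2.00, and with 8 units to 1.33. The faster parser still wins every time, but the reported margin shrinks as the shared cost grows.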

    If you really want to test *only* the CSV time, you can read from a tied filehandle that's really just a string in memory, and write to a tied filehandle that just throws away the data you write to it.

    Take a look at Tie::FileHandle::Base and the replies to this post as a place to start.

      What's wrong with IO::Scalar? I use it in tests all the time ...

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

        See, I was certain that module existed ... but I couldn't for the life of me remember what it was called, and my searches didn't turn up anything more promising than the base class.

        -thanx

      .. Or just open a scalarref, assuming a newish perl (5.8+ I believe) ..
      my $content = ''; open($fh, '<', \$content) ..
      C.

        I'd like to note that the open ..., "<", \ EXPR syntax is a shortcut for open ..., "<:scalar", \ EXPR. I found out once that if you specify any sort of PerlIO layer in the open-type format, you also have to explicitly use the long form otherwise you end up attempting to work with a file named "SCALAR(0x....)" instead of an in-memory file.

        # Ok
        open ..., "<", \ $content
        open ..., "<:scalar", \ $content
        open ..., "<:scalar:crlf", \ $content

        # Not ok
        open ..., "<:crlf", \ $content
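
        A minimal sketch of the in-memory filehandle idea discussed in this sub-thread, using only core Perl (5.8 or later). The sample data and variable names are made up for illustration. Whether Text::xSV and Text::CSV_XS both accept such handles in place of IO::File objects is an assumption to verify before using this in the benchmark.

```perl
#!perl -w
use strict;

# Sample CSV text held in memory; the contents are made up for illustration.
my $content = "Name,City,Num\nme,Portland,1\nyou,Salem,2\n";

# Read from the string as if it were a file (needs perl 5.8 or later).
open(my $in, '<', \$content) or die "Cannot open in-memory input: $!";

# Write to another string instead of a file on disk.
my $output = '';
open(my $out, '>', \$output) or die "Cannot open in-memory output: $!";

my $lines = 0;
while (my $line = <$in>) {
    $lines++;
    print {$out} $line;    # a real benchmark would parse or format here
}
close $in;
close $out;

print "copied $lines lines, ", length($output), " bytes\n";
```

        This keeps the disk out of the timing loop, although the PerlIO layer itself still costs something, so it narrows rather than eliminates the I/O share of the measurement.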
Re: Benchmark comparison of Text::xSV and Text::CSV_XS
by radiantmatrix (Parson) on Dec 10, 2004 at 14:53 UTC

    After confirming with jZed that it was me who raised his ire, I feel the need to clarify. I benchmarked Text::xSV, Text::CSV, and Text::CSV_XS. At the time, I did not find Text::CSV_XS in ASPN (I was working in a Pure-Windows environment), so I compiled it myself using Borland's C compiler.

    My benchmark results on a 350MB file indicated that CSV_XS was 12% faster than Text::xSV for reads only (no testing on writes). However, there were far too many variables for this benchmark to be entirely accurate. Even so, I felt that compiling CSV_XS for Windows was not worth a mere 12% gain in read performance.

    Now, Text::CSV_XS is available from ASPN. I would be interested in seeing a Windows benchmark for the two modules, as I don't have time to build one at work (which is the only place I use Win32 right now).

    radiantmatrix
    require General::Disclaimer;
    s//2fde04abe76c036c9074586c1/; while(m/(.)/g){print substr(' ,JPacehklnorstu',hex($1),1)}

Benchmark comparison of Text::xSV and Text::CSV_XS (New for Windows)
by radiantmatrix (Parson) on Dec 21, 2004 at 22:52 UTC

    Because the original issue was one of the Windows version of CSV_XS, I have composed a benchmark similar to my original. This covers reading only, and uses the ASPN version of CSV_XS (as opposed to compiling it myself):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(:all);

    my @buffer;
    cmpthese( 200, {
        'xSV' => \&xSV,
        'CSV' => \&CSV_XS
    } );

    sub xSV {
        print STDERR '.';
        use Text::xSV;
        my $xsv = new Text::xSV(filename=>"sample.csv", sep=>';');
        while (my $row = $xsv->get_row) {
            @buffer = @$row; #just a "do something" instruction
        }
        undef @buffer;
        print STDERR '.';
    }

    sub CSV_XS {
        print STDERR '*';
        use Text::CSV_XS;
        use IO::File;
        my $io  = new IO::File "< sample.csv";
        my $csv = new Text::CSV_XS({sep_char=>';'});
        while (my $row = $csv->getline($io)) {
            last unless @$row;
            @buffer = @$row; #just a "do something" instruction
        }
        undef @buffer;
        $io->close;
        print STDERR '*';
    }

    Results for three runs (I've trimmed the status indicators):
    1.       Rate  xSV  CSV
      xSV 2.11/s   -- -75%
      CSV 8.51/s 304%   --
      
    2.       Rate  xSV  CSV
      xSV 2.20/s   -- -74%
      CSV 8.52/s 287%   --
      
    3.       Rate  xSV  CSV
      xSV 2.19/s   -- -74%
      CSV 8.40/s 284%   --
      
    As you can see, Text::CSV_XS is noticeably faster than Text::xSV, though the margins are not as large as on my Linux machines. I'm glad that someone pointed out that CSV_XS is now available through ASPN, as it allows me to write fast, portable CSV-handling code now. ;-)

    I guess I don't know how to optimize a compiler for Windows... but I knew that already.

    radiantmatrix
