PerlMonks  

Speeds vs functionality

by Tux (Abbot)
on Jul 27, 2014 at 16:48 UTC ( #1095245=perlmeditation )

So my main question, also to myself, is "How much speed are you willing to sacrifice for a new feature?".

Really. Let's assume you have a neat module that deals with your data, and it deals with it pretty well and reliably, but extending it with new features - some of them asked for by others - is getting harder and harder.

We now have git, and making a branch is easy, so you can implement the most requested new feature, or the one that most appeals to you, and when you are done and all old and new tests pass, you notice a speed drop.

What considerations do you make to decide whether to release the module with the neat new feature and mention the (quantified) slowdown, or to revert the change and note in the docs that the new feature would cause too big a slowdown?


Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re: Speeds vs functionality
by BrowserUk (Pope) on Jul 27, 2014 at 18:03 UTC

    CPUs may still be getting marginally faster; but typical data files are increasing in size at a much faster rate.

    My personal take is that seldom used and one-off requested features should not impact the performance unless those features are being used!

    This is easily accomplished by having the minimal/most-used functionality -- the fast(est) version of a primary function or method -- as the default; and then have the added-functionality version override that if and when it is required.

    Eg.

    sub function_fast { ... }

    sub function_full {
        ...
        function_fast( ... ); ## If possible, else copy-paste common code despite the DRY principle...
        ...
    }

    sub function;

    sub function_init {
        my %args = @_;
        validate_args( \%args );
        if( added_functionality_required ) {
            *function = \&function_full;
        }
        else {
            *function = \&function_fast;
        }
    }

    Poor example code, but hopefully explains my drift.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Speeds vs functionality
by AppleFritter (Vicar) on Jul 27, 2014 at 17:40 UTC

    For me, it'd depend on several factors.

    1. How critical would speed be in this module? Would it just be something that's nice to have, or would it be crucial to the module's operation? They say that optimization should always start with profiling: figure out where you actually spend time. Same thing here; would the module likely be a bottleneck in an application using it?

    2. In the same vein: how much of an absolute slowdown would there be? "The module runs twice as slow as before with this feature added" sounds bad, but if we'd be talking about an increase in running time from (say) 1ms to 2ms in code that'd only be used once in an app, that'd be much less of an issue than a larger (absolute) slowdown, or a slowdown in code that is expected to be called in a tight loop.

    3. I'd also consider what the module is actually doing. To give just one example, graphical UIs need to be snappy to feel natural and usable, so I'd be wary of introducing delays in UI code without a good reason; but other tasks are understood and expected to take some time (though that doesn't mean it's always proper to make them take more time, even in exchange for extra features).

    4. Then of course there's the question of whether said new feature would need to cause a slowdown at all. Perhaps there'd be an alternate implementation that would run faster. It might be less elegant perhaps, less natural, longer and more complicated -- but why care how sausages are made when you only want to eat them?

    5. If neither decision could be justified at all, I might simply split the module into two versions: a fast, lean one that provides only the necessary features and squeezes out the last bits of speed, and a full-featured one that makes a few sacrifices for the sake of convenience.

    TL;DR: like so many things in life, it depends.

      Good answers. I'll come back with more accurate figures when I see more posts. I was "vague" on purpose.

      • Yes, speed matters, a lot
      • Speed loss is less than 50%
      • No alternative possibility (in this case)
      • No split (though I thought about that)
      • Average time between releases: 1-2 months

      Enjoy, Have FUN! H.Merijn
Re: Speeds vs functionality
by salva (Abbot) on Jul 28, 2014 at 08:51 UTC

    How important is that feature? How many people are going to use it?

    Can you already attain the same result in other ways with your module in its current state even if it is more difficult/complex?

    Is that just a feature or are you actually improving the correctness of the module?

    Is your module commonly used on speed critical sections?

    Anyway, the response is obviously 42%.

      How important is that feature?

      I think important, but only the future will tell

      How many people are going to use it?

      All (unavoidable)

      Can you already attain the same result in other ways with your module in its current state even if it is more difficult/complex?

      No

      Is that just a feature or are you actually improving the correctness of the module?

      A very good question. The new feature is actually widening the possible uses of the module. That means that it is backward compatible, but will allow new and more difficult formats when released.

      Is your module commonly used on speed critical sections?

      Yes

      Anyway, the response is obviously 42%.

      In that case I am safe :)


      Enjoy, Have FUN! H.Merijn
Re: Speeds vs functionality
by zentara (Archbishop) on Jul 28, 2014 at 09:43 UTC
    Well, in terms of software in general, I would have to say that the Desktop which everyone uses is a great example of this. Do you really need 1 gig of software being loaded, taking 4 seconds to initialize, just to use an X11 based or Windows based program? Why do people use KDE or Gnome or Windows, when a simpler, faster desktop is available to them? I am super happy with ICEWM for its speed, simplicity, and just plain staying out of my way. The new Lubuntu, with its speedy desktop, is great too. Excessive menus and background daemons slow things down.

    There are times when I find running a Java based mega-program is necessary, and the sluggishness is apparent to the casual observer. In that case, functionality is preferred over speed.

    You can be cheap, or accurate, but not both; unless you are lucky,
    ... or use Perl; :-)


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: Speeds vs functionality
by Tux (Abbot) on Jul 29, 2014 at 10:42 UTC

    OK, more specific

    The module is Text::CSV_XS (surprise), and the new feature is support for a multi-byte sep_char, which includes support for UTF-8 separation characters.

    The fact that the separator can now be a multi-byte character instead of a single byte has a huge impact. The first check on every byte in a CSV stream is the check on the separation character. Every extra test on that byte will be executed for every single byte in the stream. This is still the fastest way. Making that check conditional on the state of the stream would just cause another test (or more) to be executed instead.
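    To illustrate the point, here is a minimal sketch (not the actual Text::CSV_XS source; the struct and function names are invented): when the separator check is the first test on every byte, widening it to also cover a multi-byte separator adds work that is paid once per byte of the stream.

    ```c
    #include <assert.h>
    #include <string.h>

    /* Hypothetical sketch, not the Text::CSV_XS source: the parser's hot
     * loop tests every byte against the separator first, so widening that
     * test (single-byte sep OR multi-byte sep) adds work on every byte. */
    typedef struct {
        char        sep_char;   /* single-byte separator (fast path)   */
        const char *sep;        /* full separator, possibly multi-byte */
        int         sep_len;
    } parser;

    /* count the fields of one unquoted record */
    static int count_fields (const parser *p, const char *buf, int len)
    {
        int fields = 1;
        for (int i = 0; i < len; i++) {
            char c = buf[i];
            /* this disjunction runs once per byte in the stream */
            if (c == p->sep_char ||
                (p->sep_len > 1 && len - i >= p->sep_len &&
                 !memcmp (buf + i, p->sep, p->sep_len))) {
                fields++;
                i += (p->sep_len > 1 ? p->sep_len : 1) - 1;
            }
        }
        return fields;
    }

    int main (void)
    {
        parser comma = { ',', ",",            1 };
        parser fwc   = { 0,   "\xEF\xBC\x8C", 3 };  /* U+FF0C, UTF-8 */
        assert (count_fields (&comma, "1,2,3", 5) == 3);
        assert (count_fields (&fwc, "1\xEF\xBC\x8C" "2", 5) == 2);
        return 0;
    }
    ```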

    The performance drop for the fastest stream test I do is measured between 5 and 10%, for all versions of perl I tested with.

    At this moment, I think it is worth it, but I am still in doubt.

    $ perl -MCSV -C3 -E'csv (out => *STDOUT, in => [[ 1, 2 ]], sep => "\x{060c}")'
    1،2
    $ perl -MCSV -C3 -E'csv (out => *STDOUT, in => [[ 1, 2 ]], sep => "\N{FULLWIDTH COMMA}")'
    1，2
    $ perl -MCSV -C3 -E'csv (out => *STDOUT, in => [[ 1, 2 ]], sep => "\N{FULLWIDTH COMMA}")' | \
      perl -MCSV -E'DDumper (csv (in => *STDIN, sep => "\x{ff0c}"))'
    [
        [   1,
            2
            ]
        ]
    $

    Enjoy, Have FUN! H.Merijn
      I have just looked over the code (this one, right?) and it seems to me that a better approach can be used to check for separators.

      Currently you check at every character for the two possibilities (single or multi-byte separator):

      if (c == csv->sep_char || is_SEPX (c)) {

      A better way would be to consider the multi-byte separator as a single-byte separator plus a tail:

      /* somewhere in the object constructor */
      csv->sep_tail_len = sep_len - 1;
      csv->sep_tail     = sep + 1;
      csv->sep_char     = *sep;

      ...

      /* then, in the parser */
      if (c == csv->sep_char) {
          if (!csv->sep_tail_len ||
              ((csv->size - csv->used >= csv->sep_tail_len) &&
               !memcmp(csv->bptr + csv->used, csv->sep_tail, csv->sep_tail_len))) {
              /* you have a separator! */
      I think that would minimize the impact of supporting the extra multi-byte checks on the common single-byte separator case.
      The first check on every byte in a CSV stream is the check on the separation character. Every extra test on that byte will cause that extra test to be executed for every single byte in the stream.

      Is it really so difficult to lift the single/multi-byte test out of the loop?

      Even if it means that everything inside the loop is duplicated, that needn't imply a maintenance problem.

      You could, for example, make the body of the (now two) loops an inlined function. Inline functions have been a part of the C standard for 15 years (since C99), and gcc had them long before that.

      If you really feel the need to support compilers that don't, you could always substitute (another) of those awful multiline macros.
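      A sketch of that shape (illustrative only; the names are invented and not from Text::CSV_XS): the single-/multi-byte decision is taken once outside the loops, each specialized loop stays tight, and any per-byte work they share can live in an inline function.

      ```c
      #include <assert.h>
      #include <string.h>

      /* Illustrative sketch: lift the single-/multi-byte decision out of
       * the hot loop.  Shared per-byte work sits in an inline function;
       * the single-byte loop never pays for the multi-byte test. */
      static inline void emit_field_split (char c, char sep, int *fields)
      {
          if (c == sep)
              (*fields)++;
      }

      static int count_fields_single (const char *buf, int len, char sep)
      {
          int fields = 1;
          for (int i = 0; i < len; i++)
              emit_field_split (buf[i], sep, &fields);  /* one compare per byte */
          return fields;
      }

      static int count_fields_multi (const char *buf, int len,
                                     const char *sep, int sep_len)
      {
          int fields = 1;
          for (int i = 0; i < len; i++)
              if (buf[i] == sep[0] && len - i >= sep_len &&
                  !memcmp (buf + i, sep, sep_len)) {    /* extra tests live here only */
                  fields++;
                  i += sep_len - 1;
              }
          return fields;
      }

      /* the decision is taken once, outside the loops */
      static int count_fields (const char *buf, int len,
                               const char *sep, int sep_len)
      {
          return sep_len == 1 ? count_fields_single (buf, len, *sep)
                              : count_fields_multi  (buf, len, sep, sep_len);
      }

      int main (void)
      {
          assert (count_fields ("a,b,c",   5, ",",  1) == 3);
          assert (count_fields ("a::b::c", 7, "::", 2) == 3);
          return 0;
      }
      ```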


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        For speed it is just one single loop. The test for the separation character occurs - besides the check on every next byte - 5 extra times when looking ahead, e.g. after an escape character or a quotation character. Splitting the test out of the loop is currently difficult.

        The code is littered with multi-line macros, and I do not think they are awful at all. They also work on all old compilers, and as I am the maintainer, no one else will see them. When digging through perl5 core code, one gets used to multi-line macros. It doesn't bother me.

        I will have another look at the approach salva suggested and see if I can improve speed there. Having also $paid work, that will not finish this week though.

        FWIW all feedback here is warmly welcomed and appreciated, even if I might not agree with some of it.


        Enjoy, Have FUN! H.Merijn

      For what it's worth, I feel it is worth it.

      I am a Text::CSV_XS user, I do use it in large data (but not performance critical) situations, I doubt I would ever need that functionality, but consider the option to be worth the (reasonably small) performance penalty.

      Good Day,
          Dean

        If I am satisfied with sep, then quote_char and maybe even escape_char will be the next to deal with.

        Even with the rewrite as suggested by salva (more or less) I still see a slowdown of close to 10%.


        Enjoy, Have FUN! H.Merijn

      Looking at cx_Parse + stuff, some thoughts and questions arise:

      • Is it necessary to fiddle with the cache? Just undef _cache on perl side to enforce parameter changes; this ought to be a rare event? Stashing an opaque C struct avoids needless copying.
      • Then what about unicode whitespace characters?
      • quote_char, escape_char, etc., could be ints and default to -1 when undefined. Easier to test against. However ...
      • Have you tried writing this thing as an fsm?
      • For example:

        enum { TXT, BIN, WSPACE, UTF8, QUOT, ESC, SEP, CR, NL_EOLX, ..., NN };
        enum { EXFIELD, INFIELD = 1*NN, INQUOT = 2*NN, CRSEEN = 3*NN } state;

        while ((c = getc()) != EOF) {
            int ctype = cached->xlat[c];
            if () ... /* perhaps peel the most likely case(s) */
            switch (state + ctype) {
            case WSPACE:          continue;  /* nop: allow_whitespace test in xlat[] */
            case BIN:             error();   /* again, resolved when constructing xlat[] */
            case TXT:             state = INFIELD; putc(c); continue;
            case INFIELD+TXT:
            case INQUOT+TXT:
            case INQUOT+SEP:      ... putc(c); ...
            case UTF8:
            case INFIELD+UTF8:    ...accumulate/xlat...
            case CRSEEN+NL_EOLX:  ...; state = 0; continue;
            case CRSEEN+...:      error();
            default:              error();
            }
            ...
        Or possibly:
        enum { EXFIELD, INFIELD = 0x100, INQUOT = 0x200, CRS = 0x300 } state;
        ...
        int action = cached->xlat[state + c];
        decode(action);
        ...

      Ultimately, the (handful of) UTF sequences may also be resolved by walking trie-like state tables.
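      As a toy illustration of that idea (all names invented; not from any real parser), a trie-like table can match a multi-byte UTF-8 separator one byte at a time, with each byte indexing the next state:

      ```c
      #include <assert.h>
      #include <string.h>

      /* Toy sketch: a trie-like state table that matches one multi-byte
       * (UTF-8) separator byte by byte.  Rows are states, input bytes
       * index into each row; DEAD = mismatch, ACCEPT = full match.
       * This toy supports separators of up to 3 bytes. */
      enum { DEAD, START, S1, S2, ACCEPT, NSTATES };

      static unsigned char trans[NSTATES][256];

      /* build the table for a single separator sequence */
      static void build (const unsigned char *sep, int len)
      {
          static const unsigned char chain[] = { S1, S2, ACCEPT };
          unsigned char st = START;
          memset (trans, DEAD, sizeof trans);
          for (int i = 0; i < len; i++) {
              trans[st][sep[i]] = (i == len - 1) ? ACCEPT : chain[i];
              st = chain[i];
          }
      }

      /* walk the table over an input sequence */
      static int matches (const unsigned char *s, int len)
      {
          unsigned char st = START;
          for (int i = 0; i < len && st != DEAD; i++)
              st = trans[st][s[i]];
          return st == ACCEPT;
      }

      int main (void)
      {
          const unsigned char fwc[] = { 0xEF, 0xBC, 0x8C };  /* U+FF0C in UTF-8 */
          build (fwc, 3);
          assert (matches (fwc, 3));
          assert (!matches ((const unsigned char *)",", 1));
          return 0;
      }
      ```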

        The cache, as currently implemented, was put in to achieve a boost of (IIRC) about 25%. It is needed to reduce access to the object (the $self hash), as those lookups are very, very expensive.

        Unicode whitespace isn't important for this parser, as it is no special character, unless it is the separator, the quotation character or the escape character. Unicode whitespace will just end up being binary.

        XS is not PP :) Those characters could indeed be ints, but that would probably mean that the whole parser (written in 1998 and modified/extended over time) has to be rewritten. It /might/ be worth the effort in the end, but I do not have the time to start that experiment.

        Never tried an FSM (unless the current state machine already is one). I simplified the parser as I got it when I took over maintenance. Over time a lot of bugs were fixed and new (required and requested) features were added.

        update: added remark about FSM


        Enjoy, Have FUN! H.Merijn
Re: Speeds vs functionality
by LanX (Cardinal) on Jul 27, 2014 at 17:13 UTC
    > "How much speed are you willing to sacrifice for a new feature?".

    My 2 cents:°

    That depends on the age and maintenance cycles of the module.

    Considering Moore's law, a drop of 50% within a time span of 18 months should be ignored (i.e. still faster, because only half of the hardware gain is lost).

    Keep in mind that Ruby started to be popular with only half of Perl's speed!

    Many of us still follow requirements which are decades old... :)

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)

    update

    °) in all generality... I didn't try to figure out which module was meant.

Re: Speeds vs functionality
by 1s44c (Scribe) on Aug 02, 2014 at 21:32 UTC

    How much speed are you willing to sacrifice for a new feature?

    If it's a feature I don't want, none. If it's a feature I do want, as little as possible.

    I guess it's a balancing act between the number of people who want the feature, how much they want it, and what everyone who doesn't want it will accept before forking the module.

      None sounds a bit selfish. Understandable, but not realistic.

      No single product (or module) in the public domain is written for just one single user. If it were, it would not be on CPAN. There are no two users that are identical, not even me myself and I.

      As a module author I somehow expect that authors of CPAN modules try to minimize performance penalties. Always. So "as little as possible" is - to me - an "of course".

      The balance I am finding is my prediction of how people/users will need the new feature in the future. My reading of the responses, conversations with other developers, and reading comments on related open-source projects have made me decide this is the only way forward.

      A performance hit of less than 5% is acceptable if I open the usability to a new group that up till now was forced to use slower parsers. The new code performs quite well. The cache code has been simplified and only got me marginal changes: ± 1%, which I interpret as noise.

      I've been testing with 5.6.1 through 5.21.1 over the weekend and all looks pretty good. Only perl built with strict clang currently fails; I will need to address that before I release.

      Thank all for the valuable feedback.


      Enjoy, Have FUN! H.Merijn
