Re^4: Rosetta Code: Long List is Long (faster - vec

I'm not seeing a noticeable difference between the initial llil2vec demonstration and this one w.r.t. total time. So, I searched the web and tried fast_io with success. It requires a C++20 capable compiler.

git clone --depth 1 https://github.com/cppfastio/fast_io
[download]

Add the header file above other includes and comment out the cstdio include line.

#include <fast_io.h>

//#include <cstdio>
[download]

Specify the path to the include folder. Also C++20.

clang++ -o llil2vec -std=c++20 -Wall -O3 llil2vec.cpp -I./fast_io/incl
+ude
[download]

Before C++11 results:

llil2vec (fixed string length=6) start
get_properties      CPU time : 1.62356 secs
emplace set sort    CPU time : 0.451718 secs
write stdout        CPU time : 0.923496 secs
total               CPU time : 2.99881 secs
total        wall clock time : 3 secs
[download]

C++20 + fast_io results:

llil2vec (fixed string length=6) start
get_properties      CPU time : 1.50394 secs
emplace set sort    CPU time : 0.430272 secs
write stdout        CPU time : 0.889239 secs
total               CPU time : 2.82348 secs
total        wall clock time : 3 secs
[download]

Comment on Re^4: Rosetta Code: Long List is Long (faster - vec - fast_io) Select or Download Code

Replies are listed 'Best First'.
Re^5: Rosetta Code: Long List is Long (faster - vec - fast_io) by eyepopslikeamosquito (Archbishop) on Jan 11, 2023 at 11:12 UTC
Thanks so much for finding this github `fast_io` library and for posting such clear instructions on how to use it! This made it ridiculously easy for me to play around with it. So far I haven't found any statistically significant speed-ups from employing this library (hardly surprising given the mega man years of effort poured into `g++` and `clang++`), but I will keep my eye on it.	[reply] [d/l] [select]
Re^6: Rosetta Code: Long List is Long (faster - vec - OpenMP) by marioroy (Prior) on Jan 11, 2023 at 16:51 UTC
So far I haven't found any statistically significant speed-ups from employing this library ... After closer look, it was the -std=c++20 language mode that enables faster vectors by 0.2 ~ 0.4 seconds versus -std=c++11. Update 1: Support variable length words. Update 2: Enable parallel sort. See Using Parallel Mode. I tried OpenMP. Unfortunately, strtok is not thread-safe e.g. strtok(NULL, "\n") causing segfault. So I factored out strtok. The OpenMP result improved by 0.1 seconds. That's because the actual reading is already fast. It takes 2 threads minimally to run faster than non-OpenMP results due to populating vec_rec from local copies. Building: `clang++ -o llil4vec -std=c++11 -Wall -O3 llil4vec.cpp clang++ -o llil4vec -std=c++20 -Wall -O3 llil4vec.cpp # faster # enable parallel via -fopenmp clang++ -o llil4vec-omp -std=c++11 -fopenmp -Wall -O3 llil4vec.cpp clang++ -o llil4vec-omp -std=c++20 -fopenmp -Wall -O3 llil4vec.cpp # +faster` [download] Running - Real time results: `$ time ./llil4vec big1.txt big2.txt big3.txt >out.txt std:c++11: 2.901 secs std:c++20: 2.850 secs $ time OMP_NUM_THREADS=3 ./llil4vec-omp big1.txt big2.txt big3.txt >ou +t.txt std:c++11: 2.308 secs std:c++20: 2.267 secs` [download] llil4vec.cpp modification, OpenMP-aware: #if defined(_OPENMP) #include <omp.h> #include <parallel/algorithm> #else #include <algorithm> #endif ... static void get_properties( int nfiles, // in: the number of input files char* fname[], // in: the input file names vec_int_str_type& vec_ret) // out: a vector of properties { #if defined(_OPENMP) omp_set_dynamic(0); omp_set_max_active_levels(1); #pragma omp parallel { vec_int_str_type vec_loc; // thread local copy #pragma omp for nowait schedule(static,1) #endif for (int i = 0; i < nfiles; ++i) { char line[MAX_LINE_LEN_L+1]; char* found; llil_int_type count; FILE* fh = ::fopen(fname[i], "r"); if (fh == NULL) { std::cerr << "Error opening '" << fname[i] << "' : errno=" + << errno << "\n"; continue; } while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) { found = ::strchr(line, '\t'); count = ::atoll( &line[found - line + 1] ); line[found - line] = '\0'; // word #ifdef MAX_STR_LEN_L str_type fixword { { '\0', '\0', '\0', '\0', '\0', '\0' } +}; ::memcpy( fixword.data(), line, found - line ); #if defined(_OPENMP) vec_loc.emplace_back( -count, fixword ); #else vec_ret.emplace_back( -count, fixword ); #endif #else #if defined(_OPENMP) vec_loc.emplace_back( -count, line ); #else vec_ret.emplace_back( -count, line ); #endif #endif } ::fclose(fh); } #if defined(_OPENMP) #pragma omp critical vec_ret.insert(vec_ret.end(), vec_loc.begin(), vec_loc.end()); } #endif // Needs to be sorted by word for later sum of adjacent count field +s to work #if defined(_OPENMP) __gnu_parallel::sort( vec_ret.begin(), vec_ret.end(), [](const int_str_type& left, const int_str_type& right) { return + left.second < right.second; } ); #else std::sort( vec_ret.begin(), vec_ret.end(), [](const int_str_type& left, const int_str_type& right) { return + left.second < right.second; } ); #endif } [download]	[reply] [d/l] [select]
Re^7: Rosetta Code: Long List is Long (faster - llil4vec - OpenMP code) by marioroy (Prior) on Jan 12, 2023 at 09:55 UTC
Helpful, OpenMP Little Book which introduces OpenMP C/C++ concurrency programming. OpenMP spawns a thread per logical core by default. Therefore, I set the desired number of threads via the OMP_NUM_THREADS environment variable (... or NUM_THREADS ...) to minimize extra threads creation and reaping. `$ NUM_THREADS=3 ./llil4vec big1.txt big2.txt big3.txt >out.txt llil4vec (fixed string length=12) start use OpenMP use boost sort get properties 0.131 secs sort properties 0.191 secs vector reduce 0.036 secs vector stable sort 0.267 secs write stdout 0.191 secs total time 0.818 secs` [download] llil4vec.cc: Read more... (17 kB)	[reply] [d/l] [select]
Re^8: Rosetta Code: Long List is Long (faster - llil4vec - OpenMP code) by marioroy (Prior) on Jan 12, 2023 at 11:59 UTC
Re^8: Rosetta Code: Long List is Long (faster - vec - OpenMP code - tbb) by eyepopslikeamosquito (Archbishop) on Jan 17, 2023 at 03:23 UTC
Re^9: Rosetta Code: Long List is Long (faster - vec - OpenMP code - tbb) by marioroy (Prior) on Jan 17, 2023 at 10:33 UTC
Re^9: Rosetta Code: Long List is Long (faster - llil4vec - TBB code) by marioroy (Prior) on Jan 19, 2023 at 15:42 UTC
Some notes below your chosen depth have not been shown here
Re^7: Rosetta Code: Long List is Long (faster - vec - OpenMP) by eyepopslikeamosquito (Archbishop) on Jan 11, 2023 at 20:59 UTC
Very interesting! This will help me get started with OpenMP, which I've been meaning to do for a while now. Unfortunately, `strtok` is not thread-safe e.g. `strtok(NULL, "\n")` causing segfault. So I factored out `strtok`. Ha ha, that I used `strtok` at all is an indication of how desperate I was, given I singled this function out for a special dishonourable mention at On Interfaces and APIs. :) To round out this section, notice that the ANSI C strtok function has a shocker of an interface: it surprises by writing NULLs into the input string being tokenized; the first call has different semantics to subsequent calls (distinguished by special-case logic on the value of its first parameter); and it stores state between calls so that only one sequence of calls can be active at a time (bad enough in a single-threaded environment, but so intolerable in *multi-threaded and signal handler* (reentrant) environments that the POSIX threads committee invented a replacement `strtok_r` function).	[reply] [d/l] [select]
Re^8: Rosetta Code: Long List is Long (faster - vec - OpenMP) by marioroy (Prior) on Jan 11, 2023 at 22:07 UTC


XP is just a number
	PerlMonks

Re^4: Rosetta Code: Long List is Long (faster - vec - fast_io)