Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^4: Rosetta Code: Long List is Long (faster - vec - fast_io)

by marioroy (Prior)
on Jan 10, 2023 at 17:25 UTC ( [id://11149504]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Rosetta Code: Long List is Long (faster - vec - openmp)
in thread Rosetta Code: Long List is Long

I'm not seeing a noticeable difference between the initial llil2vec demonstration and this one w.r.t. total time. So, I searched the web and tried fast_io with success. It requires a C++20 capable compiler.

git clone --depth 1 https://github.com/cppfastio/fast_io

Add the header file above other includes and comment out the cstdio include line.

#include <fast_io.h> //#include <cstdio>

Specify the path to the include folder. Also C++20.

clang++ -o llil2vec -std=c++20 -Wall -O3 llil2vec.cpp -I./fast_io/incl +ude

Before C++11 results:

llil2vec (fixed string length=6) start get_properties CPU time : 1.62356 secs emplace set sort CPU time : 0.451718 secs write stdout CPU time : 0.923496 secs total CPU time : 2.99881 secs total wall clock time : 3 secs

C++20 + fast_io results:

llil2vec (fixed string length=6) start get_properties CPU time : 1.50394 secs emplace set sort CPU time : 0.430272 secs write stdout CPU time : 0.889239 secs total CPU time : 2.82348 secs total wall clock time : 3 secs

Replies are listed 'Best First'.
Re^5: Rosetta Code: Long List is Long (faster - vec - fast_io)
by eyepopslikeamosquito (Archbishop) on Jan 11, 2023 at 11:12 UTC

    Thanks so much for finding this github fast_io library and for posting such clear instructions on how to use it! This made it ridiculously easy for me to play around with it.

    So far I haven't found any statistically significant speed-ups from employing this library (hardly surprising given the mega man years of effort poured into g++ and clang++), but I will keep my eye on it.

      So far I haven't found any statistically significant speed-ups from employing this library ...

      After closer look, it was the -std=c++20 language mode that enables faster vectors by 0.2 ~ 0.4 seconds versus -std=c++11.

      Update 1: Support variable length words.
      Update 2: Enable parallel sort. See Using Parallel Mode.

      I tried OpenMP. Unfortunately, strtok is not thread-safe e.g. strtok(NULL, "\n") causing segfault. So I factored out strtok. The OpenMP result improved by 0.1 seconds. That's because the actual reading is already fast. It takes 2 threads minimally to run faster than non-OpenMP results due to populating vec_rec from local copies.

      Building:

      clang++ -o llil4vec -std=c++11 -Wall -O3 llil4vec.cpp clang++ -o llil4vec -std=c++20 -Wall -O3 llil4vec.cpp # faster # enable parallel via -fopenmp clang++ -o llil4vec-omp -std=c++11 -fopenmp -Wall -O3 llil4vec.cpp clang++ -o llil4vec-omp -std=c++20 -fopenmp -Wall -O3 llil4vec.cpp # +faster

      Running - Real time results:

      $ time ./llil4vec big1.txt big2.txt big3.txt >out.txt std:c++11: 2.901 secs std:c++20: 2.850 secs $ time OMP_NUM_THREADS=3 ./llil4vec-omp big1.txt big2.txt big3.txt >ou +t.txt std:c++11: 2.308 secs std:c++20: 2.267 secs

      llil4vec.cpp modification, OpenMP-aware:

      #if defined(_OPENMP) #include <omp.h> #include <parallel/algorithm> #else #include <algorithm> #endif ... static void get_properties( int nfiles, // in: the number of input files char* fname[], // in: the input file names vec_int_str_type& vec_ret) // out: a vector of properties { #if defined(_OPENMP) omp_set_dynamic(0); omp_set_max_active_levels(1); #pragma omp parallel { vec_int_str_type vec_loc; // thread local copy #pragma omp for nowait schedule(static,1) #endif for (int i = 0; i < nfiles; ++i) { char line[MAX_LINE_LEN_L+1]; char* found; llil_int_type count; FILE* fh = ::fopen(fname[i], "r"); if (fh == NULL) { std::cerr << "Error opening '" << fname[i] << "' : errno=" + << errno << "\n"; continue; } while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) { found = ::strchr(line, '\t'); count = ::atoll( &line[found - line + 1] ); line[found - line] = '\0'; // word #ifdef MAX_STR_LEN_L str_type fixword { { '\0', '\0', '\0', '\0', '\0', '\0' } +}; ::memcpy( fixword.data(), line, found - line ); #if defined(_OPENMP) vec_loc.emplace_back( -count, fixword ); #else vec_ret.emplace_back( -count, fixword ); #endif #else #if defined(_OPENMP) vec_loc.emplace_back( -count, line ); #else vec_ret.emplace_back( -count, line ); #endif #endif } ::fclose(fh); } #if defined(_OPENMP) #pragma omp critical vec_ret.insert(vec_ret.end(), vec_loc.begin(), vec_loc.end()); } #endif // Needs to be sorted by word for later sum of adjacent count field +s to work #if defined(_OPENMP) __gnu_parallel::sort( vec_ret.begin(), vec_ret.end(), [](const int_str_type& left, const int_str_type& right) { return + left.second < right.second; } ); #else std::sort( vec_ret.begin(), vec_ret.end(), [](const int_str_type& left, const int_str_type& right) { return + left.second < right.second; } ); #endif }

        Helpful, OpenMP Little Book which introduces OpenMP C/C++ concurrency programming.

        OpenMP spawns a thread per logical core by default. Therefore, I set the desired number of threads via the OMP_NUM_THREADS environment variable (... or NUM_THREADS ...) to minimize extra threads creation and reaping.

        $ NUM_THREADS=3 ./llil4vec big1.txt big2.txt big3.txt >out.txt llil4vec (fixed string length=12) start use OpenMP use boost sort get properties 0.131 secs sort properties 0.191 secs vector reduce 0.036 secs vector stable sort 0.267 secs write stdout 0.191 secs total time 0.818 secs

        llil4vec.cc:

        Very interesting! This will help me get started with OpenMP, which I've been meaning to do for a while now.

        Unfortunately, strtok is not thread-safe e.g. strtok(NULL, "\n") causing segfault. So I factored out strtok.

        Ha ha, that I used strtok at all is an indication of how desperate I was, given I singled this function out for a special dishonourable mention at On Interfaces and APIs. :)

        To round out this section, notice that the ANSI C strtok function has a shocker of an interface: it surprises by writing NULLs into the input string being tokenized; the first call has different semantics to subsequent calls (distinguished by special-case logic on the value of its first parameter); and it stores state between calls so that only one sequence of calls can be active at a time (bad enough in a single-threaded environment, but so intolerable in multi-threaded and signal handler (reentrant) environments that the POSIX threads committee invented a replacement strtok_r function).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11149504]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (3)
As of 2024-04-25 07:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found