
Here is my attempt to run Nvidia's cuda code on the GPU via Perl. The motivation comes from the question "Perl GPGPU Modules" by kcott (interesting problem, Ken!). The main tool is Inline::C, which opens so many doors.

Cuda is a programming language on top of (or extending) C which deals with GPGPU. General-purpose computing on graphics processing units (GPGPU) uses graphics cards (GPUs) and their highly parallel architecture to run tasks, like (large) matrix multiplication, which a CPU, because of its architecture, runs much more slowly and less efficiently. The GPU is designed for matrix multiplications, and that's what it does, frame after frame of what we see on our monitor, without breaking a sweat. Matrix multiplication is the basis for a lot of numerical applications and can make social planning much easier.

First the problems:

Here is the general setup:

Use Inline::C (a great and powerful module, thanks!) with a specific compiler and linker by providing it, via use Inline C => Config => cc => '...', ld => '...', with two Perl scripts, namely nvcc-compile.pl and nvcc-link.pl.

These scripts remove some compile/link flags which Inline::C and ExtUtils::MakeMaker use for compiling plain C code and which are incompatible with nvcc. They also prefix others with -Xcompiler ... so that nvcc passes them on to the system compiler. The first script also copies the temporary C file produced in the _Inline/build/ directory to a file whose extension is .cu instead of .c, and compiles that copy. The compiler and linker scripts then run the actual nvcc command appropriate for compiling or linking.

These scripts just worked for me but will probably need tweaking for other compilers and other flags. At least a general setup is in place.
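To make the flag handling described above concrete, here is a minimal sketch of the idea, separate from the real scripts. The sub name filter_flags and the short flag lists are illustrative assumptions only; the sketch also assumes one flag per list element, whereas the actual scripts work with regexes over each argument string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Flags that nvcc rejects outright: drop them (illustrative subset).
my @remove = (
    qr/^-m64$/,
    qr/^-pipe$/,
    qr/^-Wall$/,
    qr/^-specs=\S+$/,
);

# Flags meant for the host compiler: hand them over via -Xcompiler
# (illustrative subset).
my @forward = (
    qr/^-fPIC$/,
    qr/^-fwrapv$/,
    qr/^-fno-strict-aliasing$/,
    qr/^-Wl,\S+$/,
);

# filter_flags() is a hypothetical helper, not part of Inline::C.
# It takes a list of single flags and returns the list to give to nvcc.
sub filter_flags {
    my @out;
    FLAG: for my $flag (@_) {
        for my $re (@remove) {
            next FLAG if $flag =~ $re;      # incompatible: skip it
        }
        for my $re (@forward) {
            if ($flag =~ $re) {
                push @out, '-Xcompiler', $flag;  # pass through to gcc
                next FLAG;
            }
        }
        push @out, $flag;                   # everything else is kept as-is
    }
    return @out;
}

my @in  = qw(-O2 -m64 -fPIC -pipe -Wall -I/usr/include -c demo.cu);
my @out = filter_flags(@in);
print join(' ', @out), "\n";
# prints: -O2 -Xcompiler -fPIC -I/usr/include -c demo.cu
```

Working on a list of flags (rather than one long command string) also makes it possible to call system() in its list form and so avoid shell quoting, but that is a design choice and not what the scripts in this post do.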

The two scripts to be provided to Inline::C as compiler and linker are given at the end. Edit: Save them at the same location as the demo script below without changing their names.
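Since the two helper scripts are run as commands, they need to be present next to the demo script and executable. A small sanity check along these lines may save some head-scratching; missing_helpers is a hypothetical name of mine, not something Inline::C provides.

```perl
use strict;
use warnings;
use File::Spec;
use FindBin;

# missing_helpers() is a hypothetical helper: it returns the names of the
# wrapper scripts that are absent or not executable in directory $dir.
sub missing_helpers {
    my ($dir) = @_;
    return grep {
        my $path = File::Spec->catfile($dir, $_);
        !-x $path;
    } qw(nvcc-compile.pl nvcc-link.pl);
}

# Check the directory of the running script, as the demo script expects.
my @missing = missing_helpers($FindBin::Bin);
warn "not found or not executable in $FindBin::Bin: @missing\n" if @missing;
```

If it complains, chmod +x on the two scripts is the usual fix.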

Here is a basic Perl script running a cuda program on the GPU:

#!/usr/bin/perl

# by bliako @ PerlMonks.org
# date: 01-Jul-2021
# see https://perlmonks.org/?node_id=11134582
# lame example for utilising GPGPU via Inline::C
# TODO: extend to taking params and returning back results

use strict;
use warnings;

use FindBin;

use Inline C => Config =>
    cc => $FindBin::Bin.'/nvcc-compile.pl',
    ld => $FindBin::Bin.'/nvcc-link.pl',
;
use Inline C => <<'EOC';

// from https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/

#include <stdio.h>

__global__
void saxpy(int n, float a, float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

int main()
{
  int N = 1<<20;
  float *x, *y, *d_x, *d_y;
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));

  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

  // Perform SAXPY on 1M elements
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = max(maxError, abs(y[i]-4.0f));
  printf("Max error: %f\n", maxError);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);

  return 0; // << late edit!
}

EOC

main;

nvcc-compile.pl

#!/usr/bin/perl

# nvcc-compile.pl
# by bliako @ PerlMonks.org
# date: 01-Jul-2021
# see https://perlmonks.org/?node_id=11134582
# tools for running cuda code on the GPU via Perl and Inline::C
# script to be provided to Inline::C as its 'cc' parameter, like
#   use Inline C => Config =>
#     cc => $FindBin::Bin.'/nvcc-compile.pl',
#     ld => $FindBin::Bin.'/nvcc-link.pl',
#   ;
# Below, set $EXE and $CC to point to nvcc and gcc commands
# Note that nvcc requires specific gcc versions ONLY
# WARNING: if you make changes here, it is unlikely that Inline::C
# will notice. It's better to delete the temp _Inline directory and start afresh

use strict;
use warnings;

use Cwd;

my $verbose = 0;

my $EXE = '/usr/local/cuda/bin/nvcc';
my $CC = '/usr/local/gcc84/bin/gcc84';

#########################
# nothing to change below
#########################

my $PWD = Cwd::cwd;

# flags which nvcc does not accept: they are removed
my @remove = (
    qr/\-Werror=format\-security(?=\s|$)/,
    qr/\-m64(?=\s|$)/,
    qr/\-mtune=generic(?=\s|$)/,
    qr/\-iquote[^ ]*(?=\s|$)/,
    qr/\-grecord\-gcc\-switches(?=\s|$)/,
    qr/\-pipe(?=\s|$)/,
    qr/\-Wall(?=\s|$)/,
    qr/\-Wp,\-D_FORTIFY_SOURCE=[0-9]+(?=\s|$)/,
    qr/\-Wp,\-D_GLIBCXX_ASSERTIONS(?=\s|$)/,
    qr/\-specs=[^ ]+(?=\s|$)/,
    qr/\-DVERSION=[^ ]+(?=\s|$)/,
    qr/\-DXS_VERSION=[^ ]+(?=\s|$)/,
);
# flags for the system compiler: they are prefixed with -Xcompiler
my @replace_compiler_options = (
    qr/(\-flto=auto)(?=\s|$)/,
    qr/(\-ffat\-lto\-objects)(?=\s|$)/,
    qr/(\-fexceptions)(?=\s|$)/,
    qr/(\-fstack\-protector\-strong)(?=\s|$)/,
    qr/(\-fasynchronous\-unwind\-tables)(?=\s|$)/,
    qr/(\-fstack\-clash\-protection)(?=\s|$)/,
    qr/(\-fcf\-protection)(?=\s|$)/,
    qr/(\-fwrapv)(?=\s|$)/,
    qr/(\-fPIC)(?=\s|$)/,
    qr/(\-fno\-strict\-aliasing)(?=\s|$)/,
    qr/(\-Wl,\-\-as-needed)(?=\s|$)/,
    qr/(\-Wl,\-z,now)(?=\s|$)/,
);

my @newarg;
for my $anarg (@ARGV){
    print "processing '$anarg'\n" if $verbose;
    for my $q (@remove){
        if( $anarg =~ s/$q//g ){ print "removing $q...\n" if $verbose }
    }
    for my $q (@replace_compiler_options){
        if( $anarg =~ s/$q/-Xcompiler \\"$1\\"/g ){ print "replacing $q...\n" if $verbose }
    }
    if( $anarg !~ /^\s*$/ ){ push @newarg, $anarg }
}

# hack to change the file extension from .c to .cu
# assumes that the file to compile is the last in @ARGV(!!!)
my $cfile = $newarg[-1];
my $cufile = $cfile; $cufile =~ s/\.c$/.cu/; # the dot is escaped so that only a literal ".c" matches
$newarg[-1] = $cufile;
my $cmdstr = "cp '$cfile' '$cufile'";
die "failed" if mysystem($cmdstr);

# use $CC (set above) rather than a hard-coded path
$cmdstr = $EXE." --compiler-bindir ${CC} ".join(" ", @newarg);
print "$0 : executing:\n$cmdstr\n";
die "failed" if mysystem($cmdstr);
#system($EXE, @newarg);

# runs a command and returns 0 on success, 1 on failure
sub mysystem {
    my @args = @_;
    system(@args);
    if ($? == -1) {
        print STDERR "failed to execute: $!\n";
        return 1;
    } elsif ($? & 127) {
        printf STDERR "child died with signal %d, %s coredump\n",
            ($? & 127), ($? & 128) ? 'with' : 'without';
        return 1;
    }
    my $ex = $? >> 8;
    if( $ex ){
        print STDERR "error, system command failed with exit code $ex\n";
        return 1;
    }
    printf "success, system command executed.\n";
    return 0;
}
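The .c to .cu step in nvcc-compile.pl can also be done without shelling out to cp. Below is a sketch using the core module File::Copy; the names cu_name and copy_as_cu and the example path are made up for illustration. Note the escaped dot and the \z anchor: an unescaped . in the regex would match any character, not just a literal dot.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# cu_name() is a hypothetical helper: it maps "foo.c" to "foo.cu" and
# returns nothing for a name that does not end in a literal ".c".
sub cu_name {
    my ($cfile) = @_;
    (my $cufile = $cfile) =~ s/\.c\z/.cu/ or return;
    return $cufile;
}

# copy_as_cu() copies the C source next to itself under the .cu name
# and returns the new name; it dies if the name or the copy is bad.
sub copy_as_cu {
    my ($cfile) = @_;
    my $cufile = cu_name($cfile)
        or die "'$cfile' does not look like a .c file\n";
    copy($cfile, $cufile)
        or die "copy '$cfile' -> '$cufile' failed: $!\n";
    return $cufile;
}

# The path below is only an example of what a build file might be called.
print cu_name('_Inline/build/demo_pl_1234/demo_pl_1234.c'), "\n";
# prints: _Inline/build/demo_pl_1234/demo_pl_1234.cu
```

This still assumes, like the original hack, that you know which argument is the source file.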

nvcc-link.pl

#!/usr/bin/perl

# nvcc-link.pl
# by bliako @ PerlMonks.org
# date: 01-Jul-2021
# see https://perlmonks.org/?node_id=11134582
# tools for running cuda code on the GPU via Perl and Inline::C
# script to be provided to Inline::C as its 'ld' parameter, like
#   use Inline C => Config =>
#     cc => $FindBin::Bin.'/nvcc-compile.pl',
#     ld => $FindBin::Bin.'/nvcc-link.pl',
#   ;
# Below, set $EXE and $CC to point to nvcc and gcc commands
# Note that nvcc requires specific gcc versions ONLY
# WARNING: if you make changes here, it is unlikely that Inline::C
# will notice. It's better to delete the temp _Inline directory and start afresh

use strict;
use warnings;

use Cwd;

my $verbose = 0;

my $EXE = '/usr/local/cuda/bin/nvcc';
my $CC = '/usr/local/gcc84/bin/gcc84';

#########################
# nothing to change below
#########################

my $PWD = Cwd::cwd;

my @remove = (
    qr/\-Werror=format\-security(?=\s|$)/,
    qr/\-m64(?=\s|$)/,
    qr/\-mtune=generic(?=\s|$)/,
    qr/\-iquote[^ ]*(?=\s|$)/,
    qr/\-grecord\-gcc\-switches(?=\s|$)/,
    qr/\-pipe(?=\s|$)/,
    qr/\-Wall(?=\s|$)/,
    qr/\-Wp,\-D_FORTIFY_SOURCE=[0-9]+(?=\s|$)/,
    qr/\-Wp,\-D_GLIBCXX_ASSERTIONS(?=\s|$)/,
    qr/\-specs=[^ ]+(?=\s|$)/,
    qr/\-DVERSION=[^ ]+(?=\s|$)/,
    qr/\-DXS_VERSION=[^ ]+(?=\s|$)/,
);
my @replace_compiler_options = (
    qr/(\-flto=auto)(?=\s|$)/,
    qr/(\-ffat\-lto\-objects)(?=\s|$)/,
    qr/(\-fexceptions)(?=\s|$)/,
    qr/(\-fstack\-protector\-strong)(?=\s|$)/,
    qr/(\-fasynchronous\-unwind\-tables)(?=\s|$)/,
    qr/(\-fstack\-clash\-protection)(?=\s|$)/,
    qr/(\-fcf\-protection)(?=\s|$)/,
    qr/(\-fwrapv)(?=\s|$)/,
    qr/(\-fPIC)(?=\s|$)/,
    qr/(\-fno\-strict\-aliasing)(?=\s|$)/,
    qr/(\-Wl,\-z,relro)(?=\s|$)/,
    qr/(\-Wl,\-\-as-needed)(?=\s|$)/,
    qr/(\-Wl,\-z,now)(?=\s|$)/,
);

my @newarg;
for my $anarg (@ARGV){
    print "processing '$anarg'\n" if $verbose;
    for my $q (@remove){
        if( $anarg =~ s/$q//g ){ print "removing $q...\n" if $verbose }
    }
    for my $q (@replace_compiler_options){
        if( $anarg =~ s/$q/-Xcompiler \\"$1\\"/g ){ print "replacing $q...\n" if $verbose }
    }
    if( $anarg !~ /^\s*$/ ){ push @newarg, $anarg }
}

my $cmdstr = $EXE." --compiler-bindir ${CC} ".join(" ", @newarg);
print "$0 : executing:\n$cmdstr\n";
die "failed" if mysystem($cmdstr);

sub mysystem {
    my @args = @_;
    system(@args);
    if ($? == -1) {
        print STDERR "failed to execute: $!\n";
        return 1;
    } elsif ($? & 127) {
        printf STDERR "child died with signal %d, %s coredump\n",
            ($? & 127), ($? & 128) ? 'with' : 'without';
        return 1;
    }
    my $ex = $? >> 8;
    if( $ex ){
        print STDERR "error, system command failed with exit code $ex";
        return 1;
    }
    printf "success, system command executed.\n";
    return 0;
}
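About the warning in the comments of both scripts (Inline::C is unlikely to notice changes to the helper scripts): deleting the _Inline directory can be scripted too. A minimal sketch with the core module File::Path follows; clear_inline_cache is a hypothetical name, and it assumes you know in which directory your _Inline cache was created.

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);
use File::Spec;
use File::Temp qw(tempdir);

# clear_inline_cache() is a hypothetical helper: it deletes the _Inline
# build directory under $dir (if present) so the next run recompiles.
# Returns 1 if a directory was removed, 0 if there was none.
sub clear_inline_cache {
    my ($dir) = @_;
    my $cache = File::Spec->catdir($dir, '_Inline');
    return 0 unless -d $cache;
    remove_tree($cache, { error => \my $err });
    die "could not remove $cache\n" if $err && @$err;
    return 1;
}

# Demonstration on a throw-away directory, so nothing real is deleted.
my $demo = tempdir(CLEANUP => 1);
mkdir File::Spec->catdir($demo, '_Inline') or die "mkdir failed: $!";
print clear_inline_cache($demo) ? "removed\n" : "nothing to remove\n";
# prints: removed
```

In real use you would pass the directory holding your _Inline cache (often the one you run the demo script from) instead of the temporary one.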

At the moment, I have not implemented communicating parameters to and from the inlined cuda code. Feel free to extend.

Suggestions: Inline::C can be modified in order to avoid my ugly hacks, or a new Inline::Cuda can be built.

These are interesting times. This is a small step in making them fun-ner and lazy-er too. A big Thank You to the author of Inline::C and Nvidia.

Tested on Linux with (older) gcc version 8.4, Nvidia's Cuda compilation tools version 11.4.48, Nvidia graphics driver 470.42.01, Perl version 5.32.1, Inline::C version 0.81

Edits: added a return 0; at the end of main() in the main demo script.

bw, bliako