Hi fellow monks,
I'm using PDL to work on a dataset that's too large to fit in memory, so I'm using the function PDL::IO::FastRaw::mapfraw to memory-map the file.
That's all fine and dandy, but I'd like my processing to go quicker. I'm not altering this dataset at all, which is why I'm confused to see it (apparently) being written back to disk when a page is swapped out.
So my question is:
Given that I'm not altering the data and so it doesn't need to be written back to disk, how can I avoid this happening and so speed things up?
The code in FastRaw that's doing the memory mapping is:
sub PDL::mapfraw {
    my $class = shift;
    my($name,$opts) = @_;
    my $hdr;
    if($opts->{Dims}) {
        my $datatype = $opts->{Datatype};
        if(!defined $datatype) {$datatype = $PDL_D;}
        $hdr->{Type} = $datatype;
        $hdr->{Dims} = $opts->{Dims};
        $hdr->{NDims} = scalar(@{$opts->{Dims}});
    } else {
        $hdr = _read_frawhdr($name);
    }
    $s = PDL::Core::howbig($hdr->{Type});
    for(@{$hdr->{Dims}}) {
        $s *= $_;
    }
    my $pdl = $class->zeroes(new PDL::Type($hdr->{Type}));
    # $pdl->dump();
    $pdl->setdims($hdr->{Dims});
    # $pdl->dump();
    $pdl->set_data_by_mmap($name,$s,1,($opts->{ReadOnly}?0:1),
                           ($opts->{Creat}?1:0),
                           (0644),
                           ($opts->{Creat} || $opts->{Trunc} ? 1:0));
    # $pdl->dump();
    if($opts->{Creat}) {
        _writefrawhdr($pdl,$name);
    }
    return $pdl;
}
(written by Karl Glazebrook, the author of the module, not by me)
This calls on the C routine set_data_by_mmap in PDL/Basic/Core/Core.xs.PL, where what seems to be the relevant part looks like this:
set_data_by_mmap(it,fname,len,writable,shared,creat,mode,trunc)
    pdl *it
    char *fname
    int len
    int writable
    int shared
    int creat
    int mode
    int trunc
    CODE:
#ifdef USE_MMAP
        int fd;
        pdl_freedata(it);
        fd = open(fname,(writable && shared ? O_RDWR : O_RDONLY)|
                  (creat ? O_CREAT : 0),mode);
        if(fd < 0) {
            croak("Error opening file");
        }
        if(trunc) {
            ftruncate(fd,0);   /* Clear all previous data */
            ftruncate(fd,len); /* And make it long enough */
        }
        if(len) {
            it->data = mmap(0,len,PROT_READ | (writable ?
                            PROT_WRITE : 0),
                            (shared ? MAP_SHARED : MAP_PRIVATE),
                            fd,0);
            if(!it->data)
                croak("Error mmapping!");
        } else {
            /* Special case: zero-length file */
            it->data = NULL;
        }
        PDLDEBUG_f(printf("PDL::MMap: mapped to %d\n",it->data);)
        it->state |= PDL_DONTTOUCHDATA | PDL_ALLOCATED;
        pdl_add_deletedata_magic(it, pdl_delete_mmapped_data, len);
        close(fd);
#else
(again not written by me, but by the authors of PDL).
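Reading the two listings side by side, the Perl call appears to pass a constant 1 in the `writable` slot and to put the ReadOnly test in the `shared` slot, so ReadOnly seems to select MAP_PRIVATE while leaving PROT_WRITE set. For comparison, here is a minimal C sketch of what I would expect a strictly read-only mapping to look like. It is not PDL's code: `map_file_readonly` and `unmap_file` are hypothetical names of my own, and the comment about clean pages being discarded rather than written out is my understanding of how file-backed mappings generally behave, not something I have verified on Darwin.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of PDL): map an existing, non-empty file
 * strictly read-only. Returns the mapping, or NULL on failure, and stores
 * the mapped length in *len_out on success. */
static const void *map_file_readonly(const char *fname, size_t *len_out)
{
    struct stat st;
    void *data;
    int fd = open(fname, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* PROT_READ only, no PROT_WRITE: the process cannot dirty these pages,
     * so (as I understand it) they can be dropped and re-read from the file
     * instead of being written anywhere. */
    data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    /* mmap signals failure with MAP_FAILED, not with a null pointer. */
    if (data == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return data;
}

/* Hypothetical helper: release a mapping obtained from map_file_readonly. */
static int unmap_file(const void *data, size_t len)
{
    return munmap((void *)data, len);
}
```

In the terms of the XS routine above, this would correspond to calling it with `writable` set to 0, which is not what mapfraw seems to do even when ReadOnly is given.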
I'm on Mac OS X 10.4.9, with Perl 5.8.6 built for darwin-thread-multi-2level (apparently). All help with this one gratefully received.
Best wishes, andye
PS: I have tried setting ReadOnly, but it didn't help.