Having finally found a working Devel:: module on CPAN (Devel::Size), I'm trying to figure out why my memory usage goes through the roof and into the swap file when reading large chunks of data.
The data concerned can vary a lot, but in this test case it's a record set with 119 fields per record and 47039 records in the set.
Performing a simple my $ar_data = $db->selectall_arrayref("SELECT * FROM testtable", { Slice => {} }); yields a record set that, according to Devel::Size::total_size, is 449337164 bytes. That is about 9552 bytes/record. I can live with that.
However, when I write the same data to a fixed-length file and then read it back into a new variable, the size turns out to be 773605490 bytes, or about 16446 bytes/record.
The code used to read the data:
# ReadData ($filename) returns ar_data
sub ReadData ($$) {
    my ($self, $filename) = @_;
    my $ar_returnvalue = [];
    if (!-e "$filename") {
        Carp::carp("File [$filename] does not exist");
        return undef;
    }
    open (FLATFILE, '<', $filename) or Carp::croak("Cannot open file [$filename]");
    while (<FLATFILE>) {
        chomp;
        push (@{$ar_returnvalue}, Interfaces::FlatFile::ReadRecord($self, $_));
    }
    close (FLATFILE);
    return $ar_returnvalue;
} ## end sub ReadData ($$)
sub ReadRecord ($$) {
    my ($self, $textinput) = @_;
    my $hr_returnvalue = {};
    my $CurrentColumnName;
    for (0 .. $#{$self->columns}) {
        $CurrentColumnName = $self->columns->[$_];
        if (!(defined $self->flatfield_start->[$_] and defined $self->flatfield_length->[$_])) {
            # Field is missing flatfield_start, flatfield_length or both, skip it.
            next;
        }
        $hr_returnvalue->{$CurrentColumnName} = substr ($textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]);
        $hr_returnvalue->{$CurrentColumnName} =~ s/^\s*(.*?)\s*$/$1/;    # Trim whitespace
        # Fill empty fields with that field's default value, if such a value is defined.
        if ($hr_returnvalue->{$CurrentColumnName} eq "") {
            if (defined $self->default->[$_]) {
                if ($self->datatype->[$_] =~ /^(?:CHAR|VARCHAR|DATE|TIME|DATETIME)$/) {
                    $hr_returnvalue->{$CurrentColumnName} = sprintf ("%s", $self->default->[$_]);
                } else {
                    $hr_returnvalue->{$CurrentColumnName} = $self->default->[$_];
                }
            } else {
                # Remove empty field
                delete $hr_returnvalue->{$CurrentColumnName};
            }
        }
        if ($self->datatype->[$_] =~ /^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/) {
            $hr_returnvalue->{$CurrentColumnName} *= 1;    # Multiply by 1 to create a numeric value.
        }
        # Decimal-correction
        if ($self->decimals->[$_] > 0 and defined $hr_returnvalue->{$CurrentColumnName}) {
            $hr_returnvalue->{$CurrentColumnName} /= 10**$self->decimals->[$_];
        }
    } ## end for (0 .. $#{$self->columns...
    return $hr_returnvalue;
} ## end sub ReadRecord ($$)
The above code is from a custom-made Interfaces object with an Interfaces::FlatFile role (yes, Moose) that provides fixed-length file interfacing. The object has the following attributes (only the ones used here are shown):
has 'columns'          => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
has 'datatype'         => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
has 'decimals'         => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
has 'default'          => (is => 'rw', isa => 'ArrayRef[Maybe[Value]]', lazy_build => 1,);
has 'flatfield_start'  => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
has 'flatfield_length' => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
These attributes are filled by index, so all the above attributes with index n refer to the same field n.
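To make the by-index layout concrete, here is a minimal standalone sketch (plain arrays instead of the Moose attributes). The column names, offsets, and the sample record are all made up for illustration; only the idea that index n in every array describes the same field comes from the code above.

```perl
use strict;
use warnings;

# Hypothetical field layout: one entry per field in each array,
# so index n in every array describes the same field.
my @columns          = ('id', 'name', 'price');
my @flatfield_start  = (0,    5,      15);
my @flatfield_length = (5,    10,     7);
my @decimals         = (0,    0,      2);

# A made-up 22-character fixed-length record.
my $line = '00042Widget    0001999';

my %record;
for my $n (0 .. $#columns) {
    # Cut the field out of the line using the start/length at index n.
    my $value = substr($line, $flatfield_start[$n], $flatfield_length[$n]);
    $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    # Apply the implied decimals for this field, if any.
    $value /= 10**$decimals[$n] if $decimals[$n] > 0;
    $record{ $columns[$n] } = $value;
}

printf "%s=%s\n", $_, $record{$_} for @columns;
```

Here the 'price' field at index 2 is cut out with the start and length at index 2 and scaled with the decimals at index 2, which is the same lookup pattern ReadRecord uses through the accessors.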
The question is thus: why does reading from a fixed-length file need so much more memory, and what can I do to fix that? :)