monsieur_champs has asked for the wisdom of the Perl Monks concerning the following question:
Beginnings
I'm working for a client who asked me to build something he
can use to inspect a catch-all mailbox on his ISP's Linux box. My
client is an organized and clever half-techie who understands
little about Perl and a lot about Linux. He is using the
Debian stable distribution, with perl 5.6.1 and
a bunch of libraries that I asked him to install for my use.
Mail::Box
I chose Mail::Box because of its stability,
powerful control, and broad range of supported formats.
In development, everything goes fine: Mail::Box::Manager
uses Mail::Box::Maildir and Mail::Box::Message to create an
ideal world where everything works as expected. Messages go back and
forth, and I can see and handle all requirements.
Production Hell
Things change a lot in production (I have little access to
production, so please take it easy! This is a business
requirement from my client). The same program that works in my
development environment and performs quite well there fails miserably
when facing the 30_000 (yes, that's four zeros on the right)
messages in a single maildir folder. My main problem is that I
don't get a formal failure: Mail::Box just returns from
open() and reports that there are no messages in
this maildir folder(?!?!). I'm really confused by this error
and can't figure out whether I'm missing something
really important or just need a good afternoon of debugging Perl
internals.
I wrote the following code to try to expose the failure. I hope
I've set all error reporting to the maximum noise level. Comments and
related cases are welcome. My client will run tests in one or two
days, and I will then have more information to add to this
post.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
use Mail::Box::Manager;
my $options;
GetOptions( 'mail-folder=s'  => \$options->{folder},
            'dump-subject=s' => \$options->{dumpfile},
          );
pod2usage( -message    => "$0: syntax error: pay attention!\n\n",
           -exitval    => 1,
           -verbose    => 1, # Give "Synopsis" and "Arguments"
           -filehandle => \*STDERR,
         )
    unless( ( $options->{folder} and -d $options->{folder} )
            # or
            # ( $options->{dumpfile} and -f $options->{dumpfile} )
          );
my $manager = Mail::Box::Manager->new;
my $folder;
eval{
    $folder = $manager->open( folder => $options->{folder},
                              create => 0,
                              access => 'r',
                              type   => 'maildir',
                              expand => 'LAZY',
                              log    => 'DEBUG', # adds a lot of noise
                              trace  => 'DEBUG', # adds a lot of noise
                            );
};
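die "Error opening maildir [$options->{folder}]: '$@'\n\n" if $@;
# open() can also fail by returning undef without throwing, so check that too
die "Mail::Box returned no folder for [$options->{folder}]\n\n"
    unless defined $folder;
print qq{Folder [} . $folder->name . qq{] opened with [} .
    scalar @$folder . qq{] messages.\n\n};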
if( $options->{dumpfile} ){
    # Lexical filehandle instead of the bareword DUMP
    open my $dump, '>', $options->{dumpfile}
        or pod2usage( -message    => qq{Can't create dumpfile: $!\n\n},
                      -exitval    => 2,
                      -verbose    => 1,
                      -filehandle => \*STDERR
                    );
    print( {$dump} $_->subject(), "\n" ) foreach @$folder;
    close $dump
        or die qq{Can't close(!?!) dumpfile $options->{dumpfile}: $!\n\n};
} # fi
eval{ $folder->close };
die "Error closing maildir [".$folder->name."]: '$@'.\n\n" if $@;
__END__
=pod
=head1 NAME
mail-box-test - Simple test to see if Mail::Box::Maildir is working correctly.
=head1 SYNOPSIS
perl mail-box-test --mail-folder='/path/to/mail/dir/' [--dump-subject=/path/to/subjects.txt]
=head1 ARGUMENTS
=over 4
=item --mail-folder <FOLDER>
Points to the maildir you want to use for testing.
No maildir means test failure, so please choose a maildir to test.
=item --dump-subject <DUMPFILE>
Force dumping of subject lines of a maildir to a specified file on
the disk.
=back
=head1 DESCRIPTION
This script tests Mail::Box resource usage under Linux for a
client. I'm facing a funny problem when trying to open maildirs with
more than 21,000 messages in them: Mail::Box tells me it is opening
the maildir correctly, but no messages are found inside it.
=head1 AUTHOR
Luis Campos de Carvalho, a.k.a. Monsieur Champs.
mailto: monsieur_champs [at] yahoo [dot] com [dot] br
=cut
Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by xdg (Monsignor) on Jun 26, 2005 at 14:58 UTC
I strongly suggest joining the Mail::Box mailing list (see perl.overmeer.net) and posting your issue to the list -- the module author, Mark Overmeer, and many knowledgeable users are pretty quick to respond.
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
| [reply] |
Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by fglock (Vicar) on Jun 26, 2005 at 05:53 UTC
"30000" smells like "signed 16-bit overflow". This problem may even be in an underlying system library (because Mail::Box doesn't use XS).
I'd start by investigating which mail box format the production environment uses.
| [reply] |
Production uses maildir only.
Sorry, fglock, but I can't see the point. Why does this smell like an overflow? The file system is out of the question: 30_000 files is a lot, but not really a problem...
| [reply] |
There are some numbers in computing that are boundaries and can cause problems (sort of like that whole Y2K issue).
Near 30,000 is the number 32,768, which is 2**15. Now, you'd think to yourself, but wouldn't there be problems at 2**16, which is a nice round number in computer terms?
Well, no, because a signed integer of x bits (in two's complement) ranges from -(2**(x-1)) to 2**(x-1)-1. So a 16-bit signed number goes from -32,768 to 32,767. If the module in question uses XS (compiled C code), it's possible that it was compiled with a 16-bit signed number in there, which will have problems if you try to deal with numbers greater than 32,767.
If the count is 30,000 or less, this probably isn't the issue. If it's over 32,767, this could be a problem.
From looking at the docs for Mail::Box, however, it looks to be pure perl, so I don't think this is the issue in this case.
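To make the boundary concrete, here is a small sketch in plain Perl. The helper name as_signed16 is made up for this illustration and has nothing to do with Mail::Box; it only shows what a C short would make of the same 16 bits.

```perl
use strict;
use warnings;

# Reinterpret the low 16 bits of a number as a signed 16-bit value,
# the way a C "short" would see the same bits.
# (as_signed16 is a hypothetical helper for this illustration.)
sub as_signed16 {
    my ($n) = @_;
    return unpack 's', pack 'S', $n & 0xFFFF;
}

printf "%6d -> %6d\n", $_, as_signed16($_)
    for 30_000, 32_767, 32_768, 65_535;
```

A count of 30_000 or 32_767 comes back unchanged, while 32_768 comes back as -32768, which is the kind of wrap-around a 16-bit counter would suffer.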
| [reply] |
Still, it would be reasonable to write a simple test script that opens that directory and just counts the entries:
opendir DIR, "that-dir" or die "opendir failed: $!";
my @files = readdir DIR; # or whatever method Mail::Box uses
closedir DIR;
print "count is ", scalar(@files), "\n"; # includes "." and ".."
This checks whether perl's readdir can be ruled out as the culprit on your current build (with the given libc and so on).
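A slightly fuller version of the same idea, as a sketch: it assumes the usual maildir layout, where messages live in the "new" and "cur" subdirectories, and the function names are mine, not anything Mail::Box provides. It uses only core Perl, so it takes Mail::Box out of the picture entirely.

```perl
use strict;
use warnings;

# Count the entries in one directory with plain opendir/readdir,
# skipping the "." and ".." entries.
sub count_dir_entries {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot opendir '$dir': $!\n";
    my $count = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return $count;
}

# Assumption: a maildir keeps its messages in "new" and "cur".
# Returns a hash reference of subdirectory name => entry count.
sub count_maildir_messages {
    my ($maildir) = @_;
    my %count;
    for my $sub (qw(new cur)) {
        my $path = "$maildir/$sub";
        $count{$sub} = -d $path ? count_dir_entries($path) : 0;
    }
    return \%count;
}

# Usage: perl count-maildir.pl /path/to/maildir
if (@ARGV) {
    my $count = count_maildir_messages($ARGV[0]);
    print "$_: $count->{$_}\n" for sort keys %$count;
}
```

If these counts look right on the production box while Mail::Box still reports zero messages, readdir and libc are probably not to blame.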
|
Well, you know what happens when you presume, right? You make a pre of su and me.
| [reply] |
Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by TedPride (Priest) on Jun 26, 2005 at 20:45 UTC
| [reply] |
That's a nice workaround. Solving the problem the hard way will probably help others not to be bitten in the future.
Flavio
perl -ple'$_=reverse' <<<ti.xittelop@oivalf
Don't fool yourself.
| [reply] |
Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by BrowserUk (Patriarch) on Jun 27, 2005 at 12:07 UTC
An off-the-wall guess that I have no way to verify: could it be that you are running out of memory? A quick browse of Mail::Box and its associated modules leads me to believe that they form a quite heavily nested hierarchy of modules, with each level adding another hash or two for each item. Large volumes of nested hashes, even when each individual leaf hash is quite small, can rapidly consume large volumes of space.
On my system, Perl sometimes dies silently when it runs out of space.
Maybe you could monitor the program's memory usage when running it on this large directory?
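One way to do that from inside the script itself, as a rough sketch: on Linux, the kernel exposes the process's resident set size in /proc/self/status. The function name current_rss_kb is my own, and on systems without /proc it simply returns undef.

```perl
use strict;
use warnings;

# Return this process's resident set size in kB, read from Linux's
# /proc/self/status, or undef where that file is not available.
# (current_rss_kb is a hypothetical helper, not part of Mail::Box.)
sub current_rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $rss = current_rss_kb();
print defined $rss
    ? "RSS: $rss kB\n"
    : "RSS not available on this system\n";
```

Calling it once before and once after the folder is opened would show how much memory the folder object costs for the large maildir.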
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
| [reply] |
In my personal experience, perl 5.6.1 was unable to get more than 1 GB of memory, whereas 5.8.x versions behaved better in this respect (but were much slower at allocating memory, BTW), so I would agree with your reasonable point.
| [reply] |
Good point. But my test script still prints all the planned output, even the part after the "count messages" point. I suppose an out-of-memory Perl shouldn't be able to print anything...
I'm waiting for the test script results, so I can give you more details. If you think the posted script is not enough to expose the problem, please tell me, and I will try to write a more precise test. Patches welcome, too.
| [reply] |
Hmm. Probably not memory then.
I took a quick scan of the code in Mail::Box::Manager and noticed something that might be relevant. In the code for M::B::M::open(), I see this:
return if $require_failed{$class};
and scanning back to see where %require_failed is set, I see this:
unless($folder_type)
{   # Try to autodetect foldertype.
    foreach (@{$self->{MBM_folder_types}})
    {   next unless $_;
        (my $abbrev, $class, @defaults) = @$_;
        next if $require_failed{$class};
        eval "require $class";
        if($@)
        {   $require_failed{$class}++;
            next;
        }
        if($class->foundIn($name, @defaults, %args))
        {   $folder_type = $abbrev;
            last;
        }
    }
}
I may be misinterpreting the code, but it looks to me that if it attempts to auto detect the folder type and then fails to require the module for the folder type it detects, it sets the flag to indicate the failure and skips on without logging an error. Then later, it checks the flag and if it is set, fails silently returning undef.
Could it be misdetecting the folder type and failing silently as a result?
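If that reading is right, one way to bring the swallowed error into the open is to require the backend class yourself and print whatever ends up in $@. This is only a sketch: try_require is a name I made up, and the only thing it relies on is Perl's own require.

```perl
use strict;
use warnings;

# Try to load a module by name. Returns undef on success, or the
# error text on failure. The name is validated before the string eval.
# (try_require is a hypothetical helper, not part of Mail::Box.)
sub try_require {
    my ($class) = @_;
    return "Invalid module name '$class'\n"
        unless $class =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    local $@;
    return if eval "require $class; 1";
    return $@ || "Unknown error loading $class\n";
}

# Usage: perl try-require.pl Mail::Box::Maildir
for my $class (@ARGV) {
    my $error = try_require($class);
    print defined $error
        ? "$class: FAILED: $error"
        : "$class: loaded OK\n";
}
```

Running it with Mail::Box::Maildir as the argument on the production box would show whether the backend loads at all. It would also be worth testing defined $folder right after $manager->open(...), since the return shown above hands back undef without raising an exception.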
| [reply] |