PerlMonks  

Help with setting up spamc

by SteveTheTechie (Novice)
on Jul 09, 2014 at 04:26 UTC ( [id://1092838]=perlquestion )

SteveTheTechie has asked for the wisdom of the Perl Monks concerning the following question:

I may be overthinking this, but here goes.

I am the current developer of a free, template-based website system used by over 9000 Toastmasters public speaking clubs worldwide. We handle over 300,000 emails per week for clubs using our system.

All of our server code is in Perl, including our email handler, which supports a wide variety of forwarding addresses and distribution lists. Until recently, our main email security approach was to verify club membership. We still do that, but we have added SpamAssassin as an extra check on the email addresses we provide for public use.

I set up SpamAssassin using Mail::SpamAssassin in the email handler, and it significantly degraded server performance (I should have expected that).

I am trying to get the spamc/spamd combination working. spamd is already set up; I am just stumbling over how to call spamc from the email handler code.

Current SpamAssassin call in the email handler (at line 479; *lots* happens before this):
```perl
# Spam test with SpamAssassin...
unless ( $SpamChecked || $whitelisted ) {
    # Per-site spam threshold, defaulting to 5.0
    my $trigger  = $CLUBSITES{'spamthreshold'} || 5.0;
    my $spamtest = Mail::SpamAssassin->new(
        { 'post_config_text' => "required_score $trigger" } );
    my $status = $spamtest->check_message_text($message_received);
    if ( $status->is_spam() ) {
        my $score      = $status->get_score();
        my $threshold  = $status->get_required_score();
        my $hits       = $status->get_names_of_tests_hit();
        my $SpamLogMsg = "Score: $score / $threshold (trigger);\t Positive Tests: $hits";
        HandleError( "SPAM", $SpamLogMsg, $message_received );
    }
    # Release the per-message status and the SpamAssassin object
    $status->finish();
    $spamtest->finish();
    $SpamChecked++;
}
```

I need to send $message_received to spamc and capture its output in a variable (preferably) so I can get the spam score. I know I can use backquotes to capture a command's stdout in a variable, but how do I handle both stdin and stdout here? This should be simple, but I am just missing it...
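One way to both feed spamc on STDIN and capture its STDOUT is the core IPC::Open2 module. The sketch below is not from the thread: `pipe_through` is a hypothetical helper name, and it assumes the command reads its whole input before writing its reply (writing all input before reading any output can deadlock with commands that interleave large amounts of both).

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Hypothetical helper: write $input to @cmd's STDIN, close it, and return
# (everything the command wrote to STDOUT, the command's exit code).
# Assumes the command consumes all of its input before producing much
# output; otherwise both sides can block on full pipe buffers.
sub pipe_through {
    my ( $input, @cmd ) = @_;
    my $pid = open2( my $from_cmd, my $to_cmd, @cmd );
    binmode $_ for $from_cmd, $to_cmd;    # pass message bytes through untouched
    print {$to_cmd} $input;
    close $to_cmd;                        # EOF tells the command the input is complete
    my $output = do { local $/; <$from_cmd> } // '';
    close $from_cmd;
    waitpid( $pid, 0 );                   # reap the child and set $?
    return ( $output, $? >> 8 );
}
```

Called as `my ($reply, $rc) = pipe_through($message_received, 'spamc', '-c');`, this should leave spamc's one-line "score/threshold" answer in `$reply`; according to the spamc documentation, `-c` also makes spamc exit with 1 for spam and 0 otherwise.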

Re: Help with setting up spamc
by NetWallah (Canon) on Jul 09, 2014 at 05:09 UTC
    You could use open() for IPC to do what you describe, but
    you already have code that uses Mail::SpamAssassin's "check_message_text" to perform that function. Why would you want to call external code?

            Profanity is the one language all programmers know best.

      Reason:

      Mail::SpamAssassin is very expensive in memory. Our server gets hammered on Mondays, when we receive the most email, and becomes *very* slow.

      Google "spamd vs spamassassin"; for example, see: spamd vs spamassassin

      spamd is supposedly much faster than spamassassin.

      spamc is just a command-line client for spamd, written in C for speed.
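Because spamc is only a thin client, its check mode (`-c`) output is easy to consume from Perl. A minimal parsing sketch, assuming the "score/threshold" reply format described in the spamc man page; `parse_spamc_check` is a hypothetical name, and treating "0/0" as "no verdict" reflects my understanding of spamc's fallback behaviour rather than anything stated in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the "score/threshold" line that `spamc -c`
# prints, e.g. "7.4/5.0". Returns (score, threshold), or an empty list if
# the line is missing or malformed. As far as I know, spamc prints "0/0"
# when it could not get a verdict from spamd, so that also counts as a
# failure here. The spam/ham verdict itself is spamc's exit code with -c
# (1 = spam, 0 = not spam).
sub parse_spamc_check {
    my ($line) = @_;
    return unless defined $line
        && $line =~ m{^\s*(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\s*$};
    my ( $score, $threshold ) = ( $1 + 0, $2 + 0 );
    return if $threshold == 0;    # "0/0": spamd gave no answer
    return ( $score, $threshold );
}
```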

Re: Help with setting up spamc
by andal (Hermit) on Jul 09, 2014 at 06:51 UTC
    I need to send $message_received to spamc and capture its output in a variable (preferably) so I can get the spam score. I know I can just back quote a system command to capture stdout to a variable, but how can I do both the stdout and the stdin handling here? This should be simple, but I am just missing it...

    A somewhat low-level approach in Perl would be:

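```perl
my $pid = open( my $chld, '-|' );    # fork; the child's STDOUT feeds $chld
die "Failed to fork: $!\n" unless defined $pid;
if ( $pid == 0 ) {
    # Child: spamc inherits this STDOUT, so its output reaches the parent
    open( my $proc, '|-', 'spamc' ) or die "Failed to run spamc: $!";
    print {$proc} $message_received;
    close($proc);
    exit(0);
}
my $spamc_output = do { local $/; <$chld> };    # collect spamc's output
close($chld);
```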

    The above assumes that spamc writes all of its output to STDOUT and then simply exits. The approach is slow because two forks are involved (one for the child process, one for spamc). I don't know anything about SpamAssassin, but if you have spamd (the daemon), there should be some network protocol for talking to it. If your program spoke that protocol directly, you would save the cost of the forks.
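To illustrate talking to spamd without forking, here is a rough sketch of the CHECK exchange as I understand it from the PROTOCOL notes that ship with SpamAssassin: a `CHECK SPAMC/1.5` request line, a `Content-length` header, a blank line, then the message; spamd answers with a `SPAMD/1.1 0 EX_OK` status line and a `Spam: True ; score / threshold` header. All sub names are hypothetical, port 783 is assumed to be spamd's default, and the message is assumed to be raw bytes; verify the details against your spamd version before relying on this.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a CHECK request (message assumed to be raw bytes, no wide characters).
sub build_check_request {
    my ($message) = @_;
    return "CHECK SPAMC/1.5\r\n"
         . "Content-length: " . length($message) . "\r\n"
         . "\r\n"
         . $message;
}

# Pull (is_spam, score, threshold) out of a reply such as
#   "SPAMD/1.1 0 EX_OK\r\nSpam: True ; 7.4 / 5.0\r\n\r\n"
# Returns an empty list on a non-zero status or an unrecognised reply.
sub parse_check_response {
    my ($reply) = @_;
    return unless defined $reply
        && $reply =~ m{^SPAMD/[\d.]+\s+0\s}    # status code 0 = EX_OK
        && $reply =~ m{^Spam:\s*(\w+)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)}m;
    return ( lc($1) eq 'true' ? 1 : 0, $2 + 0, $3 + 0 );
}

# Send one message to spamd and parse the verdict (not exercised in tests).
sub spamd_check {
    my ( $message, $host, $port ) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host // 'localhost',
        PeerPort => $port // 783,
        Proto    => 'tcp',
    ) or die "Cannot connect to spamd: $@";
    binmode $sock;
    print {$sock} build_check_request($message);
    shutdown( $sock, 1 );    # finished writing; still read the reply
    my $reply = do { local $/; <$sock> };
    close $sock;
    return parse_check_response($reply);
}
```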

    Another point: SpamAssassin looks slow even without forks, so your best bet would be to process multiple emails in parallel.

      OK, this looks very interesting. I have not fiddled with tee and forks for a while.

      I did not think of communicating with spamd directly; I thought I had to use the spamc command-line interface.

      Thanks!

        As I said, I don't know much about SpamAssassin, but a quick search turned up Mail::SpamAssassin::Client, which implements the protocol for talking to spamd.
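With Mail::SpamAssassin::Client, the existing `HandleError` log line can be rebuilt from the hashref its `check` method returns; the module documents `isspam` ("True"/"False"), `score` and `threshold` fields. This is a sketch: `spam_log_line` is a hypothetical helper, the constructor arguments in the comment assume spamd on localhost port 783, and since, as far as I know, a CHECK reply does not list the rules that hit, the "Positive Tests" part of the original message is dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the hashref returned by
# Mail::SpamAssassin::Client's check() into the score part of the
# original handler's log line. Returns undef for ham or a failed check.
#
# Assumed setup (not run here; needs the module and a running spamd):
#   use Mail::SpamAssassin::Client;
#   my $client = Mail::SpamAssassin::Client->new(
#       { host => 'localhost', port => 783 } );
#   my $result = $client->check($message_received);
sub spam_log_line {
    my ($result) = @_;
    return unless $result && ( $result->{isspam} // '' ) eq 'True';
    return "Score: $result->{score} / $result->{threshold} (trigger)";
}
```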

        Note that the documentation for Mail::SpamAssassin says:

        If you wish to use a command-line filter tool, try the spamassassin or the spamd/spamc tools provided
        So I believe these tools are only worthwhile when you have to use external commands, for example when you are programming in shell rather than Perl.

        In general, to increase throughput, you should make the processing of each message as independent as possible, so that one message does not have to wait for another. That usually means each message handler should run in a separate process or a separate thread. A separate spamd helps only because it internally uses multiple processes/threads to handle messages; if you feed it your messages one by one, that benefit is lost. Conversely, if you already handle your messages in separate processes/threads, it makes no sense to move SpamAssassin into yet another process, because you only add the overhead of communicating with it.
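The process-per-message idea can be sketched with plain fork and waitpid. This is an illustration, not code from the thread: `run_parallel` is a hypothetical helper, each worker signals failure by dying, and in the mail handler the worker would presumably run the spam check for one queued message.

```perl
use strict;
use warnings;

# Hypothetical helper: run $worker->($item) for each item in its own child
# process, with at most $max children alive at once. A worker signals
# failure by dying. Returns the number of children that exited non-zero.
sub run_parallel {
    my ( $max, $worker, @items ) = @_;
    my %running;      # pid => 1 for children started but not yet reaped
    my $failed = 0;

    # Wait for any child to exit; count it as failed if its status is non-zero.
    my $reap = sub {
        my $pid = waitpid( -1, 0 );
        if ( $pid == -1 ) { %running = (); return; }    # no children left
        return unless delete $running{$pid};             # not one of ours
        $failed++ if $? != 0;
    };

    for my $item (@items) {
        $reap->() while scalar( keys %running ) >= $max;    # wait for a free slot
        my $pid = fork() // die "fork failed: $!";
        if ( $pid == 0 ) {                                   # child
            my $ok = eval { $worker->($item); 1 };
            exit( $ok ? 0 : 1 );
        }
        $running{$pid} = 1;                                  # parent
    }
    $reap->() while %running;    # wait for the remaining children
    return $failed;
}
```

Capping the number of children keeps memory use bounded, which matters if each worker loads SpamAssassin itself.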

Node Type: perlquestion [id://1092838]
Approved by boftx
Front-paged by boftx