


My experiments with Bayesian filtering were a wash; after training ifile on my entire very large corpus of mail, I found that I had to continually go through the whole spam bin for false positives.

I did the same thing when I first came to Bayesian filtering, but that's not the way to get the best results out of it. Filtering is more accurate if you simply correct its mistakes as they occur than if you preload it with an existing corpus.
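A minimal sketch of the train-on-error idea in Python. It is a plain naive Bayes scorer, not Paul Graham's exact formula and not ifile's internals, that updates its word counts only when you correct a misclassification. The class name, the crude tokenizer, the add-one smoothing and the 0.9 threshold are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer for illustration: the set of lower-cased word-like tokens."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


class TrainOnErrorFilter:
    """Hypothetical naive Bayes spam filter that learns only from corrected mistakes."""

    def __init__(self, threshold=0.9):
        # A high threshold makes the filter reluctant to call anything spam,
        # trading some missed spam for fewer false positives.
        self.threshold = threshold
        self.counts = {"spam": Counter(), "ham": Counter()}  # messages containing each token
        self.docs = {"spam": 0, "ham": 0}                    # messages trained per class

    def spam_probability(self, text):
        # Log-odds of spam: smoothed class prior plus one term per token present
        # (absent tokens are ignored, a common simplification).
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in tokenize(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)

    def classify(self, text):
        return "spam" if self.spam_probability(text) >= self.threshold else "ham"

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def correct(self, text, true_label):
        """Train only if the filter got this message wrong; return True if it did."""
        if self.classify(text) != true_label:
            self.train(text, true_label)
            return True
        return False
```

Calling `correct()` on each message as you read it is the whole training loop: mail the filter already handles correctly costs nothing, and every fix moves the counts toward the cases it got wrong.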

There's much more information about Bayesian filtering at Paul Graham's site.

Markus

Re: Most of the email spam I get is:
by jonadab (Parson) on Jan 05, 2005 at 22:14 UTC
    Filtering is more accurate if you simply correct its mistakes as they occur

    If I have to correct false positives as they occur, this so-called "filtering" is no good to me at all, because it means I have to go through all the spam. Worse than useless. My existing filtering system is significantly better, because I am confident that 100.000% of everything filtered into the spam folders is, in fact, worthless junk. Additionally, *most* of my legitimate mail is filtered into various spam-free folders based on topic, list, sender or whatever. The only mail I have to sort by hand is the stuff that lands in my inbox (because none of my filters pick it up).
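    For comparison, the rule-based setup described above can be sketched in Python as a first-match-wins list of header tests, with anything unmatched left in the inbox for sorting by hand. The folder names, the `Rule` class and the particular header tests are hypothetical; the point is that a spam rule only belongs in the list if you trust it never to match legitimate mail.

```python
from dataclasses import dataclass
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Callable, List


@dataclass
class Rule:
    folder: str                          # destination folder (hypothetical layout)
    matches: Callable[[Message], bool]   # test applied to the parsed message


def route(msg: Message, rules: List[Rule], default: str = "INBOX") -> str:
    """Return the folder for msg: the first matching rule wins, else the inbox."""
    for rule in rules:
        if rule.matches(msg):
            return rule.folder
    return default  # nothing matched: left in the inbox to sort by hand


def sender_domain_is(domain):
    """Build a test on the address part of the From: header."""
    def test(m):
        _, addr = parseaddr(m.get("From", ""))
        return addr.lower().endswith("@" + domain)
    return test


# Hypothetical rules; only put a rule in front of "spam" if it is never wrong.
RULES = [
    Rule("spam", lambda m: "casino" in m.get("Subject", "").lower()),
    Rule("lists/perl", lambda m: "perl" in m.get("List-Id", "").lower()),
    Rule("family", sender_domain_is("example.org")),
]
```

    Because the rules are checked in order, a message is never scored or guessed at: it either matches a rule you wrote or it lands in the inbox.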

    I don't want to correct my filter's errors continually. If I have to do that, it's not doing its job at ALL; *I* would be doing 100% of the filter's job, then.