PerlMonks
Re^4: Text Analysis Tools to compare Slinker and Stinker?
by mojotoad (Monsignor) on Jan 22, 2003 at 06:57 UTC ( [id://228957]=note )
'Misspellings' are precisely where Bayesian filtering, once trained, will help tremendously (though as others have pointed out, never conclusively).
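To make that concrete, here is a rough sketch of per-token scoring between two authors. It is Python rather than Perl, and everything in it is an assumption for illustration: the sample posts, the `tokenize` and `token_scores` names, the equal priors, and the add-one smoothing. It is not the scoring any particular spam filter uses.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a misspelling survives as its own token.
    return re.findall(r"[a-z0-9']+", text.lower())


def token_scores(texts_a, texts_b):
    """Estimate P(author A | token), assuming equal priors and add-one smoothing."""
    count_a = Counter(t for text in texts_a for t in tokenize(text))
    count_b = Counter(t for text in texts_b for t in tokenize(text))
    total_a = sum(count_a.values())
    total_b = sum(count_b.values())
    vocab = set(count_a) | set(count_b)
    v = len(vocab)
    scores = {}
    for tok in vocab:
        # Smoothed relative frequency of the token in each author's training text.
        p_a = (count_a[tok] + 1) / (total_a + v)
        p_b = (count_b[tok] + 1) / (total_b + v)
        scores[tok] = p_a / (p_a + p_b)
    return scores


# Hypothetical training posts: author A habitually misspells "definitely".
author_a = [
    "I definately think the code is wrong",
    "this is definately a bug in the code",
]
author_b = [
    "I definitely think the code is fine",
    "this is definitely not a bug",
]

scores = token_scores(author_a, author_b)

# Tokens most indicative of author A first.
for tok in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(tok, round(scores[tok], 3))
```

With these toy posts the habitual misspelling ends up as the strongest single indicator for author A, and the correct spelling as the strongest for author B. A score is still only evidence, which is why the result is never conclusive.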
As an example from the anti-spam efforts: once Bayesian filtering was enabled, they were amazed to find that the single token with the highest probability of indicating spam was 'FF0000', the hex value for bright red. Unexpected, but damning. Consistently misspelled words could show up in the same way.
Matt