The biggest problem I have with this is that what counts as a swear word is very subjective. I find words like "gun", "God" and "religion" far more dubious and harmful to children than words like "sex" or "breast".

Do you really want to rely on some unknown figure to come up with a list of "taboo" words?

Re^2: Profanity and expletives
by jfrm (Monk) on Mar 01, 2012 at 16:51 UTC

    Good point in many contexts, but in my case it doesn't matter: I only want to use the list for statistical analysis of risk, so a few false positives are fine. The larger the list the better, really.

      > statistical analysis of risk

      You most certainly want to search for statistical spam filters.

      Cheers Rolf

      If you don't mind false positives, start with /usr/share/dict/words. Or, as a regexp with false positives, /\S+/g.
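
For what it's worth, here is a rough sketch of the kind of counting discussed above, assuming a word list with one entry per line and a text to score. It is written in Python purely for illustration; the function name `flag_ratio` and the two-word sample list are made up for this example and are not from any real profanity list or module. Tokens are split with the same `\S+` regexp mentioned above.

```python
import re
from collections import Counter

def flag_ratio(text, flagged_words):
    """Return (ratio, hits): share of tokens found on the list, and per-word counts.

    Hypothetical helper for illustration only.
    """
    flagged = {w.strip().lower() for w in flagged_words if w.strip()}
    # \S+ treats every run of non-whitespace as a token; trim punctuation after.
    tokens = [t.strip(".,;:!?\"'()").lower() for t in re.findall(r"\S+", text)]
    tokens = [t for t in tokens if t]
    hits = Counter(t for t in tokens if t in flagged)
    ratio = sum(hits.values()) / len(tokens) if tokens else 0.0
    return ratio, hits

# Placeholder list (an assumption); a real one would be read from a file,
# one word per line, with /usr/share/dict/words as the over-broad extreme.
sample_list = ["darn", "heck"]
ratio, hits = flag_ratio("Well, heck. Darn it, heck!", sample_list)
print(ratio, dict(hits))
```

The larger the list, the closer the ratio creeps towards 1 for any text, which is why a dictionary-sized list tells you nothing about risk on its own.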