Natural Language Processing: Find obscenities in English?

自闭症患者 2021-02-09 21:15

Given a set of words tagged for part of speech, I want to find those that are obscenities in mainstream English. How might I do this? Should I just make a huge list, and check f

11 Answers
  •  深忆病人
    2021-02-09 21:57

    You want to use Bayesian analysis to solve this problem. Naive Bayes classification is the technique many spam filters use to detect spam and phishing messages in your email inbox. You train the classifier on labeled examples, and you can keep feeding it corrected examples so that it improves over time. Telling a legitimate email from a spam email is closely analogous to the problem you describe: telling clean text from obscene text.
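
    To make the spam-filter analogy concrete, here is a minimal sketch of a two-class naive Bayes text classifier with add-one smoothing. Everything in it is an assumption made for illustration: the class name `NaiveBayesFilter`, the `train` and `isFlagged` methods, the crude tokenizer, and the tiny training set with the placeholder token "badword". It is not the API of Classifier4J or of any other library.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical two-class naive Bayes classifier: "flagged" vs. "clean" text.
    class NaiveBayesFilter {
        private final Map<String, Integer> flaggedCounts = new HashMap<>();
        private final Map<String, Integer> cleanCounts = new HashMap<>();
        private final Set<String> vocabulary = new HashSet<>();
        private int flaggedTokens = 0, cleanTokens = 0;
        private int flaggedDocs = 0, cleanDocs = 0;

        // Record the token counts of one labeled example.
        void train(String text, boolean flagged) {
            Map<String, Integer> counts = flagged ? flaggedCounts : cleanCounts;
            for (String token : tokenize(text)) {
                counts.merge(token, 1, Integer::sum);
                vocabulary.add(token);
                if (flagged) flaggedTokens++; else cleanTokens++;
            }
            if (flagged) flaggedDocs++; else cleanDocs++;
        }

        // True when the "flagged" class has the higher posterior log-score.
        boolean isFlagged(String text) {
            List<String> tokens = tokenize(text);
            return logScore(tokens, flaggedCounts, flaggedTokens, flaggedDocs)
                 > logScore(tokens, cleanCounts, cleanTokens, cleanDocs);
        }

        // log P(class) + sum of log P(token | class), both with add-one smoothing.
        private double logScore(List<String> tokens, Map<String, Integer> counts,
                                int classTokens, int classDocs) {
            double score = Math.log((classDocs + 1.0) / (flaggedDocs + cleanDocs + 2.0));
            // The extra +1 reserves probability mass for tokens never seen in training.
            int slots = vocabulary.size() + 1;
            for (String token : tokens) {
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (classTokens + slots));
            }
            return score;
        }

        // Crude tokenizer (an assumption): lowercase, split on anything but letters and apostrophes.
        private static List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : text.toLowerCase().split("[^a-z']+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
            return tokens;
        }

        public static void main(String[] args) {
            NaiveBayesFilter filter = new NaiveBayesFilter();
            filter.train("you are a badword", true);
            filter.train("what a badword idiot", true);
            filter.train("you are a nice person", false);
            filter.train("what a lovely day", false);
            System.out.println(filter.isFlagged("such a badword"));
            System.out.println(filter.isFlagged("have a lovely day"));
        }
    }
    ```

    A classifier like this scores whole messages, not isolated words, so context can shift the result. With a realistic corpus you would also want a decision threshold on the score difference instead of a plain comparison.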

    Here are a few useful links:

    A Plan for Spam - Paul Graham's essay, which popularized the use of Bayesian analysis to combat spam.

    Data Mining (ppt) - This was written by a colleague of mine.

    Classifier4J - A text classifier library written in Java (they exist for every language, but you tagged this question with Java).
