Natural Language Processing: Find obscenities in English?


自闭症患者 2021-02-09 21:15

Given a set of words tagged for part of speech, I want to find those that are obscenities in mainstream English. How might I do this? Should I just make a huge list, and check f

11 answers

深忆病人 (original poster)

2021-02-09 21:57

Consider Bayesian analysis for this problem. Bayesian probability is the technique many spam filters use to detect spam and phishing messages in an email inbox, and you can keep training the classifier so that it improves over time. Telling a legitimate email from a spam email is closely analogous to telling an acceptable word from an obscene one.
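To make the idea concrete, here is a minimal sketch of a naive Bayes classifier with add-one (Laplace) smoothing. Everything specific in it is an assumption made for illustration: the class name `NaiveBayesWordFilter`, the labels `flagged` and `clean`, the choice of character trigrams as features, and the tiny training set of mild placeholder words, which stands in for a real labeled word list. It is not the API of Classifier4J or of any other library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative multinomial naive Bayes over character trigrams (hypothetical sketch). */
class NaiveBayesWordFilter {
    // label -> (trigram -> how often it was seen under that label)
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    // label -> total number of trigrams seen under that label
    private final Map<String, Integer> totalFeatures = new HashMap<>();
    // label -> number of training words with that label
    private final Map<String, Integer> exampleCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalExamples = 0;

    /** Splits a word into character trigrams, padded with ^ and $ as boundary markers. */
    static List<String> trigrams(String word) {
        String padded = "^" + word.toLowerCase(Locale.ROOT) + "$";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    /** Records one labeled training word. */
    void train(String word, String label) {
        Map<String, Integer> counts = featureCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String gram : trigrams(word)) {
            counts.merge(gram, 1, Integer::sum);
            totalFeatures.merge(label, 1, Integer::sum);
            vocabulary.add(gram);
        }
        exampleCounts.merge(label, 1, Integer::sum);
        totalExamples++;
    }

    /** Returns the label with the highest log posterior, or null if nothing was trained. */
    String classify(String word) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : exampleCounts.keySet()) {
            // Log prior: fraction of training words that carry this label.
            double score = Math.log(exampleCounts.get(label) / (double) totalExamples);
            Map<String, Integer> counts = featureCounts.get(label);
            int total = totalFeatures.getOrDefault(label, 0);
            for (String gram : trigrams(word)) {
                int seen = counts.getOrDefault(gram, 0);
                // Add-one smoothing keeps unseen trigrams from zeroing out the score.
                score += Math.log((seen + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    /** Builds a filter from a tiny placeholder training set (assumed data, not a real word list). */
    static NaiveBayesWordFilter trainedDemo() {
        NaiveBayesWordFilter filter = new NaiveBayesWordFilter();
        for (String w : new String[] {"darn", "darned", "heck"}) {
            filter.train(w, "flagged");
        }
        for (String w : new String[] {"table", "window", "river"}) {
            filter.train(w, "clean");
        }
        return filter;
    }

    public static void main(String[] args) {
        NaiveBayesWordFilter filter = trainedDemo();
        System.out.println("darnit -> " + filter.classify("darnit"));
        System.out.println("tablet -> " + filter.classify("tablet"));
    }
}
```

Because the features are trigrams rather than whole words, a trained model can also score spelling variants it has never seen, which a fixed word list cannot do. With a realistic training set you would probably also want a confidence threshold rather than a hard label.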

Here are a couple of useful links:

A Plan For Spam - Paul Graham's essay, an early and influential proposal to use Bayesian analysis to combat spam.

Data Mining (ppt) - This was written by a colleague of mine.

Classifier4J - A text classifier library written in Java (similar libraries exist for most languages, but you tagged this question with Java).
