I'm new to PHP.
I have an array like this:
$suspiciousList = array(
    array("word" => "badword1", "score" => 400, "type" => 1),
array
A good starting point is the question "How do you implement a good profanity filter?". I agree with its conclusion: automatic detection will always give poor results.
I would try these approaches:
1) Simply detect words that are vulgar according to your dictionary.
2) Come up with a few heuristics like "continuous sequence of 'words' composed of one letter" (b a d w o r d) and use them to evaluate users' posts. Then you can compute the expected number of vulgar words, $\sum_{i=1}^{n} P_i N_i$, where $n$ is the number of your heuristics, $P_i$ is the probability that a word found with heuristic $i$ is really a vulgar one, and $N_i$ is the number of words found by heuristic $i$. I think the probabilistic approach is better than simply stating "this post does (not) contain a vulgar word".
3) Let a moderator decide whether a post is really vulgar or not. Otherwise the imperfections of your automatic replacement method will most likely make your users angry.
4) I think it's useless to look up words in an English (or Turkish?) dictionary in order to find words that are not really English words because people misspell words too much these days.
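
Approaches 1) and 2) could be sketched in PHP roughly like this. It is only a minimal sketch under some assumptions: the function names (`countDictionaryHits`, `countSpacedLetterRuns`, `expectedVulgarCount`), the sample dictionary and the probabilities are all made up for illustration, and real values of P_i would have to be estimated from posts a moderator has already reviewed. It needs PHP 7.4 or later because of the arrow function.

```php
<?php
// Hypothetical dictionary; placeholder words, not a real list.
$dictionary = ['badword1', 'badword2'];

// Heuristic "dictionary": count tokens that match the dictionary exactly.
function countDictionaryHits(string $post, array $dictionary): int
{
    // Split on runs of non-word characters and drop empty tokens.
    $tokens = preg_split('/\W+/', strtolower($post), -1, PREG_SPLIT_NO_EMPTY);
    return count(array_filter($tokens, fn($t) => in_array($t, $dictionary, true)));
}

// Heuristic "spaced": count runs of three or more single letters
// separated by single spaces, e.g. "b a d w o r d".
function countSpacedLetterRuns(string $post): int
{
    return (int) preg_match_all('/\b(?:[a-z] ){2,}[a-z]\b/i', $post);
}

// Expected number of vulgar words: the sum of P_i * N_i over all heuristics.
// Both arrays are keyed by heuristic name.
function expectedVulgarCount(array $counts, array $probabilities): float
{
    $expected = 0.0;
    foreach ($counts as $name => $n) {
        $expected += $probabilities[$name] * $n;
    }
    return $expected;
}

$post = 'This is badword1 and also b a d w o r d here';

$counts = [
    'dictionary' => countDictionaryHits($post, $dictionary),
    'spaced'     => countSpacedLetterRuns($post),
];

// Assumed probabilities that a hit from each heuristic is really vulgar.
$probabilities = ['dictionary' => 0.75, 'spaced' => 0.5];

echo expectedVulgarCount($counts, $probabilities), "\n"; // prints 1.25
```

Each heuristic finds one hit in the sample post, so the result is 0.75 * 1 + 0.5 * 1 = 1.25. You could then pass posts whose expected count is above some threshold to a moderator, as in 3), instead of replacing words automatically.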