Bayesian spam filtering library for Python

Submitted by 青春壹個敷衍的年華 on 2019-12-02 15:38:59

Do you want spam filtering or Bayesian classification?

For Bayesian classification, there are a number of Python modules. I was just recently reviewing Orange, which looks very impressive. R also has a number of Bayesian modules, and you can use Rpy to hook into R from Python.

Try Reverend. It's a spam filtering module.

RedisBayes looks good to me:

http://pypi.python.org/pypi/redisbayes/0.1.3

In my experience, Redis is an excellent addition to your stack. Because it keeps its data in memory, it can handle this kind of lookup-heavy workload much faster than MySQL, PostgreSQL, or other disk-based RDBMSs.

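import redis
import redisbayes

# Connect to a Redis server (redis.Redis() defaults to localhost:6379).
rb = redisbayes.RedisBayes(redis=redis.Redis())

# Train each category with example text.
rb.train('good', 'sunshine drugs love sex lobster sloth')
rb.train('bad', 'fear death horror government zombie god')

# Classify new text into the most likely trained category.
assert rb.classify('sloths are so cute i love them') == 'good'
assert rb.classify('i fear god and love the government') == 'bad'

# Print the scores behind the classification.
print(rb.score('i fear god and love the government'))

# Remove the training data again.
rb.untrain('good', 'sunshine drugs love sex lobster sloth')
rb.untrain('bad', 'fear death horror government zombie god')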

Hope that helps a bit.
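If you want to experiment with the same train/untrain/classify/score pattern without running a Redis server, here is a minimal in-memory sketch. It is not redisbayes itself: the class name InMemoryBayes, the regex tokenizer, the uniform category prior, and the Laplace (add-one) smoothing are all assumptions made for illustration, and score() here returns per-category log-likelihoods, which may not match what redisbayes returns.

```python
import math
import re
from collections import Counter, defaultdict


class InMemoryBayes:
    """Hypothetical in-memory stand-in for a RedisBayes-style classifier."""

    def __init__(self):
        # category -> Counter of token frequencies seen during training
        self.word_counts = defaultdict(Counter)

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, category, text):
        self.word_counts[category].update(self.tokenize(text))

    def untrain(self, category, text):
        counts = self.word_counts.get(category)
        if counts is None:
            return
        counts.subtract(self.tokenize(text))
        # Drop tokens whose count fell to zero (or below), then empty categories.
        for token in [t for t, c in counts.items() if c <= 0]:
            del counts[token]
        if not counts:
            del self.word_counts[category]

    def score(self, text):
        """Log-likelihood of the text under each category (uniform prior assumed)."""
        tokens = self.tokenize(text)
        vocab_size = len(set().union(*self.word_counts.values()))
        scores = {}
        for category, counts in self.word_counts.items():
            total = sum(counts.values())
            # Add-one smoothing so an unseen token doesn't zero out the category.
            scores[category] = sum(
                math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
            )
        return scores

    def classify(self, text):
        scores = self.score(text)
        if not scores:
            return None  # nothing trained yet
        return max(scores, key=scores.get)


rb = InMemoryBayes()
rb.train('good', 'sunshine drugs love sex lobster sloth')
rb.train('bad', 'fear death horror government zombie god')
print(rb.classify('sloths are so cute i love them'))      # good
print(rb.classify('i fear god and love the government'))  # bad
```

Note that this simple tokenizer treats "sloths" and "sloth" as different words; the first example still comes out 'good' only because of "love".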

Try bogofilter. I'm not sure how it can be used from Python, but it is integrated with many mail systems, which suggests it should be relatively easy to interface with.
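One plausible way to drive bogofilter from Python is through subprocess, since bogofilter reads a message on standard input and reports its verdict through its exit status. This is a sketch under assumptions: it assumes the bogofilter binary is on PATH with an already-trained wordlist, and the exit-status mapping (0 spam, 1 ham, 2 unsure, 3 error) and the -s/-n registration flags should be checked against the bogofilter(1) man page for your version. The helper names are hypothetical.

```python
import subprocess

# Assumed exit statuses (check bogofilter(1)): 0 spam, 1 ham, 2 unsure, 3 error.
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def verdict_from_status(status):
    """Map a bogofilter exit status to a verdict string."""
    try:
        return VERDICTS[status]
    except KeyError:
        raise RuntimeError(f"bogofilter failed with exit status {status}")


def classify_message(raw_message: bytes, bogofilter="bogofilter"):
    """Pipe one raw email message to bogofilter and interpret its exit status."""
    result = subprocess.run([bogofilter], input=raw_message, capture_output=True)
    return verdict_from_status(result.returncode)


def register_message(raw_message: bytes, spam: bool, bogofilter="bogofilter"):
    """Train bogofilter: -s registers the message as spam, -n as non-spam.

    Returns the raw exit status; consult the man page for its exact meaning.
    """
    flag = "-s" if spam else "-n"
    return subprocess.run([bogofilter, flag], input=raw_message).returncode
```

Keeping the status mapping in its own function makes it easy to test without a bogofilter install, and to adjust if your version documents different codes.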

SpamBayes is maintained and mature (i.e., it works without needing new releases all the time). It will easily do what you want. Note that SpamBayes is only loosely Bayesian (it uses chi-squared combining), but presumably you're after any sort of statistical token-based classification rather than something specifically Bayesian.
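To make "chi-squared combining" more concrete, here is a sketch modeled on the approach SpamBayes describes: Fisher's method is applied twice, once to the spam evidence and once to the ham evidence, and the two results are merged into one score. The function names, the 0.01/0.99 clamping bounds, and summing logs instead of multiplying probabilities are choices made for this illustration, not SpamBayes' API. The input probs is assumed to be a list of per-token spam probabilities that you have already computed.

```python
import math


def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), valid for even v."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Survival function for even dof: exp(-m) * sum_{i < v/2} m**i / i!
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs, lo=0.01, hi=0.99):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    if not probs:
        return 0.5  # no evidence either way
    # Clamp so log() never sees 0 (bounds are an assumption of this sketch).
    ps = [min(max(p, lo), hi) for p in probs]
    n = len(ps)
    # Spamminess: Fisher's method on (1 - p); near 1 when many p are high.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in ps), 2 * n)
    # Hamminess: Fisher's method on p; near 1 when many p are low.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in ps), 2 * n)
    return (s - h + 1.0) / 2.0
```

A score near 1 means spam and near 0 means ham. When strong evidence points both ways (say [0.99, 0.01]), s and h are both high and the score lands near 0.5, which is how this scheme marks a message as unsure instead of forcing a verdict.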

A module in the Python natural language toolkit (nltk) does naïve Bayesian classification: nltk.classify.naivebayes.

Disclaimer: I know crap all about Bayesian classification, naïve or worldly.
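For the NLTK route, here is a minimal sketch using NaiveBayesClassifier (importable from nltk.classify) on a toy spam/ham split. NLTK expects each training example as a (featureset dict, label) pair, so tokenization and feature choice are up to you; the features() helper, the contains(word) naming, and the four training messages below are made up for illustration. It assumes nltk is installed, and a real filter would need far more data and a held-out test set.

```python
from nltk.classify import NaiveBayesClassifier


def features(text):
    """Hypothetical bag-of-words featureset: one boolean per word present."""
    return {f"contains({word})": True for word in text.lower().split()}


# Toy training data; labels are arbitrary strings.
train_set = [
    (features("cheap viagra offer now"), "spam"),
    (features("win cheap prizes now"), "spam"),
    (features("meeting agenda for tomorrow"), "ham"),
    (features("lunch tomorrow with the team"), "ham"),
]

classifier = NaiveBayesClassifier.train(train_set)

print(classifier.classify(features("cheap offer")))       # spam
print(classifier.classify(features("meeting tomorrow")))  # ham
# Probability distribution over labels for a new message.
dist = classifier.prob_classify(features("cheap offer"))
print(dist.prob("spam"))
```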
