Currently best spam filter algorithm

喜夏-厌秋 submitted on 2019-11-30 10:04:44

It's worth looking into supervised learning techniques. A number of studies have used the Multinomial Naive Bayes classifier for spam email filtering with a lot of success. If it worked for spam email filtering, it is likely to work for SMS filtering too. What you need is a large labeled dataset of example SMS texts, both spam and legitimate, to train the classifier on.
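As a rough illustration of the idea, here is a minimal Multinomial Naive Bayes sketch in plain Python. The tiny training set, the whitespace tokenizer, the "spam"/"ham" label names, and the smoothing value `alpha=1.0` are all assumptions made for the example; a real filter would need far more data and better text preprocessing.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for this toy example.
    return text.lower().split()


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing strength

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def log_scores(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * vocab_size
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    count = self.word_counts[c][word]
                    score += math.log((count + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, invented purely for illustration.
train_texts = [
    "win a free prize now",
    "free entry claim your cash prize",
    "urgent call now to claim free credit",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you call me when you are free",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("see you at lunch"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.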

It may also be helpful to look into the Support Vector Machine, which, although less widely used in spam filtering, is often a more powerful technique.
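In practice an SVM would normally come from a library, but to give a feel for it, a linear SVM can be roughly sketched as stochastic subgradient descent on the L2-regularized hinge loss over bag-of-words counts. Everything here (the toy data, the function names, and the `epochs`, `lr`, and `lam` values) is an assumption for illustration, not a tuned implementation.

```python
def tokenize(text):
    # Assumption: lowercase + whitespace split, as a stand-in for real parsing.
    return text.lower().split()


def train_linear_svm(texts, labels, epochs=50, lr=0.1, lam=0.01):
    """Train on labels of +1 (spam) and -1 (ham); returns (weights, bias)."""
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            words = tokenize(text)
            score = sum(weights.get(w, 0.0) for w in words) + bias
            # L2 regularization: shrink every weight slightly on each step.
            for w in weights:
                weights[w] *= 1.0 - lr * lam
            # Hinge loss: only examples inside the margin trigger an update.
            if y * score < 1:
                for w in words:
                    weights[w] = weights.get(w, 0.0) + lr * y
                bias += lr * y
    return weights, bias


def decision(weights, bias, text):
    # Positive values lean towards spam, negative values towards ham.
    return sum(weights.get(w, 0.0) for w in tokenize(text)) + bias


# Hypothetical toy data, invented purely for illustration.
train_texts = [
    "win a free prize now",
    "free entry claim your cash prize",
    "urgent call now to claim free credit",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you call me when you are free",
]
train_labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(train_texts, train_labels)
print(decision(weights, bias, "claim your free prize") > 0)
print(decision(weights, bias, "see you at lunch") > 0)
```

The sign of `decision` gives the predicted class, and its magnitude can serve as a rough confidence score.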

Also, training the algorithms on raw text alone may not be the best way forward. A 1998 study by Mehran Sahami and colleagues found superior performance when other heuristics were taken into account (e.g. Was the email sent to a mailing list? Was it sent from a domain name ending in ".edu", ".com", or ".org"? Did it contain multiple punctuation marks ("!!!")? And so forth).
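The kinds of heuristics mentioned above could be turned into extra features along these lines. This is only a sketch: the function name, its parameters, the feature names, and the rule that "multiple punctuation" means three or more consecutive `!` or `?` are assumptions, not details taken from the study.

```python
import re

# The three top-level domains named in the answer above.
TRACKED_TLDS = (".edu", ".com", ".org")


def heuristic_features(sender_address, sent_to_mailing_list, body):
    """Return a dict of boolean features to use alongside the text features."""
    domain = sender_address.rsplit("@", 1)[-1].lower()
    features = {"sent_to_mailing_list": bool(sent_to_mailing_list)}
    for tld in TRACKED_TLDS:
        features["sender_tld_" + tld.lstrip(".")] = domain.endswith(tld)
    # Assumption: "multiple punctuation marks" = a run of 3+ '!' or '?'.
    features["repeated_punctuation"] = re.search(r"[!?]{3,}", body) is not None
    return features


# Hypothetical message, invented purely for illustration.
example = heuristic_features("promo@deals.com", False, "Act now!!!")
print(example)
```

These booleans can then be appended to the word-count features before training whichever classifier is used.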

But start off with the Multinomial Naive Bayes classifier. It's very simple to implement and easy to use, and from personal experience it also has a very short training time.

As I understand it, most modern spam filtering combines an implementation of Bayes' theorem with some heuristics, e.g. sender blacklists, standards compliance, and sending patterns.
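One simple way to picture such a combination is a score that starts from the Bayesian spam probability and is then adjusted by heuristics. The sketch below is hypothetical throughout: the function names, the per-heuristic weight of 0.1, and the 0.7 threshold are made-up values, and real filters combine their signals in more sophisticated ways.

```python
def combined_spam_score(bayes_spam_probability, sender, blacklist,
                        heuristic_hits, weight_per_hit=0.1):
    """Blend a Bayesian probability with simple heuristic signals."""
    # A blacklisted sender is treated as spam regardless of content.
    if sender in blacklist:
        return 1.0
    # Each triggered heuristic nudges the score upwards, capped at 1.0.
    score = bayes_spam_probability + weight_per_hit * heuristic_hits
    return min(score, 1.0)


def is_spam(score, threshold=0.7):
    # Assumption: a fixed cut-off; in practice this would be tuned on data.
    return score >= threshold


# Hypothetical senders and values, invented purely for illustration.
blacklist = {"+10000000000"}

print(is_spam(combined_spam_score(0.55, "+12223334444", blacklist, 0)))
print(is_spam(combined_spam_score(0.55, "+12223334444", blacklist, 2)))
print(is_spam(combined_spam_score(0.05, "+10000000000", blacklist, 0)))
```

Here a borderline message only crosses the threshold once heuristics such as unusual sending patterns also fire, while a blacklisted sender is blocked outright.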

The easiest place to implement this in the mobile phone network would probably be at the SMS message centre, since message volume is highest there, which makes a lot of the heuristics easier to implement.

Using a wide variety of algorithms and heuristics (and not "the" best method) is a good approach to protect your network and subscribers from spam, fraud, malicious content, cyber-bullying, identity theft, viruses, etc.

Cloudmark and its various partners and competitors are a good place to start looking.

Gennady Vanin Геннадий Ванин

Why do you need to detect spam after the fact? Prevent it in the first place: nip it in the bud.

Update:
Filters are easily and widely abused by black-hat SEO/SEM operators and criminals to blacklist or dump competitors.
Besides, they are retroactive and hence doomed to always lag behind advances in spammers' techniques.
