How would you adblock using Python?

戏子无情 提交于 2019-12-07 05:41:57

问题


I'm slowly building a web browser in PyQt4 and like the speed i'm getting out of it. However, I want to combine easylist.txt with it. I believe adblock uses this to block http requests by the browser.

How would you go about it using python/PyQt4?

[edit1] Ok. I think i've setup Privoxy. I haven't setup any additional filters and it seems to work. The PyQt4 i've tried to use looks like this

self.proxyIP = "127.0.0.1"  
self.proxyPORT= 8118  
proxy = QNetworkProxy()  
proxy.setType(QNetworkProxy.HttpProxy)  
proxy.setHostName(self.proxyIP)  
proxy.setPort(self.proxyPORT)  
QNetworkProxy.setApplicationProxy(proxy)

However, this does absolutely nothing and I cannot make sense of the docs and can not find any examples.

[edit2] I've just noticed that i'f I change self.proxyIP to my actual local IP rather than 127.0.0.1 the page doesn't load. So something is happening.


回答1:


I know this is an old question, but I thought I'd try giving an answer for anyone who happens to stumble upon it. You could create a subclass of QNetworkAccessManager and combine it with https://github.com/atereshkin/abpy. Something kind of like this:

from PyQt4.QtNetwork import QNetworkAccessManager
from abpy import Filter
adblockFilter = Filter(file("easylist.txt"))
class MyNetworkAccessManager(QNetworkAccessManager):
    def createRequest(self, op, request, device=None):
        url = request.url().toString()
        doFilter = adblockFilter.match(url)
        if doFilter:
            return QNetworkAccessManager.createRequest(self, self.GetOperation, QNetworkRequest(QUrl()))
        else:
            QNetworkAccessManager.createRequest(self, op, request, device)
myNetworkAccessManager = MyNetworkAccessManager()

After that, set the following on all your QWebView instances, or make a subclass of QWebView:

QWebView.page().setNetworkAccessManager(myNetworkAccessManager)

Hope this helps!




回答2:


Is this question about web filtering?

Then try use some of external web-proxy, for sample Privoxy (http://en.wikipedia.org/wiki/Privoxy).




回答3:


The easylist.txt file is simply plain text, as demonstrated here: http://adblockplus.mozdev.org/easylist/easylist.txt

lines beginning with [ and also ! appear to be comments, so it is simply a case of sorting through the file, and searching for the correct things in the url/request depending upon the starting character of the line in the easylist.txt file.




回答4:


Privoxy is solid. If you want it to be completely API based though, check out the BrightCloud web filtering API as well.



来源:https://stackoverflow.com/questions/1083170/how-would-you-adblock-using-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!