Does Facebook know I'm scraping it with PhantomJS and can it change its website to counter me?

為{幸葍}努か submitted on 2020-01-11 12:58:26

Question


So, maybe I'm being paranoid.

I'm scraping my Facebook timeline for a hobby project using PhantomJS. Basically, I wrote a program that finds all of my ads by querying the page for the text Sponsored with XPath inside PhantomJS's page.evaluate block. The text was being displayed as the innerHTML of HTML a elements.

Things were working great for a few days and it was finding tons of ads.

Then it stopped returning any results.

When I logged into Facebook manually to inspect the elements again, I found that the word Sponsored was now being rendered through an ::after pseudo-element with the CSS property content: sponsored. Because CSS-generated content is not part of the DOM text, an XPath query for the text no longer yields any results. No joke, Facebook seemed to have changed the way it rendered this word after being scraped for a couple of days.
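
For illustration, text rendered this way can still be read from the computed style of the pseudo-element. The following is a minimal sketch, not something from the original program: the function name findByPseudoContent and its parameters are hypothetical, and it is written in ES5 style on the assumption that it may need to run in PhantomJS's older JavaScript engine.

```javascript
// Sketch: find elements whose visible label comes from CSS generated content.
// `win` is any window-like object exposing getComputedStyle (in a browser: window).
// `elements` is an array or NodeList; `label` is the text to look for.
// All names here are hypothetical and chosen for illustration.
function findByPseudoContent(win, elements, label) {
  var wanted = String(label).toLowerCase();
  return Array.prototype.slice.call(elements).filter(function (el) {
    // Generated content lives in the computed style, not in the DOM text,
    // which is why a text-based XPath query cannot see it.
    var raw = win.getComputedStyle(el, '::after').content || '';
    // Computed `content` strings come back wrapped in quotes, e.g. '"Sponsored"'.
    var text = raw.replace(/^["']|["']$/g, '');
    return text.toLowerCase() === wanted;
  });
}
```

In a page context this might be called as findByPseudoContent(window, document.querySelectorAll('a'), 'Sponsored'). The a selector is only a guess based on the earlier markup, and the approach breaks again as soon as the site changes how the label is rendered.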

Paranoid. I told you.

So, I offer this question to the community of JavaScript, web-scraping, and PhantomJS developers out there. What the heck is going on? Can Facebook know what my PhantomJS program is doing inside of the page.evaluate block?

If so, how? Would my PhantomJS commands appear in a keylogger program embedded in the page, for instance?

What are some of your theories?


Answer 1:


It is perfectly possible to detect PhantomJS even if the user agent is spoofed. There are plenty of little ways in which it differs from other browsers, among others:

  • Unusual order of HTTP request headers
  • Lack of media plugins and of the latest JavaScript capabilities
  • PhantomJS-specific methods, like window.callPhantom
  • PhantomJS name in the stack trace

and many others.
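
As a rough sketch of what the page-side checks above could look like, the function below collects a few of these signals. It is illustrative only and not taken from any site's real detection code: the name phantomSignals and the signal labels are hypothetical, the window._phantom check is an additional PhantomJS global assumed here alongside window.callPhantom, and header order is left out because it can only be observed on the server.

```javascript
// Sketch of page-side PhantomJS checks; `win` is a window-like object
// (in a browser: window). Names and labels are hypothetical.
function phantomSignals(win) {
  var signals = [];
  var nav = win.navigator || {};

  // PhantomJS-specific globals exposed to page scripts.
  if (typeof win.callPhantom === 'function' || win._phantom) {
    signals.push('phantom-global');
  }

  // An unspoofed user agent names PhantomJS directly.
  if (/PhantomJS/i.test(nav.userAgent || '')) {
    signals.push('user-agent');
  }

  // No plugins at all is a weak hint on its own, since other
  // headless setups may report an empty list too.
  if (nav.plugins && nav.plugins.length === 0) {
    signals.push('no-plugins');
  }

  // Force an error and look for the PhantomJS name in its stack trace.
  var stack = '';
  try {
    null[0]();
  } catch (e) {
    stack = String(e.stack || '');
  }
  if (/phantomjs/i.test(stack)) {
    signals.push('stack-trace');
  }

  return signals;
}
```

A page script could call phantomSignals(window) and report the result to the server. No single signal is conclusive, which is why such checks are usually combined.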

Please refer to this excellent article and presentation linked there for details: https://blog.shapesecurity.com/2015/01/22/detecting-phantomjs-based-visitors/

Maybe Puppeteer would be a better fit for your needs, as it drives a real, up-to-date Chromium browser.



Source: https://stackoverflow.com/questions/47708260/does-facebook-know-im-scraping-it-with-phantomjs-and-can-it-change-its-website
