Getting prerender.io to work with Facebook crawler (maven, GAE)?

Posted by 大城市里の小女人 on 2019-12-22 16:11:26

Question


I have an AngularJS application whose pages I want to share on Facebook. Sharing is driven by meta tags (https://developers.facebook.com/docs/sharing/best-practices), but I cannot change the meta tags with JavaScript because Facebook's crawler does not execute JavaScript. I therefore want to use prerender.io to execute and render my pages before the crawler fetches them from the server.

The thing is, I am not sure I understand the documentation correctly (https://github.com/greengerong/prerender-java).

This is the example web.xml from the README.md on GitHub:

<filter>
    <filter-name>prerender</filter-name>
    <filter-class>com.github.greengerong.PreRenderSEOFilter</filter-class>
    <init-param>
        <param-name>prerenderServiceUrl</param-name>
        <param-value>http://localhost:3000</param-value>
    </init-param>
    <init-param>
        <param-name>crawlerUserAgents</param-name>
        <param-value>me</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>prerender</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

After a bunch of attempts to get things right, I found out that if I simply remove this part:

      <init-param>
          <param-name>prerenderServiceUrl</param-name>
          <param-value>http://localhost:3000</param-value>
      </init-param>

I no longer have to deal with sockets on GAE (which gave me this error: 'Caused by: java.net.SocketException: Permission denied: ...'), and I can use the default service already deployed at http://prerender.herokuapp.com.

Question 1) What are the pros and cons of using the default service vs. deploying my own?

Now the service seems to be working, and I don't get server errors. Great!

As described in the documentation (https://github.com/greengerong/prerender-java), I first used 'me' as the crawler user agent. With 'me' as the crawler user agent, prerender started to cache my own API calls. E.g., when I was GETing a bunch of items from my server, prerender returned some HTML and cached the URI that should have returned the JSON I wanted. So now I have some cached pages at prerender.io, but not exactly the pages I want to cache :).

So I changed crawlerUserAgents to this:

      <init-param>
          <param-name>crawlerUserAgents</param-name>
          <param-value>facebookexternalhit/1.1</param-value>
      </init-param>

(I've also tried facebookexternalhit, FacebookUserExternalHit, ...). Now no pages get cached on prerender.io, and the JavaScript isn't executed before Facebook's crawler gets the pages. The debugger (https://developers.facebook.com/tools/debug/og/object/) tells me that the crawler only sees the original meta tags, not the meta tags that I replace with JavaScript on different pages (the meta tags are replaced when I open my page in a browser and inspect the elements).
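To reason about which crawlerUserAgents values could match, here is a minimal Java sketch. It assumes the filter compares each configured value against the request's User-Agent header with a case-insensitive substring test; that matching rule is an assumption and has not been confirmed against the PreRenderSEOFilter source. The class name UserAgentMatchSketch, the method isCrawler, and the sample User-Agent strings are hypothetical and serve only as illustration.

```java
import java.util.Locale;

public class UserAgentMatchSketch {

    // Hypothetical helper, not part of prerender-java: returns true if any
    // comma-separated token in configuredAgents occurs (case-insensitively)
    // inside the User-Agent header. Assumes substring matching.
    public static boolean isCrawler(String userAgentHeader, String configuredAgents) {
        if (userAgentHeader == null || configuredAgents == null) {
            return false;
        }
        String header = userAgentHeader.toLowerCase(Locale.ROOT);
        for (String token : configuredAgents.split(",")) {
            String needle = token.trim().toLowerCase(Locale.ROOT);
            if (!needle.isEmpty() && header.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample header values, used only for illustration.
        String chrome = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36";
        String facebook = "facebookexternalhit/1.1 "
                + "(+http://www.facebook.com/externalhit_uatext.php)";

        // "me" is a substring of "Chrome", so a browser request would match.
        System.out.println(isCrawler(chrome, "me"));
        // The Facebook sample header contains the configured value.
        System.out.println(isCrawler(facebook, "facebookexternalhit/1.1"));
        // "FacebookUserExternalHit" does not occur in the sample header.
        System.out.println(isCrawler(facebook, "FacebookUserExternalHit"));
    }
}
```

If the filter does match this way, 'me' would also match ordinary browser traffic, including API calls made from Chrome, which would be consistent with the cached API responses described above. Under the same assumption, facebookexternalhit/1.1 would match the sample Facebook header, so the missing prerendering may have a different cause.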

Question 2) Am I doing this right? Should I try other crawler user agents? Is facebookexternalhit correct?

Source: https://stackoverflow.com/questions/26013475/getting-prerender-io-to-work-with-facebook-crawler-maven-gae
