Question
I have an AngularJS application in which I want to share pages on Facebook. Sharing is handled with meta tags (https://developers.facebook.com/docs/sharing/best-practices), but I cannot change the meta tags with JS because JS isn't executed by Facebook's crawler. Therefore I want to use prerender.io to execute and render my pages before the crawler gets them from the server.
The thing is, I am not sure I understand the documentation correctly (https://github.com/greengerong/prerender-java).
This is the example web.xml from the README.md on GitHub:
<filter>
    <filter-name>prerender</filter-name>
    <filter-class>com.github.greengerong.PreRenderSEOFilter</filter-class>
    <init-param>
        <param-name>prerenderServiceUrl</param-name>
        <param-value>http://localhost:3000</param-value>
    </init-param>
    <init-param>
        <param-name>crawlerUserAgents</param-name>
        <param-value>me</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>prerender</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
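From reading around, I believe the filter decides whether to forward a request to the prerender service by doing a case-insensitive substring match of the configured crawlerUserAgents values against the request's User-Agent header. This is my own sketch of that logic (the isCrawler helper is mine, not the library's API):

```java
import java.util.Arrays;
import java.util.List;

public class CrawlerCheck {
    // Treat a request as a crawler request if the User-Agent header
    // contains any of the configured substrings (case-insensitive).
    // This mirrors how I understand the filter's matching to work.
    static boolean isCrawler(String userAgent, List<String> crawlerUserAgents) {
        if (userAgent == null) {
            return false;
        }
        final String ua = userAgent.toLowerCase();
        return crawlerUserAgents.stream()
                .anyMatch(agent -> ua.contains(agent.toLowerCase()));
    }

    public static void main(String[] args) {
        List<String> configured = Arrays.asList("facebookexternalhit");
        // Facebook's crawler identifies itself with a versioned User-Agent:
        System.out.println(isCrawler(
                "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
                configured)); // true
        System.out.println(isCrawler(
                "Mozilla/5.0 (Windows NT 10.0) Chrome/50.0",
                configured)); // false
    }
}
```

If the match really is a plain substring check, then configuring 'me' would match almost every browser User-Agent (e.g. "Chrome" contains "me"), which would explain the behaviour I describe below.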
After a bunch of attempts to get things right, I found out that if I simply remove this part:
<init-param>
    <param-name>prerenderServiceUrl</param-name>
    <param-value>http://localhost:3000</param-value>
</init-param>
I don't have to deal with websockets on GAE (which gave me this error: 'Caused by: java.net.SocketException: Permission denied: ...'), and I can use the default service already deployed at http://prerender.herokuapp.com. Question 1) What are the pros and cons of using the default service vs. deploying my own?
Now the service seems to be working, and I don't get server errors - great!
As described in the documentation (https://github.com/greengerong/prerender-java), I first used 'me' as the crawler user agent. With 'me' configured, prerender started to cache my own API calls: e.g. when I was GETting a bunch of items from my server, prerender returned some HTML and cached the URI with the JSON I wanted. So now I have some cached pages at prerender.io, but not exactly the pages I want to cache :).
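If the version of prerender-java I'm using supports the blacklist init-param (a regex over request URLs, which I believe the README documents alongside whitelist), I could presumably also keep API calls away from the prerender service regardless of the user-agent setting. Something like this, where the /api/ pattern is an assumption about my own URL layout:

```xml
<init-param>
    <param-name>blacklist</param-name>
    <param-value>.*/api/.*</param-value>
</init-param>
```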
So I changed crawlerUserAgent to this:
<init-param>
    <param-name>crawlerUserAgents</param-name>
    <param-value>facebookexternalhit/1.1</param-value>
</init-param>
(I've also tried facebookexternalhit, FacebookUserExternalHit, ...). Now no pages get cached on prerender.io at all, and the JavaScript isn't executed before Facebook's crawler gets the pages. Looking at the debugger (https://developers.facebook.com/tools/debug/og/object/), it tells me that the crawler only sees the original meta tags, and not the meta tags that I replace with JS on the different pages (the meta tags are replaced correctly when I open my page and inspect the elements).
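If I remember the README right, crawlerUserAgents takes a comma-separated list and is matched as a substring, so the /1.1 version suffix might be too specific. A value without the version, with a couple of other common sharing crawlers thrown in (twitterbot and linkedinbot are my own additions, not from the README example), might look like:

```xml
<init-param>
    <param-name>crawlerUserAgents</param-name>
    <param-value>facebookexternalhit,twitterbot,linkedinbot</param-value>
</init-param>
```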
Question 2) Am I doing this right? Should I try other crawler user agents? Is facebookexternalhit correct?
Source: https://stackoverflow.com/questions/26013475/getting-prerender-io-to-work-with-facebook-crawler-maven-gae