After reading Google's policy on making Ajax-generated content crawlable, along with many developers' blog posts and Stack Overflow Q&A threads on the subject, I'm left wondering what the best approach actually is.
Why didn't I think of this before! Just use http://phantomjs.org. It's a headless WebKit browser. You'd just build a set of actions to crawl the UI and capture the HTML at every state you'd like. Phantom can turn the captured HTML into .html files for you and save them to your web server.
The whole thing would be automated on every build/commit (PhantomJS is command-line driven). The JS code you write to crawl the UI would break as you change the UI, but it shouldn't be any worse than automated UI testing, and it's just JavaScript, so you can use jQuery selectors to grab buttons and click them.
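Here's a minimal sketch of what such a crawl-and-save script could look like, assuming the app exposes its states as hash routes (the routes, base URL, wait time and output folder below are made up; adjust them for your app, or drive the UI with page.evaluate and jQuery clicks instead):

```javascript
// snapshot.js -- a rough sketch, run with: phantomjs snapshot.js
var page = require('webpage').create();
var fs = require('fs');

var baseUrl = 'http://localhost/index.html';          // hypothetical app URL
var routes = ['#!/home', '#!/products', '#!/about'];   // states you want captured
var i = 0;

function capture() {
    if (i >= routes.length) {
        phantom.exit();
        return;
    }
    var route = routes[i++];
    page.open(baseUrl + route, function (status) {
        if (status !== 'success') {
            capture();
            return;
        }
        // Give the app a moment to finish its Ajax calls and rendering.
        window.setTimeout(function () {
            var file = 'snapshots/' + route.replace(/[#!\/]+/g, '_') + '.html';
            fs.write(file, page.content, 'w');   // page.content is the rendered DOM as HTML
            capture();
        }, 2000);
    });
}

capture();
```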
If I had to solve the SEO problem, this is definitely the first approach I'd prototype. Crawl and save, baby. Yessir.
I think a combination of a few technologies and one manually coded hack, which you could reuse, would fix you right up. Here's my crazy, half-baked idea. It's theoretical and probably not complete. Step 1:
Ok, so now you have isolated templates. Now we just need to figure out how to build a flat page out of them on the server. I only see two approaches. Step 2:
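Just to illustrate the flavour of it (a rough sketch only, not necessarily either of the two approaches): if the isolated templates were plain Mustache files shared between client and server, a small Node script could render them into flat pages at build time. Everything here, file names included, is hypothetical:

```javascript
// build-flat-pages.js -- purely illustrative sketch
var fs = require('fs');
var Mustache = require('mustache');   // npm install mustache

var template = fs.readFileSync('templates/product.mustache', 'utf8');
var products = JSON.parse(fs.readFileSync('data/products.json', 'utf8'));

// Render the same template the client uses, but on the server,
// into flat HTML files a crawler can fetch directly.
products.forEach(function (product) {
    fs.writeFileSync('flat/product-' + product.id + '.html',
                     Mustache.render(template, product));
});
```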
Hope this helps. Curious to hear the best answer to this. An interesting problem.
We use PhantomJS for exactly this purpose, just as described above. It works great if you have the rights to run it on your host.
If that is not an option, or if you simply don't want to deal with it yourself, we have a free service that does this. See this post for more info: http://rogeralsing.com/2013/08/06/seo-indexing-angularjs-sites-or-other-ajax-sites-with-wombit-crawlr/
Use Distal templates. Your website is static HTML, which is crawlable, and Distal treats that static HTML as a template.
I have found a solution that does not require Java, Node.js, or any other server-side tool to make a redundant copy of a JS-generated website. It also supports all browsers.
So what you need to do is provide a snapshot for Google. It's the best solution, because you don't need to mess with other URLs and so on. Also, you don't have to add <noscript> content to your basic website, so it stays lighter.
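For context, this relies on Google's Ajax crawling scheme: a #! URL is fetched by the crawler with the hash bang replaced by the _escaped_fragment_ query parameter, and the snapshot is what you return for that request (example.com is just a placeholder):

```
Ajax URL the user sees:   http://example.com/#!/products/42
URL Googlebot requests:   http://example.com/?_escaped_fragment_=/products/42
```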
How do you make a snapshot? PhantomJS, HtmlUnit and so on require a server where you can install and call them. You need to configure them and wire them into your website, and this is a mess. Unfortunately, there is no PHP headless browser, which is understandable given the nature of PHP.
So what is the other way of getting a snapshot? Well... when a user opens the website, you can capture a snapshot of what they see with JS (innerHTML).
So what you need to do is:
1. After the page has finished rendering, grab the generated markup with JS (innerHTML).
2. Send it to your server with an Ajax request, together with the current #! URL.
3. Have the server save it as a snapshot file keyed by that URL.
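A minimal sketch of the client-side part, assuming a /save-snapshot endpoint on your server (the endpoint name and the delay are made up):

```javascript
// Runs in the user's browser after the app has rendered the current page.
window.addEventListener('load', function () {
    // Wait a bit so Ajax calls and templating have finished; tune for your app.
    setTimeout(function () {
        var snapshot = document.documentElement.innerHTML;   // what the user actually sees
        var xhr = new XMLHttpRequest();
        xhr.open('POST', '/save-snapshot', true);             // hypothetical endpoint
        xhr.setRequestHeader('Content-Type', 'application/json');
        xhr.send(JSON.stringify({
            url: location.hash,    // e.g. "#!/products/42", used to name the snapshot file
            html: snapshot
        }));
    }, 2000);
});
```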
Then, when Googlebot visits your hash-bang website, you serve the snapshot file for the page it requested.
Things to solve:
There is also one catch: not all pages will be visited by users, but you need snapshots ready for Google before anyone visits them.
So what can you do? There is a solution for this as well:
But hey, how do you visit all those pages? Well, there are some options for this:
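One possibility, purely as a sketch: load each route in a hidden iframe (e.g. from an admin page) so the same capture script above runs and uploads a snapshot for it. The route list and timing here are hypothetical:

```javascript
// warm-snapshots.js -- pre-generate snapshots for pages no user has visited yet.
var routes = ['#!/home', '#!/products', '#!/contact'];   // hypothetical route list

routes.forEach(function (route, i) {
    setTimeout(function () {
        var frame = document.createElement('iframe');
        frame.style.display = 'none';
        frame.src = '/index.html' + route;   // the capture code in the app does the rest
        document.body.appendChild(frame);
    }, i * 5000);   // stagger the loads so each page has time to render and upload
});
```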
Also remember to refresh old snapshots occasionally to keep them up to date.
I'd like to hear what you think about this solution.