Is there an API to force Facebook to scrape a page again?

Submitted by 流过昼夜 on 2019-11-26 19:36:12
Igy

Page metadata isn't the sort of thing that should change very often, but you can manually clear the cache by going to Facebook's Debug Tool and entering the URL you want to scrape.

There's also an API for doing this, which works for any OG object:

curl -X POST \
     -F "id={object-url OR object-id}" \
     -F "scrape=true" \
     -F "access_token={your access token}" \
     "https://graph.facebook.com"

An access_token is now required. This can be an app or page access_token; no user authentication is required.
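For comparison, here is a minimal JavaScript sketch of the same call using fetch. The function names buildScrapeRequest and rescrape are made up for illustration, and the sketch assumes the endpoint accepts a URL-encoded form body as well as the multipart form that curl -F sends. It also assumes a runtime with a global fetch (modern browsers or Node.js 18+).

```javascript
// Build the form fields for the rescrape call shown in the curl example.
// `pageUrl` and `accessToken` are placeholders supplied by the caller.
function buildScrapeRequest(pageUrl, accessToken) {
  const body = new URLSearchParams({
    id: pageUrl,
    scrape: "true",
    access_token: accessToken,
  });
  return {
    endpoint: "https://graph.facebook.com",
    options: { method: "POST", body: body },
  };
}

// Send the request and return the parsed JSON response.
// Throws if the HTTP status is not OK or the response carries an `error` field.
async function rescrape(pageUrl, accessToken) {
  const { endpoint, options } = buildScrapeRequest(pageUrl, accessToken);
  const res = await fetch(endpoint, options);
  const data = await res.json();
  if (!res.ok || data.error) {
    throw new Error("Rescrape failed: " + JSON.stringify(data.error || data));
  }
  return data;
}
```

Keeping the request builder separate from the network call makes it easy to check the parameters without contacting Facebook.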

If you'd like to do this in PHP without waiting for a reply, the following function will do it:

// Provide a URL in $url to clear its OG cache.
// Fire-and-forget: the request is sent, but the response is never read.
function clear_open_graph_cache($url, $token) {
  $vars = array('id' => $url, 'scrape' => 'true', 'access_token' => $token);
  $body = http_build_query($vars);

  $fp = fsockopen('ssl://graph.facebook.com', 443);
  if ($fp === false) {
    return false; // could not open the connection
  }
  fwrite($fp, "POST / HTTP/1.1\r\n");
  fwrite($fp, "Host: graph.facebook.com\r\n");
  fwrite($fp, "Content-Type: application/x-www-form-urlencoded\r\n");
  fwrite($fp, "Content-Length: ".strlen($body)."\r\n");
  fwrite($fp, "Connection: close\r\n");
  fwrite($fp, "\r\n");
  fwrite($fp, $body);
  fclose($fp);
  return true;
}

If you're using the JavaScript SDK, the version of this you'd want to use is:

FB.api('https://graph.facebook.com/', 'post', {
            id: [your-updated-or-new-link],
            scrape: true
        }, function(response) {
            //console.log('rescrape!',response);
        });

I happen to like promises, so an alternate version using jQuery Deferreds might be:

function scrapeLink(url){
    var masterdfd = $.Deferred();
    FB.api('https://graph.facebook.com/', 'post', {
        id: url, // the link passed in by the caller
        scrape: true
    }, function(response) {
        if(!response || response.error){
            masterdfd.reject(response);
        }else{
            masterdfd.resolve(response);
        }
    });
    // Return the read-only promise so callers cannot resolve or reject it
    return masterdfd.promise();
}

then:

scrapeLink([SOME-URL]).done(function(){
    //now the link should be scraped/rescraped and ready to use
});

Note that the scraper can take varying amounts of time to complete, so there is no guarantee that it will be quick. I also don't know what Facebook thinks about repeated or automated use of this method, so it probably pays to be judicious and conservative about using it.
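One way to stay conservative is to refuse to rescrape the same URL more than once per interval. The sketch below is only an illustration: createScrapeGate is a hypothetical helper, and the one-minute interval in the usage comment is an arbitrary choice, not a limit documented by Facebook.

```javascript
// Decide whether a URL may be rescraped now, allowing at most one request
// per URL every `minIntervalMs`. `now` is injectable so the gate can be tested.
function createScrapeGate(minIntervalMs, now = Date.now) {
  const lastRequested = new Map(); // url -> timestamp of the last allowed request
  return function shouldScrape(url) {
    const t = now();
    const last = lastRequested.get(url);
    if (last !== undefined && t - last < minIntervalMs) {
      return false; // too soon, skip this request
    }
    lastRequested.set(url, t);
    return true;
  };
}

// Usage with the scrapeLink function from this answer:
//   var shouldScrape = createScrapeGate(60 * 1000);
//   if (shouldScrape(link)) { scrapeLink(link).done(function(){ /* ... */ }); }
```

The gate only tracks requests made from the current page or process, so it limits accidental bursts rather than enforcing a global quota.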

This is a simple Ajax implementation. Put this on any page you want Facebook to scrape immediately:

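var url = "your url here";
$.ajax({
    type: 'POST',
    // Encode the URL so that its own query string does not break the request
    url: 'https://graph.facebook.com?id=' + encodeURIComponent(url) + '&scrape=true',
    success: function(data) {
        console.log(data);
    }
});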

An alternative solution, triggered from a Drupal node update and using curl, could be something like this:

<?php
function your_module_node_postsave($node) {
    if($node->type == 'your_type') {
        $url = url('node/'.$node->nid,array('absolute' => TRUE));
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'https://graph.facebook.com/v1.0/?id='. urlencode($url). '&scrape=true');
        // Replace YOUR-ACCESS-TOKEN with an app or page access token
        $auth_header = 'Authorization: OAuth YOUR-ACCESS-TOKEN';
        curl_setopt($ch, CURLOPT_HTTPHEADER, array($auth_header));
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // Keep certificate verification on; disabling it exposes the token
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
        $r = curl_exec($ch);
        curl_close($ch);
    }
}

Notice the hook_node_postsave() implementation, which is not supported by standard Drupal core. I had to use www.drupal.org/project/hook_post_action so that the Facebook scrape picks up the latest changes made to the node, since hook_node_update() is not triggered after the database has been updated.

Facebook now requires an access token to do this. Guidelines for acquiring a token can be found here: https://smashballoon.com/custom-facebook-feed/access-token/

I'm the author of Facebook Object Debugger CLI, a command-line interface written in PHP that aims to refresh the Facebook cache for a single URL or for a batch of URLs read from a text file. The package is also available on Packagist and can be installed using Composer.

There are changes in Graph API v2.10:

When making a GET request against a URL we haven't scraped before, we will also omit the og_object field. To trigger a scrape and populate the og_object, issue a POST /{url}?scrape=true. Once scraped, the og_object will remain cached and returned on all future read requests.

We will require an access token for these requests in all versions of the Graph API beginning October 16, 2017.

Source: Introducing Graph API v2.10
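As a rough illustration of the POST /{url}?scrape=true form quoted above, the following sketch builds the request path. buildScrapePath is a hypothetical helper, and the sketch assumes that the page URL has to be percent-encoded before it is placed in the path and that the access token can be passed as a query parameter.

```javascript
// Build the path for a "POST /{url}?scrape=true" request.
// `pageUrl` and `accessToken` are placeholders supplied by the caller.
function buildScrapePath(pageUrl, accessToken) {
  const query = new URLSearchParams({
    scrape: "true",
    access_token: accessToken,
  });
  // Percent-encode the page URL so its slashes and colons fit in one path segment
  return "/" + encodeURIComponent(pageUrl) + "?" + query.toString();
}
```

The resulting path would be appended to https://graph.facebook.com and sent with the POST method.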

So now we should use the POST method for scraping:

POST /{url}?scrape=true

Not

A solution with the PHP Facebook SDK:

<?php
   try {
      $params = [
         'id' => 'https://www.mysitetoscrape.com/page',
         'scrape' => 'true',
      ];
      $response = $fb->post('/', $params);
      print_r($response);
   } catch(\Facebook\Exceptions\FacebookResponseException $e) {
      // When Graph returns an error
      echo 'Graph returned an error: ' . $e->getMessage();
   } catch(\Facebook\Exceptions\FacebookSDKException $e) {
      // When validation fails or other local issues
      echo 'Facebook SDK returned an error: ' . $e->getMessage();
   }
?>

Here's my Ruby solution using the Koala gem and Facebook API v2.9:

    api = Koala::Facebook::API.new(access_token)
    response = api.put_object(nil, nil, {scrape: true, id: "url-of-page-to-scrape"})

response should be a hash of the attributes retrieved from the og: meta tags on the page that was scraped.

I was facing this same problem. There is a simple way to clear cache.

  1. Go to http://developers.facebook.com/tools/debug
  2. Enter the URL followed by ?fbrefresh=CAN_BE_ANYTHING

Example: http://www.example.com?fbrefresh=CAN_BE_ANYTHING
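If the page URL already has a query string, the parameter has to be appended with & instead of ?. A small sketch using the standard URL API handles both cases; withRefreshParam is a hypothetical helper name.

```javascript
// Append (or overwrite) the fbrefresh parameter on a page URL.
// Works whether or not the URL already has a query string or a fragment.
function withRefreshParam(pageUrl, value) {
  const u = new URL(pageUrl);
  u.searchParams.set("fbrefresh", String(value));
  return u.toString();
}
```

Passing something that changes on each call, such as Date.now(), gives a fresh value every time.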
