Facebook crawler is hitting my server hard and ignoring directives. Accessing same resources multiple times

盖世英雄少女心 2021-02-05 05:52

The Facebook Crawler is hitting my servers multiple times every second and it seems to be ignoring both the Expires header and the og:ttl property.

In some cases, it is

8 Answers
  •  傲寒 (original poster)
    2021-02-05 06:38

    I received word back from the Facebook team themselves. Hopefully, it brings some clarification to how the crawler treats image URLs.

    Here it goes:

    The Crawler treats image URLs differently than other URLs.

    We scrape images multiple times because we have different physical regions, each of which needs to fetch the image. Since we have around 20 different regions, the developer should expect ~20 calls for each image. Once we make these requests, they stay in our cache for around a month. We need to rescrape these images frequently to prevent abuse on the platform (a malicious actor could get us to scrape a benign image and then replace it with an offensive one).

    So basically, you should expect the image specified in og:image to be hit about 20 times after it has been shared. Then, about a month later, it will be scraped again.
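To check whether the traffic on your own server matches the ~20 fetches per image described in the reply, you could count crawler requests per path in your access log. The following is only a minimal sketch under stated assumptions: the log is in Combined Log Format, the crawler is identified by the `facebookexternalhit` token in its User-Agent header, and the function names and the threshold constant are hypothetical, chosen for this illustration.

```python
import re
from collections import Counter

# Assumes Combined Log Format: request line, status, size, referer, user agent.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Token that Facebook's crawler includes in its User-Agent header.
CRAWLER_TOKEN = "facebookexternalhit"

# Assumption taken from the reply above: roughly 20 regions fetch each image.
EXPECTED_FETCHES_PER_IMAGE = 20


def count_crawler_hits(log_lines):
    """Count Facebook crawler requests per requested path."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and CRAWLER_TOKEN in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


def paths_over_expected(hits, limit=EXPECTED_FETCHES_PER_IMAGE):
    """Return (path, count) pairs fetched more often than the limit, busiest first."""
    return [(path, n) for path, n in hits.most_common() if n > limit]
```

Passing an open log file to `count_crawler_hits` and the result to `paths_over_expected` lists the paths that exceed the expected count, which helps separate the per-region behaviour described above from genuinely excessive crawling. The count covers the whole log, so it is most meaningful over a window shorter than the month-long cache period.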
