Scrapy get all links from any website

暖寄归人 2021-02-09 17:02

I have the following code for a web crawler in Python 3:

import requests
from bs4 import BeautifulSoup
import re

def get_links(link):
    return_links = []

    # Fetch the page and collect every absolute (http/https) link on it
    r = requests.get(link)
    if r.status_code != 200:
        print("Error: request failed with status", r.status_code)
        return return_links

    soup = BeautifulSoup(r.content, "html.parser")
    for anchor in soup.find_all("a", attrs={"href": re.compile("^http")}):
        return_links.append(anchor.get("href"))

    return return_links
        
2 Answers
  •  野性不改 2021-02-09 17:45

    If you want to allow crawling of all domains, simply don't specify allowed_domains, and use a LinkExtractor which extracts all links.

    A simple spider that follows all links:

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class FollowAllSpider(CrawlSpider):
        name = 'follow_all'

        start_urls = ['https://example.com']
        rules = [Rule(LinkExtractor(), callback='parse_item', follow=True)]

        def parse_item(self, response):
            # Called once for every link the crawler follows
            pass
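    To actually run the spider and collect the URLs it visits, you can drive it from a plain script. Below is a minimal sketch using Scrapy's CrawlerProcess; the LOG_LEVEL setting is just an illustrative choice, and yielding {'url': response.url} from parse_item (instead of pass) is one way to record each crawled link:

    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings={'LOG_LEVEL': 'INFO'})
    process.crawl(FollowAllSpider)
    process.start()  # blocks until the crawl finishes

    Alternatively, save the spider to a file and run it with scrapy runspider (e.g. scrapy runspider follow_all.py); no Scrapy project scaffolding is needed for that.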
