Scrapy SgmlLinkExtractor question

粉色の甜心 2021-01-12 01:23

I am trying to get SgmlLinkExtractor to work.

This is the signature:

SgmlLinkExtractor(allow=(), deny=(), allow_domains=(), deny_domains=(), restrict_xpaths=(), ...)
        
4 Answers
  • 2021-01-12 01:39

    If you check the documentation, a "Warning" is clearly stated:

    "When writing crawl spider rules, avoid using parse as callback, since the Crawl Spider uses the parse method itself to implement its logic. So if you override the parse method, the crawl spider will no longer work."

    url for verification
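
    For illustration, a minimal sketch of a CrawlSpider whose rule callback is not named "parse". The spider name, domain, and start URL here are hypothetical, the allow pattern is borrowed from another answer, and the old scrapy.contrib import paths that ship SgmlLinkExtractor are assumed:

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    class CareersSpider(CrawlSpider):
        name = 'careers'
        allowed_domains = ['example.com']
        start_urls = ['http://www.example.com/careers/']

        # Callback avoids the reserved name "parse", so CrawlSpider's own
        # link-following logic keeps working; the trailing comma keeps
        # "rules" a one-element tuple.
        rules = (
            Rule(SgmlLinkExtractor(allow=(r'/careers/n\.\w+',)),
                 callback='parse_item', follow=True),
        )

        def parse_item(self, response):
            self.log('Visited %s' % response.url)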

  • 2021-01-12 01:43

    allow=(r'/aadler/', ...
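
    For context, a hedged sketch of how a raw-string pattern like that would sit inside the extractor call; the '/aadler/' path comes from this answer, the second pattern is only illustrative:

    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    # Raw strings keep the regex backslashes intact inside allow.
    SgmlLinkExtractor(allow=(r'/aadler/', r'/careers/n\.\w+'))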

  • 2021-01-12 01:47

    You are missing a comma after the first element, which is needed for "rules" to be a tuple:

    rules = (Rule(SgmlLinkExtractor(allow=('/careers/n.\w+', )), callback='parse', follow=True),)
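
    To see why the trailing comma matters, a quick plain-Python illustration (no Scrapy involved): without it, the parentheses are just grouping and "rules" is not a tuple at all.

    rules_ok = ('single rule',)     # trailing comma -> one-element tuple
    rules_bad = ('single rule')     # no comma -> just the string itself
    assert isinstance(rules_ok, tuple)
    assert not isinstance(rules_bad, tuple)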
    
  • 2021-01-12 01:52

    It appears you are overriding the "parse" method. "parse" is a method that CrawlSpider uses internally to follow links.
