twisted critical unhandled error on scrapy tutorial

南楼画角 submitted on 2020-01-02 04:57:06

Question


I'm new to programming and I'm trying to learn Scrapy using the Scrapy tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html

So I ran the "scrapy crawl dmoz" command and got this error:

2015-07-14 16:11:02 [scrapy] INFO: Scrapy 1.0.1 started (bot: tutorial)
2015-07-14 16:11:02 [scrapy] INFO: Optional features available: ssl, http11
2015-07-14 16:11:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
2015-07-14 16:11:05 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
Unhandled error in Deferred:
2015-07-14 16:11:06 [twisted] CRITICAL: Unhandled error in Deferred:
2015-07-14 16:11:07 [twisted] CRITICAL:

I'm using Windows 7 and Python 2.7. Does anybody know what the problem is? How can I fix it?

EDIT: My spider file code is:

# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computer/programming/languages/python/resources/",
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename,'wb') as f:
            f.write(response.body)

items.py code:

import scrapy

class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()

pip list:

  • bootstrap-admin (0.3.3)
  • cffi (1.1.2)
  • characteristic (14.3.0)
  • cryptography (0.9.3)
  • cssselect (0.9.1)
  • Django (1.7.7)
  • django-auth-ldap (1.2.4)
  • django-debug-toolbar (1.3.0)
  • django-mssql (1.6.2)
  • django-pyodbc (0.2.6)
  • django-pyodbc-azure (1.2.2)
  • django-redator (0.2.3)
  • django-reversion (1.8.5)
  • django-summernote (0.6.0)
  • django-windows-tools (0.1.1)
  • django-wysiwyg-redactor (0.4.3.2)
  • enum34 (1.0.4)
  • ez-setup (0.9)
  • flup (1.0.2)
  • idna (2.0)
  • ipaddress (1.0.13)
  • iso8601 (0.1.4)
  • logging (0.4.9.6)
  • lxml (3.4.4)
  • mechanize (0.2.5)
  • MySQL-python (1.2.4)
  • pbr (0.10.8)
  • Pillow (2.7.0)
  • pip (7.1.0)
  • pyasn1 (0.1.8)
  • pyasn1-modules (0.0.6)
  • pycparser (2.14)
  • pymongo (2.6)
  • pyodbc (3.0.7)
  • pyOpenSSL (0.15.1)
  • pypm (1.4.3)
  • python-ldap (2.4.18)
  • pythonselect (1.3)
  • pywin32 (218.3)
  • queuelib (1.2.2)
  • Scrapy (1.0.1)
  • selenium (2.44.0)
  • service-identity (14.0.0)
  • setuptools (18.0.1)
  • six (1.9.0)
  • sqlparse (0.1.15)
  • stevedore (1.3.0)
  • Twisted (15.2.1)
  • virtualenv (1.11.6)
  • virtualenv-clone (0.2.5)
  • virtualenvwrapper (4.3.2)
  • virtualenvwrapper-powershell (12.7.8)
  • w3lib (1.11.0)
  • xlrd (0.9.2)
  • zope.interface (4.1.2)

Thanks for your attention, and sorry for my poor English; it isn't my native language.


Answer 1:


I'm beginning to learn Scrapy as well and ran into the same problem. After struggling with it for an afternoon, I finally found it was because the pywin32 module had only been downloaded, not installed. You can run the command below in cmd to finish installing pywin32, then try the crawl again:

python python27\scripts\pywin32_postinstall.py -install

I hope it will help!
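If you want to confirm that the post-install step worked, one small sanity check (not part of the tutorial, just an assumption about how to verify the install) is to import one of the modules pywin32 provides:

# sanity check: win32api only imports cleanly once pywin32 is properly installed
try:
    import win32api
    print("pywin32 is available:", win32api.GetVersionEx())
except ImportError as exc:
    print("pywin32 is still missing:", exc)

If the import succeeds, Scrapy's Twisted reactor should also be able to find the Windows APIs it needs.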




Answer 2:


The short answer is: you are missing pywin32!

The other answers are basically right, but not 100% correct. pywin32 is not a pip install! You must download the installer package from here:

http://sourceforge.net/projects/pywin32/files/pywin32/

Make sure you get the correct build: 32-bit or 64-bit. In my case I didn't realize I had the 32-bit version of Python installed on my 64-bit machine, and the installer failed with "Cannot find Python 2.7 installation in registry". I had to install the 32-bit version of pywin32. Once I did this, scrapy crawl site worked.
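If you're not sure which build of Python you're running, a quick way to check from the interpreter itself (standard library only, nothing Scrapy-specific):

# prints the interpreter version and whether it is a 32-bit or 64-bit build,
# which is the build the pywin32 installer must match
import platform

print(platform.python_version())    # e.g. 2.7.10
print(platform.architecture()[0])   # '32bit' or '64bit'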




Answer 3:


I don't see what you're doing with items, since you're only writing to a file, but it may be the imports. Try the code below; if that doesn't work, try pip install pywin32 --upgrade and pip install Twisted --upgrade, which should reinstall any corrupted files. Also, I don't know if it's Stack Overflow's formatting, but you had some misplaced indentation.

from scrapy.spiders import Spider
from {Projectname}.items import {Itemclass}
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computer/programming/languages/python/resources/"]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename,'wb') as f:
            f.write(response.body)
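If you do want to make use of the DmozItem from items.py instead of writing raw HTML to disk, here is a minimal sketch along the lines of the official tutorial; the spider name and the XPath selectors are assumptions based on the dmoz.org directory markup of the time:

import scrapy

from tutorial.items import DmozItem


class DmozItemSpider(scrapy.Spider):
    name = "dmoz_items"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
    ]

    def parse(self, response):
        # one item per directory entry; selectors follow the Scrapy tutorial
        for sel in response.xpath('//ul/li'):
            item = DmozItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            yield item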



Answer 4:


Scrapy crashes with: ImportError: No module named win32api

You need to install pywin32 because of this Twisted bug.



Source: https://stackoverflow.com/questions/31439540/twisted-critical-unhandled-error-on-scrapy-tutorial
