Question
Note:
The page I am crawling doesn't use JavaScript up to the point I have reached. I have also tried scrapy_splash but got the same error. I have relied on this course for starting the spider.
The issue:
scrapy spider gives this error:
raise TypeError('to_bytes must receive a str or bytes '
TypeError: to_bytes must receive a str or bytes object, got Selector
What I want:
The string output that includes "some number of records".
What I tried:
This and this, and other similar questions. They don't address the issue I am facing.
My Code:
import scrapy
from scrapy import FormRequest


class abcSpider(scrapy.Spider):
    name = 'abc'
    allowed_domains = ['citizen.mahapolice.gov.in']

    def start_requests(self):
        yield scrapy.Request(
            url='http://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx',
            headers={
                'Referer': 'https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx'
            },
            callback=self.parse
        )

    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formid='form1',
            formdata={
                '__EVENTTARGET': response.xpath("//input[@name='__EVENTTARGET']/@value"),
                '__EVENTARGUMENT': response.xpath("//*[@id='__EVENTARGUMENT']/@value"),
                '__LASTFOCUS': response.xpath("//*[@id='__LASTFOCUS']/@value"),
                '__VIEWSTATE': response.xpath("//*[@id='__VIEWSTATE']/@value"),
                '__VIEWSTATEGENERATOR': "6F2EA376",
                '__PREVIOUSPAGE': response.xpath("//*[@id='__PREVIOUSPAGE']/@value"),
                '__EVENTVALIDATION': response.xpath("//*[@id='__EVENTVALIDATION']/@value"),
                'ctl00$hdnSessionIdleTime': response.xpath("//*[@id='hdnSessionIdleTime']/@value"),
                'ctl00$hdnUserUniqueId': response.xpath("//*[@id='hdnUserUniqueId']/@value"),
                'ctl00$ContentPlaceHolder1$meeDateOfRegistrationFrom_ClientState': response.xpath(
                    "//*[@id='ContentPlaceHolder1_meeDateOfRegistrationFrom_ClientState']/@value"),
                'ctl00$ContentPlaceHolder1$txtDateOfRegistrationFrom': "01/07/2020",
                'ctl00$ContentPlaceHolder1$meeDateOfRegistrationTo_ClientState': response.xpath(
                    "//*[@id='ContentPlaceHolder1_meeDateOfRegistrationTo_ClientState']/@value"),
                'ctl00$ContentPlaceHolder1_txtDateOfRegistrationTo': "03/07/2020",
                'ctl00$ContentPlaceHolder1$ddlDistrict': "19409",
                'ctl00$ContentPlaceHolder1$ddlPoliceStation': "",
                'ctl00$ContentPlaceHolder1$txtFirno': "",
                'ctl00$ContentPlaceHolder1$btnSearch': "Search",
                'ctl00$ContentPlaceHolder1$ucRecordView$ddlPageSize': "0",
                'ctl00$ContentPlaceHolder1$ucGridRecordView$txtPageNumber': ""
            },
            callback=self.after_login,
        )

    def after_login(self, response):
        police_stations = response.xpath(
            '//*[@id="ContentPlaceHolder1_lbltotalrecord"]/text()').get()
        print(police_stations)
Terminal Output:
2020-07-15 15:11:37 [scrapy.utils.log] INFO: Scrapy 2.2.0 started (bot: xyz)
2020-07-15 15:11:37 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.2 (default, Apr 27 2020, 15:53:34) - [GCC 9.3.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1f 31 Mar 2020), cryptography 2.8, Platform Linux-5.4.0-40-generic-x86_64-with-glibc2.29
2020-07-15 15:11:37 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2020-07-15 15:11:37 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'xyz',
'NEWSPIDER_MODULE': 'xyz.spiders',
'SPIDER_MODULES': ['xyz.spiders']}
2020-07-15 15:11:38 [scrapy.extensions.telnet] INFO: Telnet Password: db3dd9550774d0ab
2020-07-15 15:11:38 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2020-07-15 15:11:39 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-07-15 15:11:39 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-07-15 15:11:39 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-07-15 15:11:39 [scrapy.core.engine] INFO: Spider opened
2020-07-15 15:11:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-07-15 15:11:39 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-07-15 15:11:40 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://citizen.mahapolice.gov.in/Citizen/MH/index.aspx> from <GET http://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx>
2020-07-15 15:11:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://citizen.mahapolice.gov.in/Citizen/MH/index.aspx> (referer: https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx)
2020-07-15 15:11:40 [scrapy.core.scraper] ERROR: Spider error processing <GET https://citizen.mahapolice.gov.in/Citizen/MH/index.aspx> (referer: https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx)
Traceback (most recent call last):
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
yield next(it)
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/utils/python.py", line 346, in __next__
return next(self.data)
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/utils/python.py", line 346, in __next__
return next(self.data)
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
for x in result:
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
for r in iterable:
File "/home/sangharshmanuski/Documents/delet/xyz/xyz/spiders/abc.py", line 20, in parse
yield FormRequest.from_response(
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/http/request/form.py", line 58, in from_response
return cls(url=url, method=method, formdata=formdata, **kwargs)
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/http/request/form.py", line 31, in __init__
querystr = _urlencode(items, self.encoding)
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/http/request/form.py", line 71, in _urlencode
values = [(to_bytes(k, enc), to_bytes(v, enc))
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/http/request/form.py", line 71, in <listcomp>
values = [(to_bytes(k, enc), to_bytes(v, enc))
File "/home/sangharshmanuski/.local/lib/python3.8/site-packages/scrapy/utils/python.py", line 104, in to_bytes
raise TypeError('to_bytes must receive a str or bytes '
TypeError: to_bytes must receive a str or bytes object, got Selector
2020-07-15 15:11:40 [scrapy.core.engine] INFO: Closing spider (finished)
2020-07-15 15:11:40 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 648,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 8150,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 1,
'elapsed_time_seconds': 1.116569,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 7, 15, 9, 41, 40, 607840),
'log_count/DEBUG': 2,
'log_count/ERROR': 1,
'log_count/INFO': 10,
'memusage/max': 52281344,
'memusage/startup': 52281344,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'spider_exceptions/TypeError': 1,
'start_time': datetime.datetime(2020, 7, 15, 9, 41, 39, 491271)}
2020-07-15 15:11:40 [scrapy.core.engine] INFO: Spider closed (finished)
Answer 1:
You have the problem I mentioned in a comment on your previous question: you have to use .get() when you read the values with response.xpath(...) in formdata={...}. Without .get(), xpath() returns a Selector object, and Scrapy's to_bytes() only accepts str or bytes.
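To see the type mismatch in isolation, here is a stand-alone sketch with a simplified re-implementation of the to_bytes() check and a stand-in Selector class (both are illustrations written for this example, not Scrapy's actual source):

    # Simplified version of the check in scrapy.utils.python.to_bytes
    # (a sketch for illustration, not the real Scrapy code).
    def to_bytes(text, encoding='utf-8'):
        if isinstance(text, bytes):
            return text
        if not isinstance(text, str):
            raise TypeError('to_bytes must receive a str or bytes '
                            'object, got %s' % type(text).__name__)
        return text.encode(encoding)


    class Selector:
        """Stand-in for a parsel/Scrapy Selector, to reproduce the error."""
        def __init__(self, value):
            self._value = value

        def get(self):
            # .get() extracts the underlying string (or None if no match)
            return self._value


    sel = Selector('/wEPDw...')       # like what response.xpath(...) returns
    print(to_bytes(sel.get()))        # works: the extracted str is encoded
    try:
        to_bytes(sel)                 # the Selector itself -> TypeError
    except TypeError as e:
        print(e)

So every xpath() value passed into formdata needs a trailing .get(); the hard-coded strings ("6F2EA376", "Search", ...) are fine as they are.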
BTW: you still have a mistake in one field name:
'ContentPlaceHolder1_txtDateOfRegistrationTo': "03/07/2020",
has to be
'ctl00$ContentPlaceHolder1$txtDateOfRegistrationTo': "03/07/2020",
And you have to use https:// instead of http:// in the starting URL:
url='https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx',
If you use http://, the server redirects to the main page https://citizen.mahapolice.gov.in/Citizen/MH/index.aspx, and you end up sending the form to index.aspx instead of PublishedFIRs.aspx.
Minimal working code, which you can put in a single file and run with python script.py without creating a project. It fixes the previous errors and sends the form to the correct URL, but there is still a problem with the __VIEWSTATE and __EVENTVALIDATION values: if I copy all values from the web browser it works, but with the values scraped by Scrapy the page returns error 500. The page probably uses JavaScript to generate these values.
#!/usr/bin/env python3

import scrapy
from scrapy import FormRequest


class abcSpider(scrapy.Spider):
    name = 'abc'
    allowed_domains = ['citizen.mahapolice.gov.in']

    def start_requests(self):
        yield scrapy.Request(
            url='https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx',
            headers={
                'USER_AGENT': 'Mozilla/5.0',
                'Referer': 'https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx'
            },
            callback=self.parse
        )

    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formid='form1',
            formdata={
                '__EVENTTARGET': response.xpath("//input[@name='__EVENTTARGET']/@value").get(),
                '__EVENTARGUMENT': response.xpath("//*[@id='__EVENTARGUMENT']/@value").get(),
                '__LASTFOCUS': response.xpath("//*[@id='__LASTFOCUS']/@value").get(),
                '__VIEWSTATE': response.xpath("//*[@id='__VIEWSTATE']/@value").get(),
                '__VIEWSTATEGENERATOR': "6F2EA376",
                '__PREVIOUSPAGE': response.xpath("//*[@id='__PREVIOUSPAGE']/@value").get(),
                '__EVENTVALIDATION': response.xpath("//*[@id='__EVENTVALIDATION']/@value").get(),
                'ctl00$hdnSessionIdleTime': response.xpath("//*[@id='hdnSessionIdleTime']/@value").get(),
                'ctl00$hdnUserUniqueId': response.xpath("//*[@id='hdnUserUniqueId']/@value").get(),
                'ctl00$ContentPlaceHolder1$meeDateOfRegistrationFrom_ClientState': response.xpath(
                    "//*[@id='ContentPlaceHolder1_meeDateOfRegistrationFrom_ClientState']/@value").get(),
                'ctl00$ContentPlaceHolder1$txtDateOfRegistrationFrom': "01/07/2020",
                'ctl00$ContentPlaceHolder1$meeDateOfRegistrationTo_ClientState': response.xpath(
                    "//*[@id='ContentPlaceHolder1_meeDateOfRegistrationTo_ClientState']/@value").get(),
                # 'ContentPlaceHolder1_txtDateOfRegistrationTo': "03/07/2020",
                'ctl00$ContentPlaceHolder1$txtDateOfRegistrationTo': "03/07/2020",
                'ctl00$ContentPlaceHolder1$ddlDistrict': "19409",
                'ctl00$ContentPlaceHolder1$ddlPoliceStation': "",
                'ctl00$ContentPlaceHolder1$txtFirno': "",
                'ctl00$ContentPlaceHolder1$btnSearch': "Search",
                'ctl00$ContentPlaceHolder1$ucRecordView$ddlPageSize': "0",
                'ctl00$ContentPlaceHolder1$ucGridRecordView$txtPageNumber': ""
            },
            callback=self.after_login,
        )

    def after_login(self, response):
        police_stations = response.xpath(
            '//*[@id="ContentPlaceHolder1_lbltotalrecord"]/text()').get()
        print(police_stations)


# --- run without project and save in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(abcSpider)
c.start()
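Note that FormRequest.from_response reads the form's <input> fields from the HTML and pre-fills them, which is why passing fresh .get() values works at all. The prefill step can be sketched with only the standard library (this is a rough illustration of the idea, not Scrapy's actual implementation):

    from html.parser import HTMLParser


    class HiddenFieldCollector(HTMLParser):
        """Collect name/value pairs of <input type="hidden"> elements,
        roughly what FormRequest.from_response does when pre-filling."""
        def __init__(self):
            super().__init__()
            self.fields = {}

        def handle_starttag(self, tag, attrs):
            if tag != 'input':
                return
            a = dict(attrs)
            if a.get('type') == 'hidden' and 'name' in a:
                self.fields[a['name']] = a.get('value', '')


    # a cut-down stand-in for the real PublishedFIRs.aspx form
    html = '''
    <form id="form1" method="post">
      <input type="hidden" name="__VIEWSTATE" value="/wEPDw..." />
      <input type="hidden" name="__EVENTVALIDATION" value="/wEdAF..." />
      <input type="text" name="ctl00$ContentPlaceHolder1$txtFirno" value="" />
    </form>
    '''

    collector = HiddenFieldCollector()
    collector.feed(html)
    print(collector.fields)

Because the hidden fields are picked up from the response automatically, only the fields you actually want to change (the dates, the district, the Search button) strictly need to appear in formdata.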
EDIT: code with hard-coded values which gives me a result, but I don't know how long these values will stay valid, or whether they will work with different dates.
#!/usr/bin/env python3

import scrapy
from scrapy import FormRequest


class abcSpider(scrapy.Spider):
    name = 'abc'
    allowed_domains = ['citizen.mahapolice.gov.in']

    def start_requests(self):
        yield scrapy.Request(
            url='https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx',
            headers={
                'Referer': 'https://citizen.mahapolice.gov.in/Citizen/MH/PublishedFIRs.aspx'
            },
            callback=self.parse
        )

    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formid='form1',
            formdata={
                '__EVENTTARGET': '',
                '__EVENTARGUMENT': '',
                '__LASTFOCUS': '',
'__VIEWSTATE': '/wEPDwUKLTIwNzQyOTkwOA9kFgJmD2QWAgIDD2QWIAIRDw8WAh4EVGV4dAUyPGgxPk1haGFyYXNodHJhIFBvbGljZSAtIFNlcnZpY2VzIGZvciBDaXRpemVuPC9oMT5kZAITDw8WAh8ABT88aDI+Q3JpbWUgYW5kIENyaW1pbmFsIFRyYWNraW5nIE5ldHdvcmsgYW5kIFN5c3RlbXMgKENDVE5TKTxoMj5kZAIVDw8WAh8ABSLigJxFbXBvd2VyaW5nIFBvbGljZSBUaHJvdWdoIElU4oCdZGQCFw8PFgIeCEltYWdlVXJsBRV+L0ltYWdlcy90YWJfSG9tZS5wbmcWBB4Lb25tb3VzZW92ZXIFI3RoaXMuc3JjPScuLi9JbWFnZXMvdGFiX0hvbWVSTy5wbmcnHgpvbm1vdXNlb3V0BSF0aGlzLnNyYz0nLi4vSW1hZ2VzL3RhYl9Ib21lLnBuZydkAhkPDxYCHwEFGH4vSW1hZ2VzL3RhYl9BYm91dFVzLnBuZxYEHwIFJnRoaXMuc3JjPScuLi9JbWFnZXMvdGFiX0Fib3V0VXNSTy5wbmcnHwMFJHRoaXMuc3JjPScuLi9JbWFnZXMvdGFiX0Fib3V0VXMucG5nJ2QCGw8PFgIfAQUffi9JbWFnZXMvdGFiX0NpdGl6ZW5DaGFydGVyLnBuZxYEHwIFLXRoaXMuc3JjPScuLi9JbWFnZXMvdGFiX0NpdGl6ZW5DaGFydGVyUk8ucG5nJx8DBSt0aGlzLnNyYz0nLi4vSW1hZ2VzL3RhYl9DaXRpemVuQ2hhcnRlci5wbmcnZAIdDw8WAh8BBRx+L0ltYWdlcy90YWJfQ2l0aXplbkluZm8ucG5nFgQfAgUqdGhpcy5zcmM9Jy4uL0ltYWdlcy90YWJfQ2l0aXplbkluZm9STy5wbmcnHwMFKHRoaXMuc3JjPScuLi9JbWFnZXMvdGFiX0NpdGl6ZW5JbmZvLnBuZydkAh8PDxYCHwEFKH4vSW1hZ2VzL3RhYl9PbmxpbmVTZXJ2aWNlc19FbmdfYmx1ZS5wbmcWBB8CBTJ0aGlzLnNyYz0nLi4vSW1hZ2VzL3RhYl9PbmxpbmVTZXJ2aWNlc19FbmdfUk8ucG5nJx8DBTR0aGlzLnNyYz0nLi4vSW1hZ2VzL3RhYl9PbmxpbmVTZXJ2aWNlc19FbmdfYmx1ZS5wbmcnZAIhDw8WAh8BBR9+L0ltYWdlcy90YWJfT25saW5lU2VydmljZXMucG5nFgQfAgUtdGhpcy5zcmM9Jy4uL0ltYWdlcy90YWJfT25saW5lU2VydmljZXNSTy5wbmcnHwMFK3RoaXMuc3JjPScuLi9JbWFnZXMvdGFiX09ubGluZVNlcnZpY2VzLnBuZydkAiMPZBYCAgEPZBYIAgEPZBYIAgEPZBYMAgMPDxYEHgdUb29sVGlwBRpFbnRlciBEYXRlIG9mIFJlZ2lzdHJhdGlvbh4JTWF4TGVuZ3RoZmRkAgkPFggeDERpc3BsYXlNb25leQspggFBamF4Q29udHJvbFRvb2xraXQuTWFza2VkRWRpdFNob3dTeW1ib2wsIEFqYXhDb250cm9sVG9vbGtpdCwgVmVyc2lvbj00LjEuNDA0MTIuMCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj0yOGYwMWIwZTg0YjZkNTNlAB4OQWNjZXB0TmVnYXRpdmULKwQAHg5JbnB1dERpcmVjdGlvbgsphgFBamF4Q29udHJvbFRvb2xraXQuTWFza2VkRWRpdElucHV0RGlyZWN0aW9uLCBBamF4Q29udHJvbFRvb2xraXQsIFZlcnNpb249NC4xLjQwNDEyLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49MjhmMDFiMGU4NGI2ZDUzZQAeCkFjY2VwdEFtUG1oZAITDw8WBB8EBRpFbnRl
ciBEYXRlIG9mIFJlZ2lzdHJhdGlvbh8FZmRkAhkPFggfBgsrBAAfBwsrBAAfCAsrBQAfCWhkAiEPEA8WBh4ORGF0YVZhbHVlRmllbGQFC0RJU1RSSUNUX0NEHg1EYXRhVGV4dEZpZWxkBQhESVNUUklDVB4LXyFEYXRhQm91bmRnZBAVMQZTZWxlY3QKQUhNRUROQUdBUgVBS09MQQ1BTVJBVkFUSSBDSVRZDkFNUkFWQVRJIFJVUkFMD0FVUkFOR0FCQUQgQ0lUWRBBVVJBTkdBQkFEIFJVUkFMBEJFRUQIQkhBTkRBUkESQlJJSEFOIE1VTUJBSSBDSVRZCEJVTERIQU5BCkNIQU5EUkFQVVIFREhVTEUKR0FEQ0hJUk9MSQZHT05ESUEHSElOR09MSQdKQUxHQU9OBUpBTE5BCEtPTEhBUFVSBUxBVFVSC05BR1BVUiBDSVRZDE5BR1BVUiBSVVJBTAZOQU5ERUQJTkFORFVSQkFSC05BU0hJSyBDSVRZDE5BU0hJSyBSVVJBTAtOQVZJIE1VTUJBSQlPU01BTkFCQUQHUEFMR0hBUghQQVJCSEFOSRBQSU1QUkktQ0hJTkNIV0FECVBVTkUgQ0lUWQpQVU5FIFJVUkFMBlJBSUdBRBJSQUlMV0FZIEFVUkFOR0FCQUQOUkFJTFdBWSBNVU1CQUkOUkFJTFdBWSBOQUdQVVIMUkFJTFdBWSBQVU5FCVJBVE5BR0lSSQZTQU5HTEkGU0FUQVJBClNJTkRIVURVUkcMU09MQVBVUiBDSVRZDVNPTEFQVVIgUlVSQUwKVEhBTkUgQ0lUWQtUSEFORSBSVVJBTAZXQVJESEEGV0FTSElNCFlBVkFUTUFMFTEGU2VsZWN0BTE5MzcyBTE5MzczBTE5ODQyBTE5Mzc0BTE5NDA5BTE5Mzc1BTE5Mzc3BTE5Mzc2BTE5Mzc4BTE5Mzc5BTE5MzgxBTE5MzgyBTE5NDAzBTE5ODQ1BTE5ODQ2BTE5Mzg0BTE5MzgwBTE5Mzg2BTE5NDA1BTE5Mzg3BTE5Mzg4BTE5Mzg5BTE5ODQ0BTE5NDA4BTE5MzkwBTE5ODQxBTE5MzkxBTE5MzcxBTE5MzkyBTE5ODQ3BTE5MzkzBTE5Mzk0BTE5Mzg1BTE5ODQ4BTE5NDA0BTE5NDAyBTE5MzgzBTE5Mzk1BTE5Mzk2BTE5Mzk3BTE5NDA2BTE5NDEwBTE5Mzk4BTE5Mzk5BTE5NDA3BTE5NDAwBTE5ODQzBTE5NDAxFCsDMWdnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2cWAWZkAicPEGQQFQEGU2VsZWN0FQEGU2VsZWN0FCsDAWdkZAIDDw8WAh8ABQZTZWFyY2hkZAIFDw8WAh8ABQVDbGVhcmRkAgcPDxYCHwAFBUNsb3NlZGQCAw9kFgJmD2QWAgIDDxBkDxYBZhYBBQtWaWV3IFJlY29yZBYBZmQCCQ88KwARAgEQFgAWABYADBQrAABkAgsPDxYCHgdWaXNpYmxlZ2QWAgIBD2QWAgIFDw8WAh8ABQJHb2RkAiUPDxYCHwAFB1NpdGVNYXBkZAInDw8WAh8ABRRQb2xpY2UgVW5pdHMgV2Vic2l0ZWRkAikPDxYCHwAFC0Rpc2NsYWltZXJzZGQCKw8PFgIfAAUDRkFRZGQCLQ8PFgIfAAUKQ29udGFjdCBVc2RkAi8PDxYCHwAFBzgzNDkxNDhkZBgCBR5fX0NvbnRyb2xzUmVxdWlyZVBvc3RCYWNrS2V5X18WCAUNY3RsMDAkbG1nSG9tZQUMY3RsMDAkbG1nQWJ0BQ1jdGwwMCRsbWdDaHJ0BQ1jdGwwMCRsbWdJbmZvBQ5jdGwwMCRsbWdEd25sZAULY3RsMDAkbG1nT1MFM2N0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkaW1nRGF0ZU9mUmVnaXN0cmF0aW9uRnJvbQUx
Y3RsMDAkQ29udGVudFBsYWNlSG9sZGVyMSRpbWdEYXRlT2ZSZWdpc3RyYXRpb25UbwUlY3RsMDAkQ29udGVudFBsYWNlSG9sZGVyMSRnZHZEZWFkQm9keQ9nZJLUBB4bd3CH8EeW9a0lIRLz9afH',
'__VIEWSTATEGENERATOR': '6F2EA376',
'__PREVIOUSPAGE': '6Fkypj_FbKCMscMOIEbFwiAIl-t4XMDVxhwkenT13SdXVANmcLkeKVNreNUcxzCFPd2Pxt-oh_2N7OVcM2YpQJ9h0re0OFqkn5XLvLpF1J-DFQ0h0',
'__EVENTVALIDATION': '/wEdAFbuJNLDGfYOJbFWhYC0CtoGCssMeMRH46lUxWxNoH/QjR5JLHBufgCBaXKcLsIHFZg2MfFCqAQ55R5q232FZgK2qoCdmcL8o03Ga7p3SNpVviXoWLdz7AIdB4qHlFb/Ei9/1ch/aUhwAcGED/suJluf7ISsvoU9AiyuaEemMV5BBJnd8M9l/EB8CbzCs/Qj58HeW1DBXpopxThMkmM3IaEA4f83zm8GjIMpdMbZJo0bg/ou0osxK9vw1/I5QAXjT4WAelg7J4xZgxz60IVmQFQVBwFQHg/XFH9pTR8T2Gs+V8qukw1XTUYPesJgPqkOxZQh262jaQ7BxUOV7QoxeNck2w47G8rm/lqu6eH38UvMjATEI1G+tctApp1T0wcXwuNCLn3Z0VPV65eVNYp7hMU8lDrezCJH7PKOMYlCjf6maxW322Wg8dLjJ0oAXaSslqZHs1bB/7i2oDFBz4DJ85TGKEfqFutX9Sc8iba6A2UA3Jbp98jppZoyKABVAKm4ScwkZSsqCnmWlHZE1g1cl5KTdz2wIx74ktDJhIyxSHIwnuUrnMnZVi7M1yfB08jysAZLiaKqyALYmaPTP4iB7/cEzRldPEjwCvWpP992wRUTSVioExXj+mq+aV3ovp1s3PYdGAfIul3shD4atfGh7x1DmI0SjJjBG5MN09bwTja6X4d1tYyTUWpH5kv7kquz0k9MSPwDuX8kZiAr7Go4LvLA1v8x//T/i3cmHZhqsqcHSaOvUIY7oYzJpYB6269Eg/Eet2MzUbATNVMVJ6z0ps0G3+QTnao16M2kNs5Amrrfs7FS6VV+1VO0VlVoBMI3MVL2a5ZuFE0VYjpXs1Ie80zilwTI+Q87lRt0RiHvm7no9Ryh+i/NQ0SvqV6XUmpTvyESyCOHyB4V0JKFy3ngQefeiU1Bhw0YKqnM8XuA/3OuCrtvoVW4iPEPfzfW0U992cke6MjKSj+bFRXox1RixVsclKaaKq2cbpV4jLUztc0v0FIrdBwoILAS39YNKuaLebG44MFbUIrfl2XY7SNDPfSk0ikZQF6oN5BHioH7XmNMzwk0vSNeQ5gKYDKx4xnue5CFyfuTFLCx43hksDtRhvJXkn2iJAzo5kx+7Oa9LqM3/7ZUva29woTjHIRcXIx5V+VUFaPbjpSXixVRCTuVCcHTNPAsoz+6EiXvsfi8lrX0f3D7YCO4ridVhQClK705rktyQAmmeO0iV/Vh5DSf8FhvD58uSORbTqGUZryylC9SPojWj+h3++zOroq6bTLe/itZW7f6vF0eyAgMysofFozRdBhZo5tdiQR7X5+feZXm8Mh9dkmrkjndCY6MJW+Z6GMDEkD2DRN460MRst3Ymkivnm8me7KLtZghplypPrBnBqKdsArB4XzeK7XbSYhMVY6qipQKdH6cU6XeeZcmTS57SquMwbHZEhKbKL6YxYuvZmZZF8nlCQZL4zlr//3g5nyOTFulzGhY80/Z1HbCJJ6LxQbS0yD9Thl7sm6WVjYxB23A6c0dbgG4R+nkAQKMqcH6ZVn78Nu9BSKKrOVmNjQwbSsS5vUv6MFDROG0CrK/eNGU0C14yGuWM5HkGE/DyCzIKRYuDsUr1CVxXS+jyCWdB8LwDiFJV7yxNt0d/PtikuBIdrCGbTGK/JJV58CVginDn/qsq9scauaAbl2FvBQQCQMuNszcsKvvFie32VnIgxjp9PYR0Y1JxT7s4XE1eEASLLIarsRVQGJxRqon8iLGYHzEO3PB1DG6typAyQ+VpxaMiZBUOTlCcsXDdY08Kwd7PKgvFhd/UCrh6PvV7qCAPcsiiHYjV/MyKFDCcqDP506hiuHs8/lYYzvu5lRwgpFGVVnV',
                'ctl00$hdnSessionIdleTime': '',
                'ctl00$hdnUserUniqueId': '',
                'ctl00$ContentPlaceHolder1$txtDateOfRegistrationFrom': '01/07/2020',
                'ctl00$ContentPlaceHolder1$meeDateOfRegistrationFrom_ClientState': '',
                'ctl00$ContentPlaceHolder1$txtDateOfRegistrationTo': '03/07/2020',
                'ctl00$ContentPlaceHolder1$meeDateOfRegistrationTo_ClientState': '',
                'ctl00$ContentPlaceHolder1$ddlDistrict': '19372',
                'ctl00$ContentPlaceHolder1$ddlPoliceStation': 'Select',
                'ctl00$ContentPlaceHolder1$txtFirno': '',
                'ctl00$ContentPlaceHolder1$btnSearch': 'Search',
                'ctl00$ContentPlaceHolder1$ucRecordView$ddlPageSize': '0',
                'ctl00$ContentPlaceHolder1$ucGridRecordView$txtPageNumber': '',
            },
            callback=self.after_login,
        )

    def after_login(self, response):
        police_stations = response.xpath(
            '//*[@id="ContentPlaceHolder1_lbltotalrecord"]/text()').get()
        print(police_stations)


# --- run without project and save in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(abcSpider)
c.start()
Source: https://stackoverflow.com/questions/62912458/typeerror-in-scrapy-spider