Question
Scrapy seems to be pulling the data out correctly, but is formatting the output in my JSON object as if it were an array:
[{"price": ["$34"], "link": ["/product/product..."], "name": ["productname"]},
{"price": ["$37"], "link": ["/product/product"]...
My spider class looks like this:
def parse(self, response):
    sel = Selector(response)
    items = sel.xpath('//div/ul[@class="product"]')
    skateboards = []
    for item in items:
        skateboard = SkateboardItem()
        skateboard['name'] = item.xpath('li[@class="desc"]//text()').extract()
        skateboard['price'] = item.xpath('li[@class="price"]//text()[1]').extract()
        skateboard['link'] = item.xpath('li[@class="image"]').extract()
        skateboards.append(skateboard)
    return skateboards
How would I go about ensuring that Scrapy is only outputting a single value for each key, rather than the array it's currently producing?
Answer 1:
.extract() always returns a list, even when only one node matches. You can use
''.join(item.xpath('li[@class="desc"]//text()').extract())
to get a single string.
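To see what the join does without running a crawl, here is a minimal sketch. The list below is a made-up stand-in for what .extract() might return when the li contains several text nodes, and join_text is a hypothetical helper name, not part of Scrapy. The .strip() call is an extra assumption on my part, since //text() often picks up surrounding whitespace.

```python
# Hypothetical data: stands in for the list .extract() might return for
# 'li[@class="desc"]//text()' when the <li> holds several text nodes.
extracted = ["\n  ", "Pro Deck", " 8.0", "\n"]


def join_text(fragments):
    """Concatenate text fragments and trim leading/trailing whitespace."""
    return "".join(fragments).strip()


name = join_text(extracted)  # one string instead of a list
empty = join_text([])        # an empty match yields an empty string
```

Note that joining an empty list gives an empty string rather than an error, so a product with no matching node ends up with '' in that field.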
Answer 2:
Use either of these to get the value as a string instead of a list:
1 .extract_first()
2 .extract()[0]
PS: using Scrapy 1.2
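The two options differ when nothing matches: .extract_first() returns None (or the default you pass), while .extract()[0] raises IndexError on the empty list. A rough sketch of that difference using plain lists, so it runs without Scrapy; first_or_default is a hypothetical helper that only imitates .extract_first(), and the lists are made-up sample data:

```python
def first_or_default(values, default=None):
    """Return the first element of a list, or `default` if the list is empty.

    Imitates the behaviour of .extract_first() on the extracted results.
    """
    return values[0] if values else default


matched = ["$34"]  # hypothetical result when the price node exists
unmatched = []     # hypothetical result when the XPath matches nothing

price = first_or_default(matched)      # first (and only) value
missing = first_or_default(unmatched)  # falls back to None

# Indexing the empty list directly fails instead of falling back.
try:
    unmatched[0]
    raised = False
except IndexError:
    raised = True
```

So if some products may lack a field, .extract_first() is probably the safer choice; [0] is fine when the node is guaranteed to exist.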
Source: https://stackoverflow.com/questions/23490643/scrapy-returning-scraped-values-into-an-array