How can I get output in UTF-8 encoded Unicode from Scrapy?


南旧 2021-01-19 10:22

Bear with me. I'm writing every detail because so many parts of the toolchain do not handle Unicode gracefully and it's not clear what is failing.

PRELUDE

2 answers

北海茫月 (original poster)

2021-01-19 11:17

Please try this on your Attempt 1 and let me know if it works (I've tested it without setting all those environment variables):

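    # -*- coding: utf-8 -*-
    # Python 2 code: on Python 3, unquote lives in urllib.parse.
    import urllib

    import scrapy

    # SimpleItem is the Item class from your Attempt 1 (not redefined here).


    def to_write(uni_str):
        # Percent-decode the UTF-8 bytes, then decode them back to unicode.
        return urllib.unquote(uni_str.encode('utf8')).decode('utf8')


    class CitiesSpider(scrapy.Spider):
        name = "cities"
        allowed_domains = ["sistercity.info"]
        start_urls = (
            'http://en.sistercity.info/sister-cities/Düsseldorf.html',
        )

        def parse(self, response):
            for i in range(2):
                item = SimpleItem()
                item['title'] = to_write(response.xpath('//title').extract_first())
                item['url'] = to_write(response.url)
                yield item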
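For readers on Python 3, here is a minimal sketch of what the to_write helper does, assuming the URL arrives percent-encoded (for example with %C3%BC in place of ü). The sample URL string and the use of urllib.parse.unquote are my assumptions for illustration; the answer itself uses the Python 2 urllib.unquote.

```python
from urllib.parse import unquote


def to_write(text):
    """Turn percent-escapes such as %C3%BC back into the characters they encode."""
    # unquote() interprets the escaped bytes as UTF-8 by default;
    # the encoding is spelled out here only to make that explicit.
    return unquote(text, encoding="utf-8")


# Hypothetical percent-encoded form of the start URL.
encoded_url = "http://en.sistercity.info/sister-cities/D%C3%BCsseldorf.html"
decoded_url = to_write(encoded_url)
print(decoded_url)
```

Strings without any percent-escapes pass through unchanged, so the helper is safe to apply to the title as well.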

The range(2) is only there to test the JSON exporter. To get a list of dicts, you can do this instead:

    # -*- coding: utf-8 -*-
    from scrapy.contrib.exporter import JsonItemExporter
    from scrapy.utils.serialize import ScrapyJSONEncoder


    class UnicodeJsonLinesItemExporter(JsonItemExporter):
        def __init__(self, file, **kwargs):
            self._configure(kwargs, dont_fail=True)
            self.file = file
            self.encoder = ScrapyJSONEncoder(ensure_ascii=False, **kwargs)
            self.first_item = True
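The key setting in the exporter above is ensure_ascii=False. ScrapyJSONEncoder subclasses the standard library's json.JSONEncoder, so the effect can be sketched with plain json.dumps. This is a Python 3 illustration with a made-up sample item, not part of the exporter itself.

```python
import json

# Hypothetical item, shaped like the ones the spider yields.
item = {
    "title": "Düsseldorf",
    "url": "http://en.sistercity.info/sister-cities/Düsseldorf.html",
}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(item)

# With ensure_ascii=False the characters are kept as they are.
readable = json.dumps(item, ensure_ascii=False)

# The readable form still has to be encoded when it is written to a file.
utf8_bytes = readable.encode("utf-8")

print(escaped)
print(readable)
```

Both strings parse back to the same data; the difference is only in how the file looks when opened in an editor.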
