Using Scrapy with Amazon S3 is fairly simple. You set:
- FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
- FEED_FORMAT = 'jsonlines'
- AWS_ACCESS_KEY_ID = [access key]
- AWS_SECRET_ACCESS_KEY = [secret key]
and everything works just fine.
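For reference, here is a minimal sketch of how those four settings might look in a project's settings.py. The bucket name MYBUCKET is the placeholder from above, and reading the credentials from environment variables is my own assumption, not something the original setup requires.

```python
# settings.py (sketch): feed export to S3 using the settings listed above.
import os

# %(name)s and %(time)s are filled in by Scrapy with the spider name
# and the crawl start timestamp.
FEED_URI = "s3://MYBUCKET/feeds/%(name)s/%(time)s.jl"
FEED_FORMAT = "jsonlines"

# Assumption: credentials come from the environment instead of being
# hard-coded. Replace with literal values if that is how you configure them.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
```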
But Scrapyd seems to override those settings and saves the items on the server (with a link on the web site).
Adding the "items_dir =" setting doesn't seem to change anything.
What kind of setting makes it work?
EDIT: Extra info that might be relevant - we are using Scrapy-Heroku.
I also faced the same problem. Removing the items_dir= line from the scrapyd.conf file worked for me.
You can set the items_dir property to an empty value like this:
[scrapyd]
items_dir=
It seems that when that property is set, it takes precedence over the configured feed exporter. See http://scrapyd.readthedocs.org/en/latest/config.html for more information.
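If you want to double-check what your scrapyd.conf actually contains, something like the following sketch could help. It only parses the config text with Python's standard configparser; the helper name configured_items_dir is hypothetical and not part of Scrapyd. It does not tell you what default Scrapyd falls back to when the key is missing.

```python
import configparser
from typing import Optional


def configured_items_dir(conf_text: str) -> Optional[str]:
    """Return the items_dir value from scrapyd.conf text.

    Returns None when the [scrapyd] section or the key is absent,
    and "" when the key is present but empty (the fix described above).
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_option("scrapyd", "items_dir"):
        return None
    return parser.get("scrapyd", "items_dir").strip()


# Same content as the scrapyd.conf snippet shown above.
sample = "[scrapyd]\nitems_dir=\n"
if configured_items_dir(sample) == "":
    print("items_dir is empty, so it should not take precedence over the feed settings")
```

To check a real file, read it first, e.g. configured_items_dir(open("scrapyd.conf").read()), adjusting the path to wherever your config lives.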
Source: https://stackoverflow.com/questions/15955723/saving-items-from-scrapyd-to-amazon-s3-using-feed-exporter