Question
Is it possible to scrape data from websites using Scrapy and save that data to a Microsoft SQL Server database?
If so, are there any examples of this being done? Is it mainly a Python issue, i.e. if I find Python code that saves to a SQL Server database, can Scrapy do the same?
Answer 1:
Yes, but you'd have to write the code to do it yourself, since Scrapy does not provide an item pipeline that writes to a database.
Have a read of the Item Pipeline page in the Scrapy documentation, which describes the process in more detail (it includes a JsonWriterPipeline as an example). Basically, find some code that writes to a SQL Server database (using something like pyodbc) and you should be able to adapt it into a custom item pipeline that writes items directly to SQL Server.
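A minimal sketch of such a pipeline, assuming a scraped_items table with a single name column already exists and that the connection string matches your server (the table, column, and connection details here are illustrative assumptions, not from the Scrapy docs):

import pyodbc

class SqlServerPipeline(object):

    def open_spider(self, spider):
        # Hypothetical connection string; adjust driver, server, and credentials.
        self.conn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=scrapydb;UID=user;PWD=password")
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Parameterized insert; assumes the item defines a 'name' field.
        self.cursor.execute(
            "INSERT INTO scraped_items (name) VALUES (?)", item["name"])
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()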
Answer 2:
Super late and complete self-promotion here, but I think this could help someone. I just wrote a little Scrapy extension that saves scraped items to a database: scrapy-sqlitem.
It is super easy to use.
pip install scrapy_sqlitem
Define Scrapy items using SQLAlchemy tables:
from sqlalchemy import Table, Column, Integer, String, MetaData
from scrapy_sqlitem import SqlItem

metadata = MetaData()

class MyItem(SqlItem):
    sqlmodel = Table('mytable', metadata,
                     Column('id', Integer, primary_key=True),
                     Column('name', String, nullable=False))
Add the following pipeline:
from sqlalchemy import create_engine

class CommitSqlPipeline(object):

    def __init__(self):
        # SQLite shown here for brevity; point this at your own database URL.
        self.engine = create_engine("sqlite:///")

    def process_item(self, item, spider):
        item.commit_item(engine=self.engine)
        return item
Don't forget to add the pipeline to the settings file and to create the database tables if they do not exist.
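For completeness, here is a sketch of those two steps; the myproject module path is an assumption based on the default Scrapy project layout:

# settings.py -- register the pipeline (the number sets its order among pipelines)
ITEM_PIPELINES = {
    'myproject.pipelines.CommitSqlPipeline': 300,
}

# Create the tables once before crawling, e.g. in a small setup script.
from sqlalchemy import create_engine
from myproject.items import metadata  # the MetaData holding your Table definitions

engine = create_engine("sqlite:///")
metadata.create_all(engine)  # only creates tables that do not already exist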
http://doc.scrapy.org/en/1.0/topics/item-pipeline.html#activating-an-item-pipeline-component
http://docs.sqlalchemy.org/en/rel_1_1/core/tutorial.html#define-and-create-tables
Source: https://stackoverflow.com/questions/14741945/scrapy-python-and-sql-server