Python sqlite3 and concurrency

Asked by 南笙, 2020-11-30 18:00

I have a Python program that uses the "threading" module. Once every second, my program starts a new thread that fetches some data from the web and stores it to my SQLite database.

14 Answers
  • 2020-11-30 18:49

    I like Evgeny's answer - Queues are generally the best way to implement inter-thread communication. For completeness, here are some other options:

    • Close the DB connection when the spawned threads have finished using it. This would fix your OperationalError, but opening and closing connections like this is generally a no-no because of the performance overhead.
    • Don't use child threads. If the once-per-second task is reasonably lightweight, you could get away with doing the fetch and store, then sleeping until the right moment. This is undesirable because a fetch-and-store could take more than a second, and you lose the ability to overlap work that a multi-threaded approach gives you.
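
    For illustration, here is a minimal sketch of the Queue approach (the file name `example.db` and the one-column schema are invented for the example): a single dedicated writer thread owns the connection and drains a `queue.Queue`, so the fetcher threads never touch sqlite3 directly and the same-thread rule is never violated.

    ```python
    import queue
    import sqlite3
    import threading

    def db_writer(db_path, q):
        # Only this thread ever touches the connection, satisfying
        # sqlite3's default same-thread restriction.
        conn = sqlite3.connect(db_path)
        conn.execute("DROP TABLE IF EXISTS data")
        conn.execute("CREATE TABLE data (value TEXT)")
        while True:
            item = q.get()
            if item is None:  # sentinel: shut down cleanly
                break
            conn.execute("INSERT INTO data (value) VALUES (?)", (item,))
            conn.commit()
        conn.close()

    q = queue.Queue()
    writer = threading.Thread(target=db_writer, args=("example.db", q))
    writer.start()

    # Fetcher threads would just enqueue results; they never see the connection.
    for value in ("a", "b", "c"):
        q.put(value)
    q.put(None)  # ask the writer to finish
    writer.join()
    ```

    The sentinel `None` is a common shutdown convention; `queue.Queue` itself is thread-safe, so no extra locking is needed.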
  • 2020-11-30 18:51

    Switch to multiprocessing. It scales much better: each worker is a separate process, so it sidesteps the GIL and can use multiple CPU cores, and its interface mirrors that of Python's threading module.
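
    A minimal sketch of that layout (the `fetch` function and URLs are placeholders, not real network code): worker processes do the slow fetching in parallel, and only the parent process ever touches the database, so SQLite's single-writer limitation never comes into play.

    ```python
    import multiprocessing
    import sqlite3

    def fetch(url):
        # Stand-in for a real web fetch; returns the "data" for this url.
        return f"payload from {url}"

    if __name__ == "__main__":
        urls = [f"http://example.com/{i}" for i in range(4)]
        with multiprocessing.Pool(processes=2) as pool:
            results = pool.map(fetch, urls)

        # Only the parent process opens the connection and writes.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE pages (body TEXT)")
        conn.executemany("INSERT INTO pages (body) VALUES (?)",
                         [(r,) for r in results])
        stored = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    ```

    The `if __name__ == "__main__"` guard matters on platforms that spawn rather than fork worker processes.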

    Or, as Ali suggested, just use SQLAlchemy's thread pooling mechanism. It will handle everything for you automatically and has many extra features, just to quote some of them:

    1. SQLAlchemy includes dialects for SQLite, Postgres, MySQL, Oracle, MS-SQL, Firebird, MaxDB, MS Access, Sybase and Informix; IBM has also released a DB2 driver. So you don't have to rewrite your application if you decide to move away from SQLite.
    2. The Unit Of Work system, a central part of SQLAlchemy's Object Relational Mapper (ORM), organizes pending create/insert/update/delete operations into queues and flushes them all in one batch. To accomplish this it performs a topological "dependency sort" of all modified items in the queue so as to honor foreign key constraints, and groups redundant statements together where they can sometimes be batched even further. This produces the maximum efficiency and transaction safety, and minimizes chances of deadlocks.
  • 2020-11-30 18:51

    Use threading.Lock()
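
    As a sketch of that idea (file and table names invented for the example): open the connection with `check_same_thread=False` so sqlite3 allows cross-thread use, then guard every access with one shared `threading.Lock` so only one thread touches the connection at a time.

    ```python
    import sqlite3
    import threading

    # check_same_thread=False lifts sqlite3's same-thread check; the Lock
    # below then serializes all access, which is what makes this safe.
    conn = sqlite3.connect("shared.db", check_same_thread=False)
    conn.execute("DROP TABLE IF EXISTS hits")
    conn.execute("CREATE TABLE hits (n INTEGER)")
    db_lock = threading.Lock()

    def insert(n):
        with db_lock:  # one thread in the critical section at a time
            conn.execute("INSERT INTO hits (n) VALUES (?)", (n,))
            conn.commit()

    threads = [threading.Thread(target=insert, args=(i,)) for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    count = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
    ```

    Note the lock also serializes the database work itself, so this buys safety, not parallelism.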

  • 2020-11-30 18:51

    I could not find any benchmarks in any of the above answers so I wrote a test to benchmark everything.

    I tried three approaches:

    1. Reading and writing sequentially from the SQLite database
    2. Using a ThreadPoolExecutor to read/write
    3. Using a ProcessPoolExecutor to read/write

    The results and takeaways from the benchmark are as follows

    1. Sequential reads and sequential writes perform best.
    2. If you must process in parallel, use a ProcessPoolExecutor to read in parallel.
    3. Do not perform writes with either a ThreadPoolExecutor or a ProcessPoolExecutor: you will run into "database is locked" errors and have to retry inserting the chunk.

    You can find the code and the complete benchmark solution in my SO answer HERE. Hope that helps!
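
    This is not the original benchmark code, but a small sketch of the safe half of approach 2, parallel reads, where each worker opens its own connection (file name and table invented for the example); concurrent readers do not block each other in SQLite.

    ```python
    import sqlite3
    from concurrent.futures import ThreadPoolExecutor

    DB = "bench.db"

    # One-off setup: a table with 100 rows, written sequentially.
    setup = sqlite3.connect(DB)
    setup.execute("DROP TABLE IF EXISTS items")
    setup.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
    setup.executemany("INSERT INTO items (val) VALUES (?)", [("x",)] * 100)
    setup.commit()
    setup.close()

    def read_chunk(offset):
        # Each worker opens its own connection; concurrent *reads* are safe.
        conn = sqlite3.connect(DB)
        rows = conn.execute(
            "SELECT * FROM items LIMIT 25 OFFSET ?", (offset,)
        ).fetchall()
        conn.close()
        return len(rows)

    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(read_chunk, [0, 25, 50, 75]))
    ```

    Doing the writes in the workers instead is what produces the "database is locked" errors described above.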

  • 2020-11-30 18:54

    Or, if you are lazy like me, you can use SQLAlchemy. It will handle the threading for you (using thread-locals and some connection pooling), and the way it does so is even configurable.

    As an added bonus, if/when you realise that using SQLite for a concurrent application is going to be a disaster, you won't have to change your code to use MySQL, Postgres, or anything else; you can just switch over.

  • 2020-11-30 18:54

    You need to design the concurrency for your program. SQLite has clear limitations on concurrent access and you need to obey them; see the FAQ (and the question that follows it).
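
    One limitation worth knowing: SQLite allows only one writer at a time, and the stdlib driver raises "database is locked" if it cannot acquire the lock within the connection's timeout. A small sketch (file name invented for the example) of raising that timeout so a busy writer is waited for instead of failing immediately:

    ```python
    import sqlite3

    # timeout=5.0 makes sqlite3 wait up to 5 seconds for a competing
    # writer's lock to clear instead of raising OperationalError at once.
    conn = sqlite3.connect("app.db", timeout=5.0)
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    conn.execute("INSERT INTO log (msg) VALUES (?)", ("started",))
    conn.commit()
    row_count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    conn.close()
    ```

    A timeout only papers over contention; the design advice above (few writers, short transactions) is still what actually matters.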
