GIL

Parallel file matching, Python

醉酒当歌 · Submitted on 2019-11-28 20:58:22
I am trying to improve a script that scans files for malicious code. We have a list of regex patterns in a file, one pattern per line. The patterns are written for grep, since our current implementation is basically a bash find/grep combo. The bash script takes 358 seconds on my benchmark directory. I was able to write a Python script that does this in 72 seconds, but I want to improve it further. First I will post the base code, then the tweaks I have tried:

```python
import os, sys, Queue, threading, re

fileList = []
rootDir = sys.argv[1]

class Recurser(threading.Thread):
    def __init__(self, queue, dir):
        self
```
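The finished script is not shown in the excerpt above; a minimal sketch of the same idea using the standard library's `concurrent.futures` follows. All names here (`load_patterns`, `scan_file`, `scan_tree`) are illustrative, not from the original. A `ThreadPoolExecutor` is used because reading many files is largely I/O-bound; for very heavy regex work a `ProcessPoolExecutor` may do better.

```python
import os
import re
import sys
from concurrent.futures import ThreadPoolExecutor

def load_patterns(path):
    """Compile one regex per non-empty line of the pattern file."""
    with open(path) as f:
        return [re.compile(line.strip()) for line in f if line.strip()]

def scan_file(path, patterns):
    """Return (path, list of pattern strings that matched) for one file."""
    hits = []
    try:
        with open(path, errors="ignore") as f:
            data = f.read()
    except OSError:
        return path, hits  # unreadable file: report no hits
    for pat in patterns:
        if pat.search(data):
            hits.append(pat.pattern)
    return path, hits

def scan_tree(root, patterns, workers=8):
    """Scan every file under root in parallel; return {path: hits}."""
    files = [os.path.join(d, name)
             for d, _, names in os.walk(root) for name in names]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: scan_file(p, patterns), files)
    return {path: hits for path, hits in results if hits}

if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: script.py <root-dir> <pattern-file>
    for path, hits in scan_tree(sys.argv[1], load_patterns(sys.argv[2])).items():
        print(path, hits)
```

Because the GIL serializes the regex matching itself, the thread pool mainly overlaps file I/O; switching `ThreadPoolExecutor` for `ProcessPoolExecutor` (and replacing the lambda with a top-level function) moves the matching onto multiple cores.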

Green-threads and thread in Python

风流意气都作罢 · Submitted on 2019-11-28 16:40:59
As Wikipedia states: "Green threads emulate multi-threaded environments without relying on any native OS capabilities, and they are managed in user space instead of kernel space, enabling them to work in environments that do not have native thread support." Python's threads are implemented as pthreads (kernel threads), and because of the global interpreter lock (GIL), a Python process only runs one thread at a time.

[QUESTION] But in the case of green threads (also called greenlets or tasklets), does the GIL affect them? Can there be more than one greenlet running at a time? What are the
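Green threads are cooperatively scheduled in user space, so within one OS thread only one ever runs at a time regardless of the GIL; control changes hands only at explicit switch points. The sketch below emulates that with plain generators (it is not the actual `greenlet` API, just an illustration of cooperative, user-space scheduling):

```python
from collections import deque

def task(name, steps, log):
    # A "green thread": runs until it voluntarily yields control.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point

def run(tasks):
    # Round-robin user-space scheduler: exactly one task runs at a time.
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # resume the task until its next yield
            ready.append(t)  # not finished: back of the queue
        except StopIteration:
            pass             # task finished

log = []
run([task("a", 2, log), task("b", 2, log)])
print(log)  # → ['a:0', 'b:0', 'a:1', 'b:1']
```

The interleaving is deterministic because no task is ever preempted, which is exactly why greenlets cannot exploit multiple cores: the scheduler itself lives inside a single kernel thread.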

400 threads in 20 processes outperform 400 threads in 4 processes while performing an I/O-bound task

三世轮回 · Submitted on 2019-11-28 10:27:07
Experimental Code

Here is the experimental code that can launch a specified number of worker processes, then launch a specified number of worker threads within each process, and perform the task of fetching URLs:

```python
import multiprocessing
import sys
import time
import threading
import urllib.request

def main():
    processes = int(sys.argv[1])
    threads = int(sys.argv[2])
    urls = int(sys.argv[3])

    # Start process workers.
    in_q = multiprocessing.Queue()
    process_workers = []
    for _ in range(processes):
        w = multiprocessing.Process(target=process_worker, args=(threads, in_q))
        w.start()
        process_workers
```
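The snippet above is cut off before the worker definitions. A thread-only miniature of the same queue-fed worker pattern, with sentinel-based shutdown, looks like this (the task here is a stand-in arithmetic operation rather than a real URL fetch, and the names `thread_worker`/`run` are illustrative):

```python
import queue
import threading

def thread_worker(in_q, results, lock):
    # Pull work items until the None sentinel arrives.
    while True:
        item = in_q.get()
        if item is None:
            break
        with lock:
            results.append(item * 2)  # stand-in for fetching a URL

def run(n_threads, items):
    in_q = queue.Queue()
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=thread_worker, args=(in_q, results, lock))
               for _ in range(n_threads)]
    for w in workers:
        w.start()
    for item in items:
        in_q.put(item)
    for _ in workers:
        in_q.put(None)   # one sentinel per worker so each one exits
    for w in workers:
        w.join()
    return results

print(sorted(run(4, range(10))))
```

In the full experiment each *process* runs its own copy of this loop, so GIL contention is split across interpreters, which is one plausible reason 20 processes × 20 threads beat 4 processes × 100 threads on the same total thread count.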

Python multithreading + the GIL

╄→尐↘猪︶ㄣ · Submitted on 2019-11-28 07:29:04
Python multithreading effectively runs only one thread at a time. To give each thread a fair share of CPU time, Python counts the bytecode instructions the current thread has executed and forcibly releases the GIL once a threshold is reached; this also gives the operating system a chance to reschedule threads (whether a context switch actually happens is up to the OS).

The GIL (global interpreter lock) guarantees that only one thread holds the interpreter and executes at any moment. When a thread's slice expires it goes back to waiting whether or not it has finished (its partial state is saved in registers), and the next thread still sees the original shared data.

A user-level lock guarantees that only one thread modifies shared data at a time (it prevents several threads from modifying the same original data concurrently).

Locking shared data: after one thread modifies the data and releases the lock, the next thread may modify it in turn.

RLock (recursive lock): allows a lock to be nested inside a larger lock.

```python
import threading, time


def run1():
    print("grab the first part data")
    lock.acquire()
    global num
    num += 1
    lock.release()
    return num


def run2():
    print("grab the second part data")
    lock.acquire()
    global num2
```
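The post's RLock example is cut off above. A small complete sketch of the same shape shows why the recursive lock matters: `run3` acquires the lock and then calls functions that acquire it again, which would deadlock with a plain `threading.Lock` but is fine with `threading.RLock` (the function bodies are reconstructed to match the pattern, not copied from the original):

```python
import threading

lock = threading.RLock()
num = 0
num2 = 0

def run1():
    global num
    with lock:
        num += 1
    return num

def run2():
    global num2
    with lock:
        num2 += 1
    return num2

def run3():
    # Acquires the lock it already holds: a plain threading.Lock()
    # would deadlock here; an RLock re-enters freely in the same thread.
    with lock:
        res = run1()
        res2 = run2()
    return res, res2

t = threading.Thread(target=run3)
t.start()
t.join()
print(num, num2)  # → 1 1
```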

Do you really understand the Global Interpreter Lock (GIL)?

杀马特。学长 韩版系。学妹 · Submitted on 2019-11-28 06:15:31
About me: a thoughtful programmer and lifelong learner, currently team lead in a startup; our main technology stack covers Android, Python, Java and Go. Github: https://github.com/hylinux1024 WeChat public account: 终身开发者 (angrycode)

0x00 What is the Global Interpreter Lock (GIL)?

"A global interpreter lock (GIL) is a mechanism used in computer-language interpreters to synchronize the execution of threads so that only one native thread can execute at a time." -- Wikipedia

From the definition above, the GIL is a synchronization lock that a language interpreter uses to serialize thread execution. Many programming languages have a GIL, for example Python and Ruby.

0x01 Why does the GIL exist?

Python is an object-oriented, dynamically typed language whose code is parsed and executed sequentially by an interpreter. The interpreter most people currently use is CPython, and CPython manages memory with reference counting; to make that safe under multithreading, it introduced the global interpreter

Understanding python GIL - I/O bound vs CPU bound

 ̄綄美尐妖づ · Submitted on 2019-11-28 05:55:30
Question: From the Python threading documentation:

"In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously."

Now I have a thread worker
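The distinction the documentation draws can be seen in a few lines: while a thread waits on I/O the GIL is released, so I/O-bound threads overlap their waits. Here `time.sleep` stands in for a real I/O wait (it releases the GIL the same way a blocking socket read does):

```python
import threading
import time

def io_task(delay=0.1):
    time.sleep(delay)  # releases the GIL while waiting, like real I/O

start = time.time()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(f"4 overlapped 0.1 s waits took {elapsed:.2f} s")  # ~0.1 s, not 0.4 s
```

A CPU-bound loop in the same four threads would show no such overlap, because the GIL is held while bytecode executes; that is the case where `multiprocessing` is advised.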

GIL: the global interpreter lock

情到浓时终转凉″ · Submitted on 2019-11-28 01:07:14
GIL: the global interpreter lock

Python has many interpreters; the most common is CPython. The GIL is essentially a mutex: it turns concurrency into serial execution, sacrificing efficiency to guarantee data safety. It prevents the multiple threads under one process from executing simultaneously (threads within one process cannot achieve parallelism, though they can achieve concurrency). The GIL exists because CPython's memory management is not thread-safe. Garbage collection uses reference counting, mark-and-sweep, and generational collection.

Whether Python multithreading is useful must be discussed case by case. Say we have four tasks to run, all CPU-bound, each taking 10 s:

On a single core, starting threads is cheaper than starting processes.
On multiple cores, processes win for CPU-bound work: for example, processes finish in 10 s where threads need 40 s.

```python
# CPU-bound: on a multi-core machine, processes save more time than threads
from multiprocessing import Process
from threading import Thread
import os, time

def work():
    res = 0
    for i in range(100000000):
        res *= i

if __name__ == '__main__':
    l = []
    print(os.cpu_count())
    start = time.time()
    for i in range(4):
        p = Process(target=work)  # run time is 24
```
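A smaller, runnable version of the thread side of that comparison is below; the loop is shrunk from 100 000 000 iterations so it finishes quickly. On a multi-core machine the threaded run is expected to be no faster than the serial run, since CPU-bound bytecode holds the GIL the whole time:

```python
from threading import Thread
import time

N = 2_000_000

def work():
    res = 0
    for i in range(N):
        res += i
    return res

# Serial: run the task four times in a row.
start = time.time()
for _ in range(4):
    work()
serial = time.time() - start

# Threaded: four threads, but only one executes bytecode at a time.
start = time.time()
threads = [Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print(f"serial {serial:.2f}s, threads {threaded:.2f}s")
```

Swapping `Thread` for `multiprocessing.Process` (with the work at module top level) is what produces the multi-core win described in the post.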

The GIL, and thread/process pools

拟墨画扇 · Submitted on 2019-11-27 21:22:35
GIL

The GIL: the global interpreter lock, a mutex specific to CPython that is acquired and released automatically. It turns concurrency into serial execution: at any given moment only one thread in a process executes and uses shared resources, sacrificing efficiency to guarantee data safety. Among the threads of one process, only one is ever actually executed by the CPU.

Why a global interpreter lock: the GIL protects the interpreter's internal data. When the Python language was first developed only single-core machines existed, and forcing a lock lightened the burden on developers.

A single process with multiple threads cannot use multiple cores; why not remove the GIL? It cannot be removed: too much of the source code depends on it. Does losing multi-core hurt performance? That depends on the data being processed.

We have four tasks to process and want a concurrent effect. Two options:

Option 1: start four processes.
Option 2: start four threads within one process.

Single core, analysis:
If the four tasks are CPU-bound, there are no extra cores to compute on in parallel; option 1 only adds the overhead of creating processes, so option 2 wins.
If the four tasks are I/O-bound, creating processes is expensive and process switching is far slower than thread switching, so option 2 wins.

Multiple cores, analysis:
If the four tasks are CPU-bound, multiple cores mean real parallel computation; in Python only one thread per process executes at a time, so threads cannot use the cores, and option 1 wins.
If the four tasks are I/O-bound, no number of cores removes the I/O waiting, so option 2 wins.

Conclusion: today's computers are basically all multi-core. For CPU-bound tasks, opening multiple Python threads brings little performance gain and may even be slower than serial execution (which has no heavy switching), but
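The title mentions thread and process pools, but the excerpt cuts off before reaching them. The standard-library way to express the two options above is `concurrent.futures`, which gives thread pools and process pools the same interface:

```python
from concurrent.futures import ThreadPoolExecutor

def io_like(x):
    # Stand-in for an I/O-bound task (option 2 above: threads in one process).
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(io_like, range(8)))

print(results)  # → [0, 1, 4, 9, 16, 25, 36, 49]
```

For CPU-bound tasks (option 1 above), replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` keeps the code identical while moving the work onto separate processes, each with its own GIL.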

Why is there no GIL in the Java Virtual Machine? Why does Python need one so bad?

自闭症网瘾萝莉.ら · Submitted on 2019-11-27 16:44:00
I'm hoping someone can provide some insight as to what's fundamentally different about the Java Virtual Machine that allows it to implement threads nicely without the need for a Global Interpreter Lock (GIL), while Python necessitates such an evil. Python (the language) doesn't need a GIL (which is why it can perfectly be implemented on JVM [Jython] and .NET [IronPython], and those implementations multithread freely). CPython (the popular implementation) has always used a GIL for ease of coding (esp. the coding of the garbage collection mechanisms) and of integration of non-thread-safe C-coded