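I. Table of Contents
1. Process concepts and two ways to create a process
2. Multi-process crawler
3. Daemon processes
4. Process queues
5. A simple application of process queues (sharing data)
6. Two ways to create a thread
7. Efficiency comparison of threads and processes
8. Threads share the data of the same process
9. Deadlock
10. Three kinds of thread queues
11. Running CPU-bound tasks with multiple threads
12. Thread pools and process pools
13. Callback functions
14. Daemon threads
15. Coroutines
16. The GIL (Global Interpreter Lock)
II. Content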
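1. Process concepts and two ways to create a process
Terminology:
The two main roles of an operating system:
1. Hide the ugly, complex hardware interfaces and give applications a clean interface.
2. Manage and schedule processes, and bring order to their competition for hardware.
Multiprogramming:
1. Background: it was introduced to achieve concurrency on a single CPU.
2. It has two parts:
   1. Multiplexing in space (requires isolation at the hardware level).
   2. Multiplexing in time (what is shared is the CPU's time slices).
When does the CPU switch tasks?
1. The running task blocks.
2. The running task has been running for too long (decided by the operating system).
Process: a program in execution, a task; it is scheduled by the operating system and executed by the CPU.
Program: the code written by the programmer.
Concurrency: pseudo-parallelism, that is, a single core plus multiprogramming.
Parallelism: only multiple cores can provide true parallelism.
Synchronous: while one process is executing a task, the other process must wait for it to finish before continuing.
Asynchronous: while one process is executing a task, the other process continues without waiting for it to finish.
A process is created in one of four situations:
1. System initialization
2. Interaction with the user
3. A call made by a process that is already running
4. The start of a batch job
The system calls used to create a process:
1. Linux: fork
2. Windows: CreateProcess
Differences between processes on Linux and on Windows:
1. Linux processes have parent-child relationships and form a tree; Windows has no such hierarchy.
2. On Linux a new process starts as a copy of its parent's address space; on Windows the address spaces of the two processes are different from the very start.
Process overview: a process is an instance of a program in execution and the basic unit of dynamic execution in an operating system. A process is an entity, and every process has its own address space, which generally includes a text region (the Python file), a data region (the variables in the Python file), and a stack. The text region stores the code executed by the processor, the data region stores variables and the dynamically allocated memory used while the process runs, and the stack region stores the instructions and local variables of the active procedure calls.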
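The synchronous/asynchronous distinction described above can be sketched with the standard-library concurrent.futures module. This is only an illustration under some assumptions: it uses threads instead of processes to stay short, and the names slow_square, run_sync, and run_async are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Hypothetical task: block briefly, then return n squared."""
    time.sleep(0.1)
    return n * n


def run_sync(numbers):
    # Synchronous: each call must finish before the next one starts.
    return [slow_square(n) for n in numbers]


def run_async(numbers):
    # Asynchronous: submit every task first, then collect the results.
    with ThreadPoolExecutor(max_workers=len(numbers)) as pool:
        futures = [pool.submit(slow_square, n) for n in numbers]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for runner in (run_sync, run_async):
        start = time.time()
        print(runner.__name__, runner([1, 2, 3]), round(time.time() - start, 2))
```

Both functions return the same values; the asynchronous version should simply spend less time waiting, because the three sleeps overlap.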
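Process termination:
1. Normal exit
2. Error exit
3. Fatal error
4. Killed by another process
Windows only has the concept of handles.
The three states of a process: ready, running, and blocked.
How process concurrency is implemented: the process table records the state a process was in when it last ran, so that it can resume from that point the next time it is scheduled.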
Starting multiple processes, method 1:

import os
import random
import time
from multiprocessing import Process

print(os.cpu_count())  # number of CPUs

def func():
    print("func function")
    time.sleep(random.randint(1, 3))

if __name__ == '__main__':
    f = Process(target=func, name="p2")  # give the process a name
    f.start()  # ask the operating system to create a child process
    print("f name is %s" % f.name)  # names default to Process-1, Process-2, ...; here it is "p2"
    print("main process")
    # The main process waits for its child processes to finish before it exits;
    # otherwise the children could be left behind as zombie processes.

Method 2: define a class that inherits from Process:

# Method 2
import os
from multiprocessing import Process

class Myprocess(Process):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def run(self):
        self.func()

def func1():
    print("child process 1 test")
    print("child process 1 pid", os.getpid())

def func2():
    print("child process 2 test")
    print("child process 2 pid", os.getpid())

if __name__ == '__main__':
    p1 = Myprocess(func1)
    p2 = Myprocess(func2)
    p1.start()  # start() runs the run() method in the child process
    p2.start()
    print("main process pid", os.getpid())
The join method: blocks the parent process until the child process has finished, and only then lets the parent continue.

import time
from multiprocessing import Process

def func(name):
    time.sleep(3)
    print("%s is writing" % name)

if __name__ == '__main__':
    p1 = Process(target=func, args=("ivy",))
    p2 = Process(target=func, args=("zoe",))
    p3 = Process(target=func, args=("zoe",))
    # p1.start()
    # p2.start()  # the main process only requests the child; the operating system creates it
    # p1.join()   # blocks the main process; the child keeps running in the background
    # p2.join()
    p_1 = [p1, p2, p3]
    for p in p_1:
        p.start()
    for p in p_1:
        p.join()
    print("main process")

Common process methods and attributes:

import os
import time
from multiprocessing import Process

def func(name):
    time.sleep(3)
    print("%s is writing" % name)

if __name__ == '__main__':
    p1 = Process(target=func, args=("ivy",))
    p1.daemon = True  # the child is terminated when the main process finishes
    p1.start()
    print(p1.name)       # name of the process
    print(os.getpid())   # id of the current process
    print(os.getppid())  # id of the parent of the current process
    p1.terminate()       # kill the process
    print(p1.is_alive()) # is it still alive? May still print True right after terminate()
    print("main process")

Socket communication with multiple processes, server side:

import socket
from multiprocessing import Process

def talk(conn, addr):
    while True:
        try:
            msg = conn.recv(1024)
            if not msg:
                break
            conn.send(msg.upper())
        except Exception:
            break
    conn.close()

if __name__ == '__main__':
    # Create the listening socket under the main guard so that child processes
    # started with "spawn" do not try to bind the port again.
    server = socket.socket(socket.AF_INET, type=socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 8080))
    server.listen(5)
    while True:
        conn, addr = server.accept()
        p = Process(target=talk, args=(conn, addr))
        p.start()

Client side:

import socket

client = socket.socket()
client.connect(("127.0.0.1", 8080))
while True:
    msg = input("client says: ")
    if not msg:
        continue  # sending nothing would leave recv() waiting forever
    client.send(msg.encode("utf-8"))
    msg_server = client.recv(1024)
    print(msg_server.decode("utf-8"))
2. Multi-process crawler

import os
import time
from multiprocessing import Process

import requests

urls = ["http://p1.music.126.net/EAJfo8I22hDJErMR7WyOUQ==/109951162860207008.jpg",
        "http://p0.qhimgs4.com/t01ba9168ef323dfc7a.jpg",
        "http://m.iqiyipic.com/u7/image/20181107/b3/98/uv_20036427021_m_601_720_405.jpg"]

def download(url, i):
    time.sleep(1)
    response = requests.get(url)
    with open("image%s.jpg" % i, mode="wb") as f:
        f.write(response.content)  # raw bytes of the image
    print(os.getpid())

if __name__ == '__main__':
    start_time = time.time()
    p_l = []
    for i, url in enumerate(urls):
        p = Process(target=download, args=(url, i + 1))
        p_l.append(p)
        p.start()
    for p in p_l:
        p.join()
    print("main process")
    end_time = time.time()
    print("elapsed time", end_time - start_time)
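3. Daemon processes
When a process xx is set as a daemon process, xx is terminated as soon as the main process ends.
In the example below f1 is a daemon process that sleeps for one second; by then the main process has already finished, so f1 ends with it and never prints its message.
A daemon process cannot create child processes of its own.

import time
from multiprocessing import Process

def func1():
    time.sleep(1)
    print("I am func1")

def func2():
    print("I am func2")

if __name__ == '__main__':
    f1 = Process(target=func1)
    f2 = Process(target=func2)
    f1.daemon = True  # must be set before start()
    f1.start()
    f2.start()
    f2.join()
    print("I am the main process")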
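4. Process queues
Processes communicate with one another through IPC mechanisms. There are two common ones, pipes and queues, and a queue is itself built on a pipe plus locks. The downside of locking is that the processes effectively run serially around the locked section, which lowers efficiency; the upside is that the data cannot get corrupted. A queue is first in, first out.
Common queue methods:

from multiprocessing import Process, Queue

q = Queue(5)  # optional maximum size; unbounded by default. Queue: first in, first out; stack: first in, last out
q.put("hello")
q.put("world")
q.put("hello world")
# q.put("d", False)  # non-blocking: raises queue.Full at once if the queue is full, same as put_nowait
q.put("d", timeout=2)  # wait up to 2 seconds for a free slot
# PS: objects can be put on the queue as well
print(q.get())
print(q.get())
print(q.get())
print(q.get(block=False))  # get takes the same block/timeout arguments as put
print(q.full())   # is the queue full?
print(q.empty())  # is the queue empty?
print(q.qsize())  # approximate number of items in the queue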
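Since the queue is described as being built on a pipe plus locks, a minimal sketch of the pipe itself may help. It uses multiprocessing.Pipe and, as a simplifying assumption, keeps both ends inside one process so that it runs anywhere; normally one end would be handed to a child process. The helper name round_trip is made up for this example.

```python
from multiprocessing import Pipe


def round_trip(items):
    """Send each item through a pipe and read it back in the same order."""
    recv_end, send_end = Pipe(duplex=False)  # one-way pipe: (receiver, sender)
    received = []
    for item in items:
        send_end.send(item)               # pickles the object and writes it
        received.append(recv_end.recv())  # reads the bytes and unpickles them
    send_end.close()
    recv_end.close()
    return received


if __name__ == "__main__":
    print(round_trip(["hello", "world", {"n": 1}]))
```

A bare pipe has no locking, which is one reason a Queue is usually the safer choice when several processes write at the same time.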
5. Three ways for processes to communicate
1. An IPC queue (simple application: sharing data with the producer-consumer model)
2. Files
3. The Manager module
Although processes are isolated from one another, they share the same operating system and file system, so they can exchange data through a file:

from multiprocessing import Process

def work(filename, msg):
    with open(filename, mode="a", encoding="utf-8") as f:
        f.write(msg)
        f.write("\n")

if __name__ == '__main__':
    for i in range(5):
        p = Process(target=work, args=("a.txt", "process %s" % i))
        p.start()

The first way, based on the IPC Queue: the producer-consumer model.
Example 1:

# Producer-consumer model: balances the pace of the producer and the consumer,
# so the two processes do not interfere with or depend on each other.
import random
import time
from multiprocessing import Process, Queue

def consumer(q, name):
    while True:  # never exits: after the last item, q.get() blocks forever
        time.sleep(random.randint(1, 3))
        ret = q.get()
        print("\033[41mconsumer %s got %s\033[0m" % (name, ret))

def producer(seq, q, name):
    for item in seq:
        time.sleep(random.randint(1, 3))
        q.put(item)
        print('\033[42mproducer %s made %s\033[0m' % (name, item))

if __name__ == '__main__':
    q = Queue()
    c = Process(target=consumer, args=(q, "ivy"))
    c.start()
    seq = ["bun%s" % i for i in range(10)]
    producer(seq, q, "chef1")  # the main process acts as the producer
    print("main process")

Example 2: the producer and the consumer run in separate child processes. When the producer is done it puts None on the queue, and the consumer exits when it receives None.

import random
import time
from multiprocessing import Process, Queue

def consumer(q, name):
    while True:
        time.sleep(random.randint(1, 3))
        ret = q.get()
        if ret is None:  # sentinel: nothing more will be produced
            break
        print("\033[41mconsumer %s got %s\033[0m" % (name, ret))

def producer(seq, q, name):
    for item in seq:
        time.sleep(random.randint(1, 3))
        q.put(item)
        print('\033[42mproducer %s made %s\033[0m' % (name, item))
    q.put(None)

if __name__ == '__main__':
    q = Queue()
    c = Process(target=consumer, args=(q, "ivy"))
    c.start()
    seq = ["bun%s" % i for i in range(10)]
    p = Process(target=producer, args=(seq, q, "chef1"))
    p.start()
    print("main process")
Example 3: a JoinableQueue plus a daemon process, so that every item the producer makes is confirmed as consumed.

import random
import time
from multiprocessing import Process, JoinableQueue

def consumer(q, name):
    while True:
        time.sleep(random.randint(1, 3))
        ret = q.get()
        # if ret is None: break
        print("\033[41mconsumer %s got %s\033[0m" % (name, ret))
        q.task_done()  # tell the queue this item is finished (after printing, so no output is lost)

def producer(seq, q, name):
    for item in seq:
        time.sleep(random.randint(1, 3))
        q.put(item)
        print('\033[42mproducer %s made %s\033[0m' % (name, item))
    q.join()  # block until every item has been marked done
    print("+++++++++++++++>>>")

if __name__ == '__main__':
    q = JoinableQueue()
    c = Process(target=consumer, args=(q, "ivy"))
    c.daemon = True  # daemon process: c ends when the main process ends
    c.start()
    seq = ["bun%s" % i for i in range(10)]
    p = Process(target=producer, args=(seq, q, "chef1"))
    p.start()
    p.join()
    # The main process waits for p, and p waits for c to take all the data.
    # Once c has taken everything, p.join() stops blocking and the main process ends,
    # which also reclaims the daemon c; by then c has nothing left to do anyway.
    print("main process")

The second way, based on the Manager module.
Example 1: several processes modify the same data together.

import os
from multiprocessing import Manager, Process

def work(d, lst):
    lst.append(os.getpid())
    d[os.getpid()] = os.getpid()

if __name__ == '__main__':
    m = Manager()
    lst = m.list(["init"])
    d = m.dict({"name": "Ivy"})
    p_1 = []
    for i in range(5):
        p = Process(target=work, args=(d, lst))
        p_1.append(p)
        p.start()
    for p in p_1:
        p.join()
    print(d)
    print(lst)

Sharing data through a Manager, protected by a lock:

from multiprocessing import Process, Manager, Lock

def work(d, lock):
    with lock:  # the read-modify-write below is not atomic without the lock
        d["count"] -= 1

if __name__ == '__main__':
    lock = Lock()
    m = Manager()
    d = m.dict({'count': 100})
    p_l = []
    for i in range(100):
        p = Process(target=work, args=(d, lock))
        p_l.append(p)
        p.start()
    for p in p_l:
        p.join()
    print('main process', d)
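6. Two ways to create a thread
The concept of a thread: every process has one controlling thread. A thread is the CPU's unit of execution, whereas a process only gathers a set of resources together; what is actually scheduled on the CPU is the threads inside the process. Multithreading means that one process contains several threads.
Why use multiple threads: starting a process requires setting up an address space, which takes time. When several tasks share one resource, threads are the recommended choice because they are lighter than processes. Threads use the resources of the process they belong to, so they are quick to create. The advantage of multithreading is most visible for IO-bound work; for CPU-bound work it shows no benefit.
Python's multithreading cannot make use of multiple cores.
Thread: the running of one assembly line is a thread, and an assembly line must belong to a workshop. The running of a workshop is a process.
A process contains at least one thread. A process is a unit of resources; a thread is the CPU's unit of execution.
Multithreading: a workshop has several assembly lines that share the workshop's resources (the threads share the resources of one process).
Threads cost far less than processes. Reasons to use multiple threads:
1. Shared resources
2. Low creation cost
Creation method 1:

from threading import Thread

def work(name):
    print("%s say hello" % name)

if __name__ == '__main__':
    t = Thread(target=work, args=("Ivy",))
    t.start()
    print("main thread")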
Creation method 2:

from threading import Thread

class Work(Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name  # also becomes the thread's name

    def run(self):
        print("%s say hello" % self.name)

if __name__ == '__main__':
    t = Work("Ivy")
    t.start()

A socket server based on multiple threads:

from socket import *
from threading import Thread

def server(ip, port):
    s = socket(AF_INET, SOCK_STREAM)
    s.bind((ip, port))
    s.listen(5)
    while True:
        conn, addr = s.accept()
        print("client", addr)
        t = Thread(target=talk, args=(conn, addr))  # one thread per connection
        t.start()

def talk(conn, addr):
    try:
        while True:
            res = conn.recv(1024)
            if not res:
                break
            print("client %s:%s msg:%s" % (addr[0], addr[1], res))
            conn.send(res.upper())
    except Exception:
        pass
    finally:
        conn.close()

if __name__ == '__main__':
    server("127.0.0.1", 8080)

Client side:

from socket import *

c = socket()
c.connect(("127.0.0.1", 8080))
while True:
    msg = input(">>: ").strip()
    if not msg:
        continue
    c.send(msg.encode("utf-8"))
    res = c.recv(1024)
    print("from server msg:", res.decode("utf-8"))

Common thread methods:

import threading
import time
from threading import Thread

def work():
    time.sleep(2)
    print("%s say hello" % threading.current_thread().name)

if __name__ == '__main__':
    t = Thread(target=work)
    # t.daemon = True  # t.setDaemon(True) is the older, deprecated spelling
    t.start()
    print(threading.enumerate())     # list of the threads that are currently alive
    print(threading.active_count())  # number of threads that are currently alive
    print("main thread", threading.current_thread().name)
Using multiple threads to format input and save it to a file:

from threading import Thread

msg_l = []
format_l = []

def talk():
    while True:
        msg = input(">>: ").strip()
        if not msg:
            continue
        msg_l.append(msg)

def format_msg():  # renamed from format to avoid shadowing the built-in
    while True:  # note: this loop polls continuously while the list is empty
        if msg_l:
            res = msg_l.pop()
            res = res.upper()
            format_l.append(res)

def save():
    while True:
        if format_l:
            res = format_l.pop()
            with open("db.txt", "a", encoding="utf-8") as f:
                f.write("%s\n" % res)

if __name__ == '__main__':
    t1 = Thread(target=talk)
    t2 = Thread(target=format_msg)
    t3 = Thread(target=save)
    t1.start()
    t2.start()
    t3.start()
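7. Efficiency comparison of threads and processes
Differences between threads and processes:
A thread shares the address space of the process that created it and can access the data there directly; it can also communicate directly with the other threads of its process. Processes must use IPC to communicate with each other. Threads are cheap to create, and the main thread can control its child threads directly. A process can only control its child processes, and changes made in a child process do not affect the parent process.
The threads of the Python interpreter are created directly through operating system calls, so they are kernel-level threads.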
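The claim that threads are cheaper to create than processes can be checked with a rough timing sketch. The helper names noop and time_workers are made up for this example, and the absolute numbers depend on the machine and on the process start method, so treat the output as indicative only.

```python
import time
from multiprocessing import Process
from threading import Thread


def noop():
    """Hypothetical task that does nothing, so mostly start-up cost is measured."""
    pass


def time_workers(worker_cls, count):
    """Start `count` workers of the given class, wait for all, return seconds."""
    start = time.perf_counter()
    workers = [worker_cls(target=noop) for _ in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("threads:  ", time_workers(Thread, 20))
    print("processes:", time_workers(Process, 20))
```

Thread and Process accept the same target argument, which is why a single helper can time both.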
8. Threads share the data of the same process
Because threads share their process's data, they can coordinate through a shared Event object:

import threading
import time
from threading import Event, Thread

def conn_mysql():
    print("%s waiting....." % threading.current_thread().name)
    e.wait()  # block until another thread calls e.set()
    print("%s start to connect mysql...." % threading.current_thread().name)
    time.sleep(2)

def check_mysql():
    print("%s checking....." % threading.current_thread().name)
    time.sleep(4)
    e.set()  # wake up every thread waiting on the event

if __name__ == '__main__':
    e = Event()
    c1 = Thread(target=conn_mysql)
    c2 = Thread(target=conn_mysql)
    c3 = Thread(target=conn_mysql)
    c4 = Thread(target=check_mysql)
    c1.start()
    c2.start()
    c3.start()
    c4.start()
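Data sharing between threads can also be shown more directly: every thread in a process sees the same module-level variables. The following is a minimal sketch, with the names counter, add_many, and run invented for the example; a Lock guards the update because `counter += 1` is a read-modify-write and not atomic.

```python
from threading import Lock, Thread

counter = 0            # module-level data, visible to every thread in the process
counter_lock = Lock()


def add_many(times):
    """Increment the shared counter `times` times, holding the lock each time."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1


def run(thread_count=4, times=10000):
    """Reset the counter, run the threads, and return the final value."""
    global counter
    counter = 0
    threads = [Thread(target=add_many, args=(times,)) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())
```

No queue or Manager is needed here, in contrast to the process examples, because the threads live in one address space.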
9. Locking and resolving deadlock (mutexes and recursive locks)
Deadlock example:

import time
from threading import Thread, Lock

class MyThread(Thread):
    def run(self):
        self.f1()
        self.f2()

    def f1(self):
        mutaxA.acquire()
        print("\033[46m%s got lock A\033[0m" % self.name)
        mutaxB.acquire()
        print("\033[43m%s got lock B\033[0m" % self.name)
        mutaxB.release()
        mutaxA.release()

    def f2(self):
        mutaxB.acquire()
        time.sleep(1)  # while this thread holds B, another thread takes A in f1 and then waits for B
        print("\033[43m%s got lock B\033[0m" % self.name)
        mutaxA.acquire()
        print("\033[42m%s got lock A\033[0m" % self.name)
        mutaxA.release()
        mutaxB.release()

if __name__ == '__main__':
    mutaxA = Lock()
    mutaxB = Lock()
    # t = MyThread()
    # t.start()
    for i in range(20):
        t = MyThread()
        t.start()

Solving it with a recursive lock: a recursive lock keeps a counter. Each acquire by the owning thread adds 1 and each release subtracts 1; other threads can take the lock only once the counter is back to 0.

import time
from threading import Thread, Lock, RLock

class MyThread(Thread):
    def run(self):
        self.f1()
        self.f2()

    def f1(self):
        mutaxA.acquire()
        print("\033[46m%s got lock A\033[0m" % self.name)
        mutaxB.acquire()
        print("\033[43m%s got lock B\033[0m" % self.name)
        mutaxB.release()
        mutaxA.release()

    def f2(self):
        mutaxB.acquire()
        time.sleep(1)
        print("\033[43m%s got lock B\033[0m" % self.name)
        mutaxA.acquire()
        print("\033[42m%s got lock A\033[0m" % self.name)
        mutaxA.release()
        mutaxB.release()

if __name__ == '__main__':
    mutaxA = mutaxB = RLock()  # both names refer to the same recursive lock
    # mutaxA = Lock()
    # mutaxB = Lock()
    # t = MyThread()
    # t.start()
    for i in range(20):
        t = MyThread()
        t.start()

Semaphore: a lock that a fixed number of threads may hold at the same time.

import time
from threading import Thread, Semaphore

def work(id):
    with sem:  # at most 5 threads are inside this block at once
        time.sleep(2)
        print("%s say hello" % id)

if __name__ == '__main__':
    sem = Semaphore(5)
    for i in range(20):
        t = Thread(target=work, args=(i,))
        t.start()
Example 1: locking, using ticket grabbing as the scenario.

import json
import random
import time
from multiprocessing import Process, Lock

# dbfile is expected to hold JSON with a "count" key, for example {"count": 3}
def work(dbfile, name, lock):
    lock.acquire()
    with open(dbfile, encoding="utf-8") as f:
        dic = json.loads(f.read())
    if dic["count"] > 0:
        dic["count"] -= 1
        time.sleep(random.randint(1, 3))
        with open(dbfile, "w", encoding="utf-8") as f:
            f.write(json.dumps(dic))
        print("\033[43m%s got a ticket\033[0m" % name)
    else:
        print("\033[45m%s failed to get a ticket\033[0m" % name)
    lock.release()

if __name__ == '__main__':
    lock = Lock()
    p_l = []
    for i in range(100):
        p = Process(target=work, args=("a.txt", "user%s" % i, lock))
        p_l.append(p)
        p.start()
    for p in p_l:
        p.join()
    print("main process")

Example 2: the second way to write the lock, as a context manager with the with statement.

import json
import random
import time
from multiprocessing import Process, Lock

def work(dbfile, name, lock):
    # lock.acquire()
    with lock:  # released automatically, even if an exception is raised
        with open(dbfile, encoding="utf-8") as f:
            dic = json.loads(f.read())
        if dic["count"] > 0:
            dic["count"] -= 1
            time.sleep(random.randint(1, 3))
            with open(dbfile, "w", encoding="utf-8") as f:
                f.write(json.dumps(dic))
            print("\033[43m%s got a ticket\033[0m" % name)
        else:
            print("\033[45m%s failed to get a ticket\033[0m" % name)
    # lock.release()

if __name__ == '__main__':
    lock = Lock()
    p_l = []
    for i in range(100):
        p = Process(target=work, args=("a.txt", "user%s" % i, lock))
        p_l.append(p)
        p.start()
    for p in p_l:
        p.join()

To resolve a deadlock, replacing Lock with RLock is enough.
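The difference between Lock and RLock that makes this fix work can be seen in isolation. In this small sketch (the helper name can_reacquire is made up), one thread tries to take the same lock twice; the second attempt is non-blocking so that a plain Lock fails fast instead of hanging.

```python
from threading import Lock, RLock


def can_reacquire(lock):
    """Return True if the current thread can take `lock` a second time."""
    lock.acquire()
    try:
        # Non-blocking attempt: returns False at once if the lock is unavailable.
        second = lock.acquire(blocking=False)
        if second:
            lock.release()  # undo the second acquire
        return second
    finally:
        lock.release()      # undo the first acquire


if __name__ == "__main__":
    print("Lock: ", can_reacquire(Lock()))   # False
    print("RLock:", can_reacquire(RLock()))  # True
```

An RLock only helps when the same thread re-enters the lock; two different threads still exclude each other as usual.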
10. Three kinds of thread queues
First kind: first in, first out.

import queue

q = queue.Queue(5)
q.put("hello")
q.put("world")
q.put("hello world")
q.put_nowait("hey")
print(q.qsize())
print(q.get())
print(q.get())
print(q.get())
print(q.empty())
print(q.full())
print(q.get_nowait())

Second kind: last in, first out.

import queue

q = queue.LifoQueue(5)
q.put("a")
q.put("b")
q.put_nowait("c")
print(q.get())
print(q.get())
print(q.get())

Third kind: priority queue. Items are put as tuples (or lists) of the form (priority, data); the smaller the number, the higher the priority.

import queue

q = queue.PriorityQueue(5)
q.put((1, "c"))
q.put((2, "a"))
q.put((3, "b"))
print(q.get())
print(q.get())
print(q.get())
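11. Running CPU-bound tasks with multiple threads
For IO-bound work, multiple processes bring no advantage; for CPU-bound work, multiple processes have the upper hand.
Within one process, only one thread can be executing on a CPU at any given moment. The reason is the GIL.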
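A rough way to observe this is to time the same CPU-bound work done sequentially and done in threads. The sketch below uses made-up names (count_down, run_sequential, run_threaded). On a standard CPython build with the GIL the two timings are expected to be similar, but the exact numbers vary by machine and interpreter version, so no output is shown.

```python
import time
from threading import Thread


def count_down(n):
    """Hypothetical CPU-bound task: a pure-Python loop with no IO."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    """Run the task `jobs` times one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    """Run the task in `jobs` threads at once; return elapsed seconds."""
    start = time.perf_counter()
    threads = [Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential:", run_sequential(2_000_000, 4))
    print("threaded:  ", run_threaded(2_000_000, 4))
```

Replacing the threads with processes is the usual way to spread this kind of work over several cores.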
12. Thread pools and process pools
Process pool: as a rule, use the number of CPU cores as a guide for how many processes to start.

import random
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def func(n):
    time.sleep(random.randint(1, 3))
    return n * n

if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=5)
    p_lst = []
    for i in range(10):
        ret = pool.submit(func, i)  # submit asynchronously: func is the function, i is its argument
        p_lst.append(ret)
    # pool.shutdown()  # close the pool so that no new tasks can be submitted; rarely needed
    # [i.result() for i in p_lst]
    for i in p_lst:
        print(i.result())  # blocks until the result is ready, much like join
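13. Callback functions
Callback method 1: using the Pool class.

from multiprocessing import Pool

def work(n):
    return n * n

if __name__ == '__main__':
    pool = Pool(5)
    res_l = []
    for i in range(6):
        res = pool.apply_async(work, args=(i,))
        res_l.append(res)
    for res in res_l:
        print(res.get())  # collects the results directly; no callback is registered here

PS: I do not fully understand this part myself.
Callback method 2: based on the concurrent.futures module.

# Hand the result of one task to another function for processing; a typical use case is a crawler.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def func1(x, y):
    return x + y

def func2(n):
    print(n)           # the Future object
    print(n.result())  # the value returned by func1

if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=5)
    pool.submit(func1, 5, 10).add_done_callback(func2)

PS: to use threads instead, replace ProcessPoolExecutor with ThreadPoolExecutor when creating the pool.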
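As a sketch of the thread-based variant, the example below registers a callback on each Future from a ThreadPoolExecutor and collects the results in a list. The names add, on_done, results, and run_with_callbacks are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

results = []


def add(x, y):
    return x + y


def on_done(future):
    """Callback: receives the finished Future, not the raw return value."""
    results.append(future.result())


def run_with_callbacks(pairs):
    """Submit add(x, y) for every pair and return the sorted callback results."""
    results.clear()
    # Leaving the with-block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for x, y in pairs:
            pool.submit(add, x, y).add_done_callback(on_done)
    return sorted(results)


if __name__ == "__main__":
    print(run_with_callbacks([(5, 10), (1, 2)]))
```

The results are sorted before returning because the order in which callbacks fire is not guaranteed.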
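14. Daemon threads
A daemon thread is terminated when the main thread ends, so in the example below "say hello" is never printed.

import time
from threading import Thread

def work():
    time.sleep(2)
    print("say hello")

if __name__ == '__main__':
    t = Thread(target=work)
    t.daemon = True  # replaces the deprecated t.setDaemon(True); must be set before start()
    t.start()
    print("main thread")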
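15. Coroutines
Coroutines provide concurrency within a single thread. A coroutine is a lightweight, user-space thread, whereas Python's threads are kernel-level. When one coroutine hits IO, execution switches to another coroutine. Everything happens inside one thread, so no locks are needed; the code is essentially running serially, only with very fast switching. To implement coroutines, the user program itself has to control the switching and save the state.
An example of switching quickly between two routines with yield:

import time

def consumer():
    # print(item)
    # Local variables: their values are kept across switches.
    x = 2222222222222
    y = 33333333333333
    a = "aaaaaaaaaaaaaaaaaaaa"
    b = "ccccccccccccccc"
    while True:
        item = yield  # pause here until the producer sends the next item

def producer(target, seq):
    for item in seq:
        target.send(item)  # switch to the consumer

g = consumer()
next(g)  # advance the generator to its first yield
start_time = time.time()
producer(g, range(100000))
stop_time = time.time()
print("elapsed time", stop_time - start_time)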
Switching with the switch method of the greenlet module:

from greenlet import greenlet

def test1():
    print("test1,first")
    gr2.switch()
    print("test1,second")
    gr2.switch()

def test2():
    print("test2,first")
    gr1.switch()
    print("test2,second")

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()

Coroutines with gevent:

import gevent

def eat(name):
    print("%s eat food first" % name)
    gevent.sleep(5)
    print("%s eat food second" % name)

def play(name):
    print("%s play phone 1" % name)
    gevent.sleep(10)
    print("%s play phone 2" % name)

g1 = gevent.spawn(eat, "ivy")
g2 = gevent.spawn(play, "zoe")
g1.join()
g2.join()
print("main")

The complete version uses gevent's monkey patching; without the patch, gevent does not recognize time.sleep as a point at which to switch.

from gevent import monkey; monkey.patch_all()  # must run before the other imports
import gevent
import time

def eat(name):
    print("%s eat food first" % name)
    time.sleep(5)
    print("%s eat food second" % name)

def play(name):
    print("%s play phone 1" % name)
    time.sleep(10)
    print("%s play phone 2" % name)

g1 = gevent.spawn(eat, "ivy")
g2 = gevent.spawn(play, "zoe")
g1.join()
g2.join()
print("main")

A crawler with gevent:

from gevent import monkey; monkey.patch_all()
import requests
import time
import gevent

def get_page(url):
    print("get page:%s" % url)
    response = requests.get(url)
    if response.status_code == 200:
        print(response.text)

start_time = time.time()
g1 = gevent.spawn(get_page, url="https://www.python.org")
g2 = gevent.spawn(get_page, url="https://yahoo.com")
g3 = gevent.spawn(get_page, url="https://github.com")
gevent.joinall([g1, g2, g3])
stop_time = time.time()
print("duration", stop_time - start_time)
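For comparison, the same "switch when waiting" behaviour can be sketched with the standard library's asyncio, which is a different technique from the gevent approach used in these notes and needs no third-party package. The coroutine names mirror the eat/play example; the log list and the short sleep times are assumptions made for this illustration.

```python
import asyncio


async def eat(name, log):
    log.append(f"{name} eat 1")
    await asyncio.sleep(0.01)  # gives up control while "waiting on IO"
    log.append(f"{name} eat 2")


async def play(name, log):
    log.append(f"{name} play 1")
    await asyncio.sleep(0.05)
    log.append(f"{name} play 2")


async def main():
    log = []
    # Run both coroutines concurrently in a single thread.
    await asyncio.gather(eat("ivy", log), play("zoe", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Here the switch points are written explicitly with await, whereas gevent's monkey patching makes ordinary blocking calls switch implicitly.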
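16. The GIL (Global Interpreter Lock)
Only CPython has it. CPython's memory management is not thread-safe, so among the threads of one Python process only one can be executing on a CPU at a time.
The GIL protects the interpreter's own data; different data should be protected by different locks.
The Python interpreter uses the operating system's native threads. Whichever thread gets the GIL first runs first, and the GIL protects the shared data.
Supplementary notes
1. Running a task after a delay:

from threading import Timer

def hello(name):
    print("%s say hello" % name)

t = Timer(3, hello, args=("Ivy",))  # call hello("Ivy") after 3 seconds
t.start()

Source: https://www.cnblogs.com/guniang/p/10969274.html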