dask

24 Tricks to Speed Up Your Python, Super Practical!

假如想象 Posted on 2021-01-14 04:46:30
云哥 previously discussed concrete ways to speed up Python from the following nine angles, 24 tricks in total, each with a before/after comparison, which makes them very practical: analyzing code run time, speeding up lookups, speeding up loops, speeding up functions, using the standard library for speedups, NumPy vectorization, speeding up Pandas, Dask, and multi-threading/multi-processing. On that basis I mainly polished the formatting so that readers can read and learn from it more easily.
“ Part 1: Analyzing code run time ”
1. Measure the time of a single run of a piece of code: plain method; quick method (Jupyter)
2. Measure the average time over repeated runs: plain method; quick method (Jupyter)
3. Profile run time by function call: plain method; quick method (Jupyter)
4. Profile run time line by line: plain method; quick method (Jupyter)
“ Part 2: Speed up your lookups ”
5. Use a set rather than a list for in lookups: slow version; fast version
6. Use a dict rather than two lists for matching lookups: slow version; fast version
“ Part 3: Speed up your loops ”
7. Prefer for loops over while loops: slow version; fast version
8. Avoid repeated computation inside the loop body: slow version; fast version
“ Part 4: Speed up your functions ”
9. Use caching to speed up recursive functions: slow version; fast version
10. Replace recursion with a loop: slow version; fast version
11. Use Numba to speed up Python functions: slow version; fast version
“ Part 5: Use standard-library functions for speedups ”
12. Use collections.Counter to speed up counting: slow version; fast version
13
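The excerpt above cuts off before the article's code samples, so here is a minimal sketch of two of the listed tricks, set-based in lookups (trick 5) and caching a recursive function (trick 9). The data sizes and function names are my own illustration, not the original article's code.

import timeit
from functools import lru_cache

# Trick 5: membership tests on a set are O(1) on average, on a list they are O(n).
data_list = list(range(100_000))
data_set = set(data_list)

t_list = timeit.timeit(lambda: 99_999 in data_list, number=1_000)
t_set = timeit.timeit(lambda: 99_999 in data_set, number=1_000)
print(f"list lookup: {t_list:.4f}s, set lookup: {t_set:.4f}s")

# Trick 9: cache a recursive function so each subproblem is computed only once.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))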

Flask author Armin Ronacher: I'm not feeling the async pressure

╄→гoц情女王★ Posted on 2021-01-12 02:59:05
https://zhuanlan.zhihu.com/p/102307133 English original | I'm not feeling the async pressure [1] Author | Armin Ronacher, 2020.01.01 Translator | 豌豆花下猫@Python猫 Note: this translation is released under the CC BY-NC-SA 4.0 [2] license; the content has been lightly edited, please keep a link to the original when reposting, and do not use it for commercial or illegal purposes. Async is all the rage. Async Python, async Rust, Go, Node, .NET: pick your favorite language ecosystem and it is using some form of async. How good async turns out to be depends a great deal on the language ecosystem and its runtime, but overall it has some nice benefits. It makes one thing extremely simple: waiting for an operation that may take a while to complete. It is so simple that it has created countless new ways to blow one's foot off. The case I want to discuss is the one where you don't realize you have shot yourself in the foot until the system becomes overloaded, which is the topic of back pressure management. A related term in protocol design is flow control. What is back pressure? There are many explanations of back pressure, and a good one I recommend reading is Backpressure explained — the resisted flow of data through software [3]. Therefore
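The linked posts explain the concept in depth; as a minimal sketch of what back pressure looks like in async Python (my own illustration, not code from the article), a bounded asyncio.Queue makes a fast producer wait whenever a slow consumer falls behind:

import asyncio

async def producer(queue: asyncio.Queue) -> None:
    for i in range(10):
        # put() awaits while the queue is full: the slow consumer pushes back
        # on the fast producer, which is exactly what back pressure means.
        await queue.put(i)
        print(f"produced {i}")

async def consumer(queue: asyncio.Queue) -> None:
    while True:
        item = await queue.get()
        await asyncio.sleep(0.5)  # simulate slow downstream work
        print(f"consumed {item}")
        queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)  # the bounded queue is the key
    worker = asyncio.create_task(consumer(queue))
    await producer(queue)
    await queue.join()  # wait until every queued item has been processed
    worker.cancel()

asyncio.run(main())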

How can I efficiently transpose a 67 gb file/Dask dataframe without loading it entirely into memory?

那年仲夏 Posted on 2020-12-31 08:44:54
Question: I have 3 rather large files (67 GB, 36 GB, 30 GB) that I need to train models on. However, the features are rows and the samples are columns. Since Dask hasn't implemented transpose and stores DataFrames split by row, I need to write something to do this myself. Is there a way I can efficiently transpose without loading everything into memory? I've got 16 GB of RAM at my disposal and am using a Jupyter notebook. I have written some rather slow code, but would really appreciate a faster solution. The speed of
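One answer-style sketch (not from the question, and with made-up file names) assumes the input is a plain CSV whose header row holds the sample names: read a block of columns at a time with pandas' usecols, transpose each block in memory, and write it out as a row chunk of the transposed table, which dask can then read back lazily.

import pandas as pd
import dask.dataframe as dd

src = "features_by_samples.csv"            # hypothetical 67 GB input, features as rows
cols = pd.read_csv(src, nrows=0).columns   # sample names, read from the header only

block = 500  # how many sample columns to hold in memory at once; tune to your RAM
for i in range(0, len(cols), block):
    subset = list(cols[i:i + block])
    # Each pass parses the whole file but keeps only `block` columns in memory,
    # so peak memory is roughly n_rows * block values.
    chunk = pd.read_csv(src, usecols=subset)
    # After transposing, these samples become rows of the output; column 0 of the
    # written file holds the sample name (the old column label).
    chunk.T.to_csv(f"transposed_part_{i:06d}.csv", header=False)

# The transposed parts can now be treated as one lazy dask dataframe.
ddf = dd.read_csv("transposed_part_*.csv", header=None)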

Using Dask's NEW to_sql for improved efficiency (memory/speed) or alternative to get data from dask dataframe into SQL Server Table

喜夏-厌秋 Posted on 2020-12-29 06:52:31
Question: My ultimate goal is to use SQL/Python together for a project with too much data for pandas to handle (at least on my machine). So, I have gone with dask to: (1) read in data from multiple sources (mostly SQL Server tables/views); (2) manipulate/merge the data into one large dask dataframe of ~10 million+ rows and 52 columns, some of which contain long unique strings; (3) write it back to SQL Server on a daily basis, so that my PowerBI report can automatically refresh the data. For #1 and #2, they
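As a minimal sketch of the dask side of step (3), assuming a dask version that ships DataFrame.to_sql (roughly 2.11 and later) and a working SQL Server connection string via SQLAlchemy/pyodbc; the URI, table name, and source files here are placeholders, not the asker's setup:

import dask.dataframe as dd

# Placeholder pyodbc connection string for SQL Server; substitute your own DSN/credentials.
uri = "mssql+pyodbc://user:password@MY_DSN"

# Stand-in for the real read/merge pipeline that produces the ~10M-row dataframe.
ddf = dd.read_csv("merged_output_*.csv")

ddf.to_sql(
    "report_table",       # target table name (placeholder)
    uri,                  # dask takes a URI string rather than an engine, so workers can reconnect
    if_exists="replace",  # rebuild the table on each daily refresh
    index=False,
    chunksize=10_000,     # rows per INSERT batch handed to pandas/SQLAlchemy
    parallel=False,       # True lets partitions be written concurrently, at the cost of ordering
)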

dask handle delayed failures

断了今生、忘了曾经 Posted on 2020-12-16 02:25:11
Question: How can I port the following function to dask in order to parallelize it?

from time import sleep
from dask.distributed import Client
from dask import delayed
client = Client(n_workers=4)
from tqdm import tqdm
tqdm.pandas()

# linear
things = [1, 2, 3]
_x = []
_y = []

def my_slow_function(foo):
    sleep(2)
    x = foo
    y = 2 * foo
    assert y < 5
    return x, y

for foo in tqdm(things):
    try:
        x_v, y_v = my_slow_function(foo)
        _x.append(x_v)
        if y_v is not None:
            _y.append(y_v)
    except AssertionError:
        print(f'failed: {foo}')
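Not an official answer, but one common way to port this kind of loop (the helper name safe_call and the None sentinel convention are my own): wrap the call in a delayed function that turns the expected failure into a sentinel, build all the tasks first, compute them in one go on the cluster, then filter the results.

from time import sleep
import dask
from dask import delayed
from dask.distributed import Client

client = Client(n_workers=4)

things = [1, 2, 3]

def my_slow_function(foo):
    sleep(2)
    x = foo
    y = 2 * foo
    assert y < 5
    return x, y

@delayed
def safe_call(foo):
    # Catch the failure inside the task so one bad input does not fail the whole graph.
    try:
        return my_slow_function(foo)
    except AssertionError:
        return None  # sentinel marking a failed input

# Build the task graph lazily, then run all tasks in parallel on the cluster.
results = dask.compute(*[safe_call(foo) for foo in things])

_x = [r[0] for r in results if r is not None]
_y = [r[1] for r in results if r is not None]
print(_x, _y)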
