Tutorials on optimizing non-trivial Python applications with C extensions or Cython

Submitted by 半世苍凉 on 2019-12-02 17:33:29

Points 1 and 2 are just basic optimization rules of thumb. I would be very surprised if the kind of tutorial you are looking for existed anywhere; maybe that's why you haven't found one. My short list:

  • Rule number one of optimization: don't.
  • Rule number two: measure.
  • Rule number three: identify the limiting factor (if the program is IO bound or database bound, there may be no optimization within reach anyway).
  • Rule number four: think; use better algorithms and data structures.
  • Considering a change of language is quite low on the list.

Just start by profiling your Python code with the usual Python tools. Find where your code needs to be optimized, then try to optimize it while sticking with Python. If it is still too slow, try to understand why. If it's IO bound, a C program is unlikely to do better. If the problem comes from the algorithm, C is also unlikely to perform better. Really, the "good" cases where C can help are quite rare: the runtime should not be too far from what you want (say you need a 2x or 3x speedup), the data structures are simple and would benefit from a low-level representation, and you really, really need that speedup. In most other cases, using C instead of Python will be an unrewarding job.
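As a rough illustration of the "is it IO bound?" check, one option is to compare wall-clock time with CPU time for the suspect call. This is only a sketch: `cpu_fraction`, `waits_on_io` and `burns_cpu` are hypothetical names made up for the example, and the two workloads are stand-ins for real code.

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Call func once and return (wall_seconds, cpu_seconds, cpu / wall)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, (cpu / wall if wall > 0 else 0.0)


def waits_on_io():
    # Hypothetical stand-in for a blocking read or a database round trip.
    time.sleep(0.2)


def burns_cpu():
    # Hypothetical stand-in for a pure-Python computation.
    return sum(i * i for i in range(2_000_000))


if __name__ == "__main__":
    for work in (waits_on_io, burns_cpu):
        wall, cpu, fraction = cpu_fraction(work)
        print(f"{work.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={fraction:.2f}")
```

A ratio near zero suggests the time goes to waiting, where C is unlikely to help; a ratio near one suggests the code is CPU bound and worth profiling further.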

Really, it is quite rare for C code to be called from Python with performance as the primary goal. More often the goal is to interface Python with some existing C code.

And as another poster said, you would probably be better advised to use Cython.

If you still want to write a C module for Python, everything you need is in the official documentation.

O'Reilly has a tutorial (freely available as far as I can tell, I was able to read the whole thing) that illustrates how to profile a real project (they use an EDI parsing project as a subject for profiling) and identify hotspots. There's not too much detail on writing the C extension that will fix the bottleneck in the O'Reilly article. It does, however, cover the first two things that you want with a non-trivial example.

The process of writing C extensions is fairly well documented here. The hard part is coming up with ways to replicate what Python code is doing in C, and that takes something that would be hard to teach in a tutorial: ingenuity, knowledge of algorithms, hardware, and efficiency, and considerable C skill.

Hope this helps.

For points 1 and 2, I would use a Python profiler, for example cProfile. See here for a quick tutorial.
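A minimal sketch of what that can look like with the standard-library `cProfile` and `pstats` modules. The functions `slow_part`, `fast_part`, `main` and `profile_report` are hypothetical names invented for this example, not part of any real project.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately slow pure-Python loop (hypothetical hotspot).
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    return n * 2


def main():
    slow_part(200_000)
    fast_part(10)


def profile_report(func, limit=5):
    """Profile one call of func and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(main))
```

Sorting by cumulative time puts the functions that account for most of the run at the top, which is usually where to look first. For a whole script, `python -m cProfile -s cumulative your_script.py` gives a similar report without code changes.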

If you already have a working Python program, for point 3 you might want to consider using Cython. Of course, rather than rewriting in C, you may be able to think up an algorithmic improvement that will increase execution speed.
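To make the algorithmic point concrete, here is a small made-up example (not taken from the question): two ways of checking a list for duplicates. The function names are hypothetical, and the second version assumes the items are hashable.

```python
def has_duplicates_quadratic(items):
    # Compares every pair: roughly n*n/2 comparisons in the worst case.
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if a == b:
                return True
    return False


def has_duplicates_linear(items):
    # One pass with a set: roughly n set lookups (items must be hashable).
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Translating the first version to C would only shave a constant factor off a quadratic algorithm, while the second version changes how the cost grows with input size and stays in Python.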

Mike Dunlavey

I will try to address your points 1 and 2, and your first 3 bullet points, but not in order.

The third bullet point says "assume the algorithm and python code is already optimal". When code is in that state, if one takes stack samples (as outlined here), the samples show exactly what the program is doing, from a time perspective, and there seems to be nothing that could be improved without language change. However, since you know how it is spending its time, you know which low-level algorithm (which could consist of more than one function, not just a hotspot) could benefit by being made to take less time, i.e. by being converted to C.
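Stack samples can be taken by hand, by pausing the program in a debugger a number of times and reading the stack. As a rough automated stand-in (my own sketch, not the linked method), a helper thread can record the main thread's stack at intervals. It relies on `sys._current_frames()`, which is a CPython implementation detail; `sample_stacks`, `hill_climb` and `workload` are hypothetical names.

```python
import collections
import sys
import threading


def sample_stacks(func, interval=0.01):
    """Run func() in the calling thread while a helper thread samples its stack.

    Returns (counts, total): counts maps a function name to the number of
    samples in which it appeared anywhere on the stack; total is the number
    of samples taken.
    """
    target_id = threading.get_ident()
    counts = collections.Counter()
    total = [0]
    done = threading.Event()

    def sampler():
        # Event.wait returns False on timeout, so this loops until done is set.
        while not done.wait(interval):
            frame = sys._current_frames().get(target_id)
            names = set()
            while frame is not None:
                names.add(frame.f_code.co_name)
                frame = frame.f_back
            if names:
                counts.update(names)
                total[0] += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func()
    finally:
        done.set()
        thread.join()
    return counts, total[0]


def hill_climb(steps):
    # Hypothetical CPU-heavy routine standing in for the real algorithm.
    x = 0.0
    for i in range(steps):
        x += (i % 7) * 0.5
    return x


def workload():
    hill_climb(3_000_000)


if __name__ == "__main__":
    counts, total = sample_stacks(workload)
    for name, hits in counts.most_common(5):
        print(f"{name}: on the stack in {hits} of {total} samples")
```

The fraction of samples in which a function appears estimates the share of run time it is responsible for, including everything it calls, which is the number that matters when deciding what to move to C.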

Regarding point 1, this method shows which parts of the code will benefit by conversion to C, and they may or may not be hotspots. (The first thing that comes to mind is any sort of recursive function or set of functions. Or, a small group of functions that together accomplish some purpose, such as a hill-climber.)

Regarding point 2, that would be any code that does not appear on a healthy percentage of stack samples, or that does appear but clearly will not benefit from being converted to C, such as I/O.

Regarding the first and second bullet points, I would agree that measuring is not the primary objective, but a by-product of the process of finding the code to optimize. Presenting such measurements also is beside the point.

I have been in similar situations, except not between python and C, but between C and hardware.**

Just to give an example, if the total run time is 10 seconds, and the algorithm is on the stack roughly 50% of the time, then it is responsible for roughly 5 of the 10 seconds. If converting the algorithm to C would give a 10x speedup, then that 5 seconds would shrink to 0.5 seconds, so the overall time would shrink to 5.5 seconds. (Roughly - it's more important to achieve the time reduction than to know in advance precisely how big it will be.) Notice, at this point, the whole process could be repeated, and it might make sense to convert something else to C also. You can stop this process when samples show that the python code is doing what it's good at, and the C code is doing what it's good at.
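The arithmetic above can be written as a small helper. This is just the same estimate in code; `projected_runtime` is a hypothetical name, and the 10x speedup is the assumed figure from the example, not a measurement.

```python
def projected_runtime(total_seconds, fraction_on_stack, speedup):
    """Estimate the new total run time after speeding up one part of a program.

    fraction_on_stack is the share of stack samples on which the part appears.
    """
    affected = total_seconds * fraction_on_stack
    unaffected = total_seconds - affected
    return unaffected + affected / speedup


if __name__ == "__main__":
    before = 10.0
    after = projected_runtime(before, fraction_on_stack=0.5, speedup=10)
    print(f"{before:.1f}s -> {after:.1f}s (overall speedup {before / after:.2f}x)")
```

With the numbers from the example this gives 5.5 seconds, an overall speedup of about 1.8x, which shows why the part that stays in Python quickly becomes the next thing to sample.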

** e.g. Floating-point math, library vs. chip, or graphics, drawing text & polygons.
