Collapse multiple submodules to one Cython extension


First off, I should note that it's impossible to compile a single .so file with sub packages using Cython. So if you want sub packages, you're going to have to generate multiple .so files, as each .so can only represent a single module.

Second, it doesn't appear that you can compile multiple Cython/Python files (I'm using the Cython language specifically) and link them into a single module at all.

I've tried to compile multiple Cython files into a single .so every which way, both with distutils and with manual compilation, and it always fails to import at runtime.

It seems that it's fine to link a compiled Cython file with other libraries, or even other C files, but something goes wrong when linking together two compiled Cython files, and the result isn't a proper Python extension.

The only solution I can see is to compile everything as a single Cython file. In my case, I've edited my setup.py to generate a single .pyx file which in turn includes every .pyx file in my source directory:

import os

# generate a single .pyx that textually includes every .pyx in the source directory
includesContents = ""
for f in os.listdir("src-dir"):
    if f.endswith(".pyx"):
        includesContents += "include \"" + f + "\"\n"

with open("src/extension-name.pyx", "w") as includesFile:
    includesFile.write(includesContents)

Then I just compile extension-name.pyx. Of course this breaks incremental and parallel compilation, and you could end up with extra naming conflicts since everything gets pasted into the same file. On the bright side, you don't have to write any .pxd files.
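
For completeness, a minimal setup.py sketch for building the generated file might look like the following (here the placeholder is renamed to extension_name, since a hyphen isn't allowed in a module name):

from setuptools import setup, Extension
from Cython.Build import cythonize

# build the aggregated .pyx (with everything include-d into it) as one extension module
setup(
    name="extension_name",
    ext_modules=cythonize([Extension("extension_name", ["src/extension_name.pyx"])]),
)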

I certainly wouldn't call this a preferable build method, but if everything absolutely has to be in one extension module, this is the only way I can see to do it.

This answer provides a prototype for Python 3 (which can easily be adapted for Python 2) and shows how several Cython modules can be bundled into a single extension/shared library/pyd file.

I keep it around for historical/didactical reasons - a more concise recipe is given in this answer, which presents a good alternative to @Mylin's proposal of putting everything into the same pyx-file.


Preliminary note: Since Cython 0.29, Cython uses multi-phase initialization for Python>=3.5. Multi-phase initialization needs to be switched off (otherwise PyInit_xxx isn't sufficient, see this SO-post), which can be done by passing -DCYTHON_PEP489_MULTI_PHASE_INIT=0 to gcc or whichever compiler is used.
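
If one prefers to set this through setuptools rather than on the command line, the macro can be attached to the Extension via define_macros; a minimal sketch, assuming the setup.py presented further below:

extensions = cythonize(Extension(
            name="foo.bootstrap",
            sources=sourcefiles,
            # switch off PEP 489 multi-phase initialization, so that calling
            # the PyInit_xxx functions by hand is enough to set up a submodule
            define_macros=[("CYTHON_PEP489_MULTI_PHASE_INIT", "0")],
    ))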


When bundling multiple Cython extensions (let's call them bar_a and bar_b) into a single shared object (let's call it foo), the main problem is the import bar_a operation, because of the way module loading works in Python (obviously simplified):

  1. Look for bar_a.py and load it, if not successful...
  2. Look for bar_a.so (or similar), use dlopen to load the shared library and call PyInit_bar_a, which initializes/registers the module.

Now, the issue is that there is no bar_a.so to be found and, although the initialization function PyInit_bar_a can be found in foo.so, Python doesn't know where to look and gives up searching.

Luckily, there are hooks available, so we can teach Python to look in the right places.

When importing a module, Python utilizes finders from sys.meta_path, which return the right loader for a module (for simplicity, I'm using the legacy workflow with loaders rather than module specs). The default finders return None, i.e. no loader, which results in an ImportError.

That means we need to add a custom finder to sys.meta_path which recognizes our bundled modules and returns loaders that in turn call the right PyInit_xxx-function.

The missing part: how does the custom finder find its way into sys.meta_path? It would be pretty inconvenient if the user had to do it manually.

When a submodule of a package is imported, the package's __init__.py module is loaded first, and this is the place where we can inject our custom finder.

After calling python setup.py build_ext install for the setup presented further below, there is a single shared library installed and the submodules can be loaded as usual:

>>> import foo.bar_a as a
>>> a.print_me()
I'm bar_a
>>> from foo.bar_b import print_me as b_print
>>> b_print()
I'm bar_b

Putting it all together:

Folder structure:

../
 |-- setup.py
 |-- foo/
      |-- __init__.py
      |-- bar_a.pyx
      |-- bar_b.pyx
      |-- bootstrap.pyx

__init__.py:

# bootstrap is the only module which 
# can be loaded with default Python-machinery
# because the resulting extension is called `bootstrap`:
from . import bootstrap

# injecting our finders into sys.meta_path
# after that all other submodules can be loaded
bootstrap.bootstrap_cython_submodules()

bootstrap.pyx:

import sys
import importlib.abc

# custom loader is just a wrapper around the right init-function
class CythonPackageLoader(importlib.abc.Loader):
    def __init__(self, init_function):
        super(CythonPackageLoader, self).__init__()
        self.init_module = init_function

    def load_module(self, fullname):
        if fullname not in sys.modules:
            sys.modules[fullname] = self.init_module()
        return sys.modules[fullname]

# custom finder just maps the module name to init-function      
class CythonPackageMetaPathFinder(importlib.abc.MetaPathFinder):
    def __init__(self, init_dict):
        super(CythonPackageMetaPathFinder, self).__init__()
        self.init_dict=init_dict

    def find_module(self, fullname, path):
        try:
            return CythonPackageLoader(self.init_dict[fullname])
        except KeyError:
            return None

# making init-function from other modules accessible:
cdef extern from *:
    """
    PyObject *PyInit_bar_a(void);
    PyObject *PyInit_bar_b(void);
    """
    object PyInit_bar_a()
    object PyInit_bar_b()

# wrapping C-functions as Python-callables:
def init_module_bar_a():
    return PyInit_bar_a()

def init_module_bar_b():
    return PyInit_bar_b()


# injecting custom finder/loaders into sys.meta_path:
def bootstrap_cython_submodules():
    init_dict={"foo.bar_a" : init_module_bar_a,
               "foo.bar_b" : init_module_bar_b}
    sys.meta_path.append(CythonPackageMetaPathFinder(init_dict))  

bar_a.pyx:

def print_me():
    print("I'm bar_a")

bar_b.pyx:

def print_me():
    print("I'm bar_b")

setup.py:

from setuptools import setup, find_packages, Extension
from Cython.Build import cythonize

sourcefiles = ['foo/bootstrap.pyx', 'foo/bar_a.pyx', 'foo/bar_b.pyx']

extensions = cythonize(Extension(
            name="foo.bootstrap",
            sources = sourcefiles,
    ))


kwargs = {
      'name':'foo',
      'packages':find_packages(),
      'ext_modules':  extensions,
}


setup(**kwargs)

NB: This answer was the starting point for my experiments; however, it uses PyImport_AppendInittab and I cannot see how this could be plugged into the normal Python machinery.

DavidW

This answer follows the basic pattern of @ead's answer, but uses a slightly simpler approach that eliminates most of the boilerplate code.

The only difference is the simpler version of bootstrap.pyx:

import sys
import importlib.abc
import importlib.machinery

# Chooses the right init function     
class CythonPackageMetaPathFinder(importlib.abc.MetaPathFinder):
    def __init__(self, name_filter):
        super(CythonPackageMetaPathFinder, self).__init__()
        self.name_filter =  name_filter

    def find_module(self, fullname, path):
        if fullname.startswith(self.name_filter):
            # use this extension-file but PyInit-function of another module:
            return importlib.machinery.ExtensionFileLoader(fullname,__file__)


# injecting custom finder/loaders into sys.meta_path:
def bootstrap_cython_submodules():
    sys.meta_path.append(CythonPackageMetaPathFinder('foo.')) 

Essentially, I look to see if the name of the module being imported starts with foo., and if it does I reuse the standard importlib approach to loading an extension module, passing the current .so filename as the path to look in. The right name of the init function (there are several in this shared object) is deduced from the module name.

Obviously, this is just a prototype - one might want to make some improvements. For example, right now import foo.bar_c would lead to a somewhat unusual error message: "ImportError: dynamic module does not define module export function (PyInit_bar_c)". To avoid that, one could return None for all submodule names that are not on a whitelist.
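
A possible sketch of that refinement, replacing find_module in the simplified bootstrap.pyx above (the whitelist contents are an assumption based on the two submodules used in this example):

    def find_module(self, fullname, path):
        # only claim submodules that are actually bundled into this .so;
        # everything else falls through to the normal import machinery
        if fullname in {"foo.bar_a", "foo.bar_b"}:
            return importlib.machinery.ExtensionFileLoader(fullname, __file__)
        return None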
