Binary Extensions

One feature of the CPython interpreter is that, in addition to executing Python code, it exposes a rich C API for use by other software. One of the most common uses of this C API is to create importable C extensions that make things possible that are difficult to achieve in pure Python code.

Use Cases

The typical use cases for binary extensions can be divided into three categories:

Accelerator modules

These modules are stand-alone and are only created to run faster than the corresponding pure Python code. Ideally, the accelerator modules always have a Python equivalent that can be used as a fallback if the accelerated version is not available on a particular system.

The CPython standard library uses many accelerator modules.
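The fallback idea described above is usually implemented with a guarded import. The following is a minimal sketch, not taken from any particular library: the module name `_fastmean` is hypothetical and stands in for a compiled accelerator that may or may not be installed.

```python
# Sketch of the accelerator-with-fallback pattern.
# `_fastmean` is a hypothetical compiled module used only for illustration.
try:
    from _fastmean import mean  # compiled implementation, if available
    ACCELERATED = True
except ImportError:
    ACCELERATED = False

    def mean(values):
        """Pure Python fallback with the same signature."""
        values = list(values)
        if not values:
            raise ValueError("mean() requires at least one value")
        return sum(values) / len(values)


print(mean([1, 2, 3, 4]), "accelerated" if ACCELERATED else "pure Python")
```

Callers import `mean` from one place and never need to know which implementation they received.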

Wrapper modules

These modules are created to make existing C interfaces available in Python. You can either make the underlying C interfaces directly available or provide a Pythonic API that uses features of Python to make the API easier to use.

The CPython standard library makes extensive use of wrapper modules.

Low-level system access

These modules are created to access functions of the CPython runtime environment, the operating system or the underlying hardware. With platform-specific code, things can be achieved that would not be possible with pure Python code.

A number of CPython standard library modules are written in C to access interpreter internals that are not available at the language level.

A particularly noteworthy property of C extensions is that they can release the Global Interpreter Lock (GIL) of CPython for long-running operations, regardless of whether these operations are CPU or IO-bound.
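The effect of releasing the GIL can be observed with an extension module from the standard library. The sketch below assumes CPython, where `hashlib` is documented to release the GIL while hashing larger inputs, so the worker threads can hash in parallel. The buffer sizes and the number of workers are arbitrary values chosen for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four independent 4 MiB buffers (sizes chosen arbitrarily for illustration).
buffers = [bytes([i]) * (4 * 1024 * 1024) for i in range(4)]


def digest(data):
    # The C code behind hashlib drops the GIL while it hashes large inputs,
    # so several of these calls can make progress at the same time.
    return hashlib.sha256(data).hexdigest()


# Reference result computed in a single thread.
sequential = [digest(buffer) for buffer in buffers]

# The same work distributed over four threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))

print(threaded == sequential)
```

Pure Python code running in the same threads would not overlap in this way, because the interpreter holds the GIL while executing bytecode.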

Not all extension modules fit neatly into the above categories. For example, the extension modules contained in NumPy cover all three use cases:

  • They move inner loops to C for speed reasons,

  • wrap external libraries in C, FORTRAN and other languages and

  • use low-level system interfaces of CPython and the underlying operating system to support the concurrent execution of vectorised operations and to precisely control the memory layout of objects created.

Disadvantages

In the past, the main disadvantage of using binary extensions was that they made the software difficult to distribute. Today, wheels have largely removed this disadvantage. However, some disadvantages remain:

  • Installation from source remains complicated.

  • There may be no suitable wheel for your build of the CPython interpreter, or for alternative interpreters such as PyPy, IronPython or Jython.

  • Maintaining the packages is more time-consuming because the maintainers have to be familiar not only with Python but also with another language and the CPython C API. The complexity increases further if a Python fallback implementation is provided alongside the binary extension.

  • Finally, import mechanisms, such as direct import from ZIP files, often do not work for extension modules.

Alternatives

… to accelerator modules

If extension modules are only used to make code run faster, a number of alternatives should be considered first:

  • Look for existing optimised alternatives. The CPython standard library contains a number of optimised data structures and algorithms, especially in the builtins and in the modules collections and itertools.

    Occasionally the Python Package Index (PyPI) also offers additional alternatives. Sometimes a third-party module can avoid the need to create your own accelerator module.

  • For long-running applications, the JIT-compiled PyPy interpreter can be a suitable alternative to the standard CPython. The main difficulty with adopting PyPy is typically the dependence on other binary extension modules. While PyPy emulates the CPython C API, modules that rely on it cause problems for the PyPy JIT, and the emulation often exposes defects in extension modules that CPython tolerates (often reference counting errors).

  • Cython is a sophisticated static compiler that can compile most Python code into C extension modules. The initial compilation offers some speed increases (by bypassing the CPython interpreter level), and Cython’s optional static typing features can provide additional speed increases. For Python programmers, Cython offers a lower barrier to entry than other languages such as C or C++.

    However, using Cython has the disadvantage of adding complexity to the distribution of the resulting application.

  • Numba is a newer tool that uses the LLVM compiler infrastructure to selectively compile parts of a Python application to native machine code at runtime. It requires LLVM to be available on the system the code is running on. It can lead to considerable increases in speed, especially with vectorisable processes.
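The first option in the list above, using optimised tools that already ship with Python, can be sketched as follows. The example compares hand-written loops with their counterparts in collections and itertools; the sample data is made up for illustration.

```python
from collections import Counter
from itertools import accumulate

words = ["spam", "egg", "spam", "ham", "spam", "egg"]

# Hand-written counting loop ...
manual_counts = {}
for word in words:
    manual_counts[word] = manual_counts.get(word, 0) + 1

# ... and the equivalent ready-made data structure.
library_counts = Counter(words)

values = [3, 1, 4, 1, 5]

# Hand-written running total ...
manual_totals = []
total = 0
for value in values:
    total += value
    manual_totals.append(total)

# ... and the equivalent iterator from itertools.
library_totals = list(accumulate(values))

print(dict(library_counts) == manual_counts)
print(library_totals == manual_totals)
```

Checking for such replacements first is cheap, and it avoids the distribution and maintenance costs of a custom accelerator module.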

… to wrapper modules

The C ABI (Application Binary Interface) is a common standard for sharing functionality between multiple applications. One of the strengths of the CPython C API (Application Programming Interface) is that it allows Python users to tap into that functionality. However, wrapping modules by hand is very tedious, so a number of alternatives should be considered.

The approaches described below do not simplify distribution, but they can significantly reduce the maintenance effort compared to wrapper modules.

  • Cython is useful not only for creating accelerator modules, but also for creating wrapper modules. Since the API still needs to be wrapped by hand, it is not a good choice when wrapping large APIs.

  • cffi is a project by some of the PyPy developers that makes it straightforward for developers who already know both Python and C to expose their C modules to Python applications. It also makes wrapping a C module based on its header files relatively easy, even if you are not familiar with C yourself.

    One of the main advantages of cffi is that it is compatible with the PyPy JIT, so that cffi wrapper modules can fully participate in the PyPy tracing JIT optimisations.

  • SWIG is a wrapper interface generator that connects a variety of programming languages, including Python, with C and C++ code.

  • The ctypes module of the standard library is useful for getting access to C interfaces when the header information is not available. However, it works only on the C ABI level, so there is no automatic consistency check between the interface actually exported by the library and the one declared in the Python code. In contrast, the alternatives above can all work on the C API level and use C header files to ensure consistency.

  • pythoncapi_compat can be used to write a C extension that supports multiple Python versions with a single code base. It consists of the header file pythoncapi_compat.h and the script upgrade_pythoncapi.py.
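The ABI-level limitation of ctypes mentioned in the list above can be made concrete with a short sketch. It assumes CPython and calls the C API function `Py_GetVersion` through `ctypes.pythonapi`; the argument and return types have to be declared by hand, and nothing verifies them against a header file.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
get_version = ctypes.pythonapi.Py_GetVersion

# These declarations are written by hand. ctypes cannot check them against
# the real C prototype, so a wrong restype would be accepted silently and
# the result would simply be misinterpreted.
get_version.argtypes = []
get_version.restype = ctypes.c_char_p

version = get_version().decode("utf-8")
print(version)
print(sys.version_info[:2])
```

Tools that read the C header files, such as cffi, derive these declarations from the headers instead of relying on the author to repeat them correctly.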

… for low-level system access

For applications that require low-level system access, a binary extension is often the best option. This applies in particular to low-level access to the CPython runtime, since some operations (such as releasing the Global Interpreter Lock (GIL)) are not permitted while the interpreter itself is executing code, even when modules such as ctypes or cffi are used to get access to the relevant C API interfaces.

In cases where the extension module manipulates the underlying operating system or hardware (instead of the CPython runtime), it is sometimes better to write a normal C library (or a library in another programming language, such as C++ or Rust, that provides a C-compatible ABI) and then use one of the wrapping techniques described above to make the interface available as an importable Python module.

Implementation

We now want to extend our dataprep package and integrate some C code. For this we use Cython to translate the Python code from dataprep/src/dataprep/cymean.pyx into optimised C code during the build process. Cython files have the suffix .pyx and can contain both Python and C code.

However, we cannot currently use hatchling.build as a build backend, but instead fall back on a current version of setuptools:

dependencies = [
    "Cython",
    "pandas",
]

setuptools uses dataprep/setup.py to include non-Python files in a package.


from Cython.Build import cythonize
from setuptools import setup

setup(ext_modules=cythonize("src/dataprep/cymean.pyx"))

Note

extensionlib is a toolkit for extension modules, but it does not yet provide a hatchling plugin.

Note

Alternatively, you could use Meson or scikit-build:

# Meson
[build-system]
requires = ["meson-python"]
build-backend = "mesonpy"

# scikit-build
[build-system]
requires = ["scikit-build-core"]
build-backend = "scikit_build_core.build"

Since Cython itself is a Python package, it can simply be added to the list of dependencies in the dataprep/pyproject.toml file:

requires = ["Cython", "setuptools>=61.0"]

Now you can run the build process with the pyproject-build command and check whether the Cython file ends up in the package as expected:

$ pyproject-build .
* Creating venv isolated environment...
* Installing packages in isolated environment... (cython, setuptools>=40.6.0, wheel)
* Getting dependencies for sdist...
Compiling src/dataprep/cymean.pyx because it changed.
[1/1] Cythonizing src/dataprep/cymean.pyx

copying src/dataprep/cymean.c -> dataprep-0.1.0/src/dataprep
copying src/dataprep/cymean.pyx -> dataprep-0.1.0/src/dataprep

running build_ext
building 'dataprep.cymean' extension

Successfully built dataprep-0.1.0.tar.gz and dataprep-0.1.0-cp39-cp39-macosx_10_9_x86_64.whl
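Beyond reading the build log, the wheel itself can be inspected, because a wheel is a ZIP archive. The following is a minimal sketch that lists the compiled extension modules in a wheel. The helper name `compiled_modules` and the stand-in archive with its file names are hypothetical and only serve the demonstration; in practice the path of the wheel in dist/ would be passed instead.

```python
import tempfile
import zipfile
from pathlib import Path

# Suffixes of compiled extension modules: .so on Linux/macOS, .pyd on Windows.
EXTENSION_SUFFIXES = (".so", ".pyd")


def compiled_modules(wheel_path):
    """Return the names of compiled extension modules inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            name for name in wheel.namelist()
            if name.endswith(EXTENSION_SUFFIXES)
        ]


# Demonstration with a stand-in archive; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.whl"
    with zipfile.ZipFile(demo, "w") as wheel:
        wheel.writestr("demo/__init__.py", "")
        wheel.writestr("demo/cymean.cpython-39-darwin.so", b"")
    found = compiled_modules(demo)

print(found)
```

An empty result for a package that is supposed to ship an extension module would indicate that the build step did not run as intended.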

Finally, we can check our package with check-wheel-contents:

$ check-wheel-contents dataprep/dist/*.whl
dataprep/dist/dataprep-0.1.0-cp39-cp39-macosx_10_9_x86_64.whl: OK

Alternatively, you can install our dataprep package and use mean:

$ python -m pip install dataprep/dist/dataprep-0.1.0-cp39-cp39-macosx_10_9_x86_64.whl
$ python
>>> from dataprep.mean import mean
>>> from random import randint
>>> nums = [randint(1, 1_000) for _ in range(1_000_000)]
>>> mean(nums)
500097.867198

The random.randint function was used to create a list of one million random numbers with values between 1 and 1000.

See also

The CPython Extending and Embedding guide contains an introduction to writing your own extension modules in C: Extending Python with C or C++. However, please note that this introduction only discusses the basic tools for creating extensions that are provided as part of CPython. Third-party tools such as Cython, cffi, SWIG, and Numba offer both simpler and more sophisticated approaches to building C and C++ extensions for Python.

Python Packaging User Guide: Binary Extensions not only covers various available tools that simplify the creation of binary extensions, but also explains the various reasons why creating an extension module might be desirable.

Creating binary extensions

Binary extensions for Windows

Before you can create a binary extension, you have to make sure that you have a suitable compiler available. On Windows, Visual C is used to create the official CPython interpreter, and it should also be used to create compatible binary extensions:

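For Python ≥ 3.5, install Visual Studio with the C++ build tools, for example the Community edition or the standalone Build Tools. Visual Studio Code with the Python extension is an editor and does not include the required compiler.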

Note

Visual Studio versions used for Python 3.5 and later are backwards compatible, which means that any future version of Visual Studio can create Python extensions for all Python versions from 3.5 onwards.

Building with the recommended compiler on Windows ensures that a compatible C library is used throughout the Python process.
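To find out which compiler was used to build the interpreter you are running, the standard library can be queried. This is a small sketch using platform.python_compiler(); the exact text of the result depends on the platform and the Python build.

```python
import platform
import sys

# Describes the compiler that built this interpreter; on official
# Windows builds the string identifies the MSVC version.
compiler = platform.python_compiler()

print(f"Python {sys.version_info.major}.{sys.version_info.minor} built with: {compiler}")
```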

Binary extensions for Linux

Linux binaries must be built against a sufficiently old glibc to be compatible with older distributions. DistroWatch lists in table form which versions of the distributions ship which version of the library.

The pypa/manylinux project facilitates the distribution of binary extensions as wheels for most Linux platforms. It also gave rise to PEP 513, which defines the manylinux1_x86_64 and manylinux1_i686 platform tags.
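Which C library the current interpreter is linked against can be checked from Python itself. The sketch below uses platform.libc_ver() from the standard library; on systems that do not use glibc it returns empty strings, which the example handles explicitly.

```python
import platform

# Returns a (library, version) tuple, for example ("glibc", "2.31") on many
# Linux systems; both entries are empty strings if nothing was detected.
lib, version = platform.libc_ver()

if lib == "glibc":
    print(f"linked against glibc {version}")
else:
    print("glibc not detected on this system")
```

A wheel built on this machine can only be expected to run on systems whose glibc is at least as new as the version reported here.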

Binary extensions for macOS

Binary compatibility on macOS is determined by the deployment target, that is, the minimum macOS version to be supported, e.g. 10.9. It is defined in the environment variable MACOSX_DEPLOYMENT_TARGET. When building with setuptools/distutils, the deployment target is specified with the flag --plat-name, for example macosx-10.9-x86_64. For more information on deployment targets for macOS Python distributions, see the MacPython Spinning Wheels wiki.
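The deployment target that the running interpreter was built with can be read through sysconfig. This is a minimal sketch; on platforms other than macOS the configuration variable is not set, and get_config_var() then returns None.

```python
import sysconfig

# Platform string of this interpreter, e.g. "macosx-10.9-x86_64" on macOS.
platform_tag = sysconfig.get_platform()

# Deployment target of the interpreter build; None outside macOS.
deployment_target = sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET")

print(platform_tag)
print(deployment_target)
```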

Deployment of binary extensions

In the following, the deployment on the Python Package Index (PyPI) or another index will be described.

Note

When deploying on Linux distributions, note that the distributions impose their own requirements on the build system. Source distributions (sdist) should therefore be provided in addition to wheels.