Creating a distribution package¶
Distribution packages are archives that can be uploaded to a package index such as pypi.org and installed with pip.
Some of the following commands require a recent version of pip, so make sure you have the latest version installed:
$ python3 -m pip install --upgrade pip
> python -m pip install --upgrade pip
Structure¶
A minimal distribution package can look like this, for example:
dataprep
├── pyproject.toml
└── src
    └── dataprep
        ├── __init__.py
        └── loaders.py
pyproject.toml¶
PEP 517 and PEP 518 brought extensible build backends, isolated builds and pyproject.toml in TOML format.
Among other things, pyproject.toml tells pip and build which backend tool to use to build distribution packages for your project. You can choose from a number of backends, though this tutorial uses hatchling by default.
A minimal yet functional dataprep/pyproject.toml file will then look like this, for example:
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
build-system
defines a section describing the build system
requires
defines a list of dependencies that must be installed for the build system to work, in our case hatchling.
Note
Dependency version numbers should usually be written in the requirements.txt file, not here.
build-backend
identifies the entry point for the build-backend object as a dotted path. The hatchling backend object is available under hatchling.build.
Note
However, for Python packages that contain binary extensions with Cython, C, C++, Fortran or Rust, the hatchling backend is not suitable. One of the following backends should be used here:
But that’s not all – there are other backends:
Note
With validate-pyproject you can check your pyproject.toml file.
See also
If you want to look at alternatives to hatchling:
Metadata¶
In pyproject.toml you can also specify metadata for your package, such as:
[project]
name = "dataprep"
version = "0.1.0"
authors = [
  { name="Veit Schiele", email="veit@cusy.io" },
]
description = "A small dataprep package"
readme = "README.rst"
requires-python = ">=3.7"
classifiers = [
  "Programming Language :: Python :: 3",
  "License :: OSI Approved :: BSD License",
  "Operating System :: OS Independent",
]
dependencies = [
  "pandas",
]

[project.urls]
"Homepage" = "https://github.com/veit/dataprep"
"Bug Tracker" = "https://github.com/veit/dataprep/issues"
name
is the distribution name of your package. This can be any name as long as it contains only letters, numbers, ., _ and -. It should also not already be assigned on the Python Package Index (PyPI).
version
is the version of the package.
In our example, the version number has been set statically. However, it is also possible to specify the version dynamically, for example from a file:
[project]
...
dynamic = ["version"]

[tool.hatch.version]
path = "src/dataprep/__about__.py"
The default pattern looks for a variable called __version__ or VERSION, which contains the version, optionally preceded by the lower case letter v. The default pattern is based on PEP 440.
If this is not the way you want to store versions, you can define a different regular expression with the pattern option.
There are also other version scheme plug-ins, such as hatch-semver for semantic versioning.
With the version source plugin hatch-vcs you can also use Git tags:
[build-system]
requires = ["hatchling", "hatch-vcs"]
...
[tool.hatch.version]
source = "vcs"
raw-options = { local_scheme = "no-local-version" }
The setuptools backend also allows dynamic versioning:
[build-system]
requires = ["setuptools>=61.0", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
...
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {attr = "dataprep.VERSION"}
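The file-based lookup of the version described above can be sketched with a regular expression. Note that the pattern below is a simplified stand-in written for this example and not hatchling's actual default pattern; the function name find_version and the file contents are likewise assumptions.

```python
import re

# Simplified illustration: a __version__ or VERSION assignment whose
# quoted value may start with a lower case "v". This is NOT the regular
# expression that hatchling ships.
VERSION_RE = re.compile(
    r"""^(?:__version__|VERSION)\s*=\s*(['"])v?(?P<version>[^'"]+)\1""",
    re.MULTILINE,
)


def find_version(source):
    """Return the version assigned in ``source`` or None if there is none."""
    match = VERSION_RE.search(source)
    return match.group("version") if match else None


# Hypothetical contents of src/dataprep/__about__.py
about = '__version__ = "v0.1.0"\n'
print(find_version(about))  # 0.1.0
```

If your project stores the version differently, the pattern option mentioned above is the place to supply your own expression.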
authors
is used to identify the authors of the package by name and email address.
You can also list maintainers in the same format.
description
is a short summary of the package, consisting of one sentence.
readme
is a path to a file containing a detailed description of the package. This is displayed on the package details page on Python Package Index (PyPI). In this case, the description is loaded from README.rst.
requires-python
specifies the versions of Python that are supported by your project. This will cause installers like pip to search through older versions of packages until they find one that has a matching Python version.
classifiers
gives the Python Package Index (PyPI) and pip some additional metadata about your package. In this case, the package is only compatible with Python 3, is under the BSD licence and is OS independent. You should always at least specify the versions of Python your package runs under, under which licence your package is available and on which operating systems your package runs. You can find a complete list of classifiers at https://pypi.org/classifiers/.
They also have a useful additional feature: to prevent a package from being uploaded to PyPI, use the special classifier "Private :: Do Not Upload". PyPI will always reject packages whose classifier starts with "Private ::".
dependencies
specifies the dependencies for your package in an array.
urls
lets you list any number of additional links that are displayed on the Python Package Index (PyPI). In general, these could lead to source code, documentation, issue trackers, etc.
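The rule for name given above can be checked before you pick a project name. The following sketch uses the regular expression from the PyPA name specification, which additionally requires that a name starts and ends with a letter or digit, and the normalisation from PEP 503 that package indexes use to compare names. The function names are chosen for this example only.

```python
import re

# Valid names: ASCII letters, digits, ".", "_" and "-", beginning and
# ending with a letter or digit (case-insensitive).
NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def is_valid_name(name):
    """Check whether ``name`` is a syntactically valid project name."""
    return NAME_RE.match(name) is not None


def normalize_name(name):
    """Lower-case the name and collapse runs of ".", "_" and "-" to "-"."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(is_valid_name("dataprep"))          # True
print(is_valid_name("-dataprep"))         # False
print(normalize_name("Data_Prep.Tools"))  # data-prep-tools
```

Whether a name is still free on PyPI cannot be decided locally; this sketch only covers the syntax.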
Optional dependencies¶
project.optional-dependencies allows you to specify optional dependencies for your package. You can also distinguish between different sets:
[project.optional-dependencies]
tests = [
  "coverage[toml]",
  "pytest>=6.0",
]
docs = [
  "furo",
  "sphinxext-opengraph",
  "sphinx-copybutton",
  "sphinx_inline_tabs"
]
Recursive optional dependencies are also possible with pip ≥ 21.2. For example, for dev you can take over all dependencies from docs and tests in addition to pre-commit:
dev = [
  "dataprep[tests, docs]",
  "pre-commit"
]
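What the recursive dev set amounts to can be sketched in a few lines of Python. This is only a simplified illustration of the idea and not pip's resolver; the EXTRAS dictionary is a hand-written copy of the tables above, and expand is a hypothetical helper.

```python
import re

# Hand-written copy of [project.optional-dependencies] for the example
# project "dataprep".
EXTRAS = {
    "tests": ["coverage[toml]", "pytest>=6.0"],
    "docs": ["furo", "sphinxext-opengraph", "sphinx-copybutton", "sphinx_inline_tabs"],
    "dev": ["dataprep[tests, docs]", "pre-commit"],
}

# A requirement that points back to the project itself, e.g. "dataprep[tests, docs]"
SELF_REF = re.compile(r"^dataprep\[(?P<extras>[^\]]+)\]$")


def expand(extra, extras=EXTRAS, _seen=None):
    """Flatten one set, following references back to the project itself."""
    seen = _seen if _seen is not None else set()
    if extra in seen:  # guard against circular references
        return []
    seen.add(extra)
    result = []
    for requirement in extras[extra]:
        match = SELF_REF.match(requirement)
        if match:
            for name in match.group("extras").split(","):
                result.extend(expand(name.strip(), extras, seen))
        else:
            result.append(requirement)
    return result


print(expand("dev"))
```

The printed list contains the requirements of tests and docs followed by pre-commit, which is what installing the dev set is meant to pull in.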
You can install these optional dependencies, for example with:
$ cd /PATH/TO/YOUR/DISTRIBUTION_PACKAGE
$ python3 -m venv .
$ . bin/activate
$ python -m pip install --upgrade pip
$ python -m pip install -e '.[dev]'
> cd C:\PATH\TO\YOUR\DISTRIBUTION_PACKAGE
> python -m venv .
> Scripts\activate.bat
> python -m pip install --upgrade pip
> python -m pip install -e ".[dev]"
src package¶
When you create a new package, you shouldn’t use a flat layout but the src layout, which is also recommended in Packaging Python Projects of the PyPA. A major advantage of this layout is that tests are run with the installed version of your package and not with the files in your working directory.
See also
Hynek Schlawack: Testing & Packaging
Note
In Python ≥ 3.11 PYTHONSAFEPATH can be used to ensure that the installed packages are used first.
dataprep
is the directory that contains the Python files. The name should match the project name to simplify configuration and be more recognisable to those installing the package.
__init__.py
is required to import the directory as a package. The file should be empty.
loaders.py
is an example of a module within the package that could contain the logic (functions, classes, constants, etc.) of your package.
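If you want to verify which copy of a package your tests actually import, you can ask the import system where it would load the package from. A minimal sketch, assuming the hypothetical helper name package_origin:

```python
import importlib.util


def package_origin(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# "dataprep" is the example package of this tutorial; the result depends on
# whether and how it is installed in the current environment.
print(package_origin("dataprep"))
```

With the src layout and a regular (non-editable) installation, the path is expected to point into site-packages of your virtual environment rather than into your working directory.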
Other files¶
CONTRIBUTORS.rst¶
LICENSE¶
You can find detailed information on this in the Licensing section.
README.rst¶
This file briefly tells those who are interested in the package how to use it.
If you write the document in reStructuredText, you can also include the contents as a detailed description in your package:
[project]
readme = "README.rst"
You can also include them in your Sphinx documentation with .. include:: ../../README.rst.
CHANGELOG.rst¶
Historical files or files needed for binary extensions¶
Before the pyproject.toml file introduced with PEP 518 became the standard, setuptools required setup.py, setup.cfg and MANIFEST.in. Today, however, these files are only needed for binary extensions at best.
If you want to replace these files in your packages, you can do so with hatch new --init or ini2toml.
setup.py¶
A minimal and yet functional dataprep/setup.py can look like this, for example:
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("src/dataprep/cymean.pyx"))
package_dir
points to the src directory, which can contain one or more packages. You can then use setuptools’s find_packages() to find all packages in this directory.
Note
find_packages() without src/ directory would package all directories with a __init__.py file, so also tests/ directories.
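The effect described in the note can be reproduced with a small stand-in for package discovery. The function discover_packages below is written for this example with the standard library only and is not setuptools’s find_packages(); in a real setup.py the src variant would correspond to find_packages(where="src") together with package_dir={"": "src"}.

```python
import tempfile
from pathlib import Path


def discover_packages(where, prefix=""):
    """Simplified stand-in: a directory counts as a package if it contains
    an __init__.py; subpackages are searched recursively."""
    found = []
    for child in sorted(Path(where).iterdir()):
        if child.is_dir() and (child / "__init__.py").is_file():
            name = prefix + child.name
            found.append(name)
            found.extend(discover_packages(child, name + "."))
    return found


# Build two throw-away project trees: a flat layout and a src layout.
with tempfile.TemporaryDirectory() as tmp:
    flat, src_layout = Path(tmp, "flat"), Path(tmp, "src_layout")
    for package_dir in (
        flat / "dataprep",
        flat / "tests",
        src_layout / "src" / "dataprep",
        src_layout / "tests",
    ):
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").touch()

    flat_result = discover_packages(flat)
    src_result = discover_packages(src_layout / "src")

print(flat_result)  # ['dataprep', 'tests']
print(src_result)   # ['dataprep']
```

In the flat layout the tests directory ends up in the list of packages, whereas searching only below src leaves it out.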
setup.cfg¶
This file is no longer needed, at least not for packaging. wheel nowadays collects all required licence files automatically and setuptools can build universal wheel packages with the options keyword argument, for example dataprep-0.1.0-py3-none-any.whl.
MANIFEST.in¶
The file contains all files and directories that are not already covered by packages or py_modules. It can look like this:
dataprep/MANIFEST.in:
include LICENSE *.rst *.toml *.yml *.yaml *.ini
graft src
recursive-exclude __pycache__ *.py[cod]
For more instructions in MANIFEST.in, see MANIFEST.in commands.
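If you are unsure which files a glob pattern such as *.py[cod] covers, you can try it out with fnmatch from the standard library. This is only an approximation, since MANIFEST.in patterns are evaluated by the build tooling and not by fnmatch itself, but for simple file names the behaviour is the same.

```python
from fnmatch import fnmatch

PATTERN = "*.py[cod]"  # glob from the recursive-exclude line

# [cod] stands for exactly one of the characters c, o or d.
for filename in ("loaders.py", "loaders.pyc", "loaders.pyo", "loaders.pyd"):
    print(filename, fnmatch(filename, PATTERN))
```

Only the compiled files match; plain .py sources are not affected by this pattern.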
Note
People often forget to update the MANIFEST.in file. To avoid this, you can use check-manifest in a pre-commit hook.
Note
If you want files and directories from MANIFEST.in to be installed as well, for example if they are runtime-relevant data, you can specify this with include_package_data=True in your setup() call.
Build¶
The next step is to create distribution packages for the package. These are archives that can be uploaded to the Python Package Index (PyPI) and installed by pip.
Make sure you have the latest version of build installed and then run it in the same directory where pyproject.toml is located:
$ python -m pip install build
$ cd /PATH/TO/YOUR/DISTRIBUTION_PACKAGE
$ rm -rf build dist
$ python -m build
> python -m pip install build
> cd C:\PATH\TO\YOUR\DISTRIBUTION_PACKAGE
> rm -rf build dist
> python -m build
The rm -rf build dist command ensures that a clean build is created without artefacts from previous builds. The python -m build command should output a lot of text and create two files in the dist directory when finished:
dist
├── dataprep-0.1.0-py3-none-any.whl
└── dataprep-0.1.0.tar.gz
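The clean-up step before the build uses rm -rf, which is a Unix shell command. If you prefer a platform-independent variant, the same step can be sketched in Python; the function name clean is an assumption made for this example.

```python
import shutil
from pathlib import Path


def clean(project_dir="."):
    """Remove the build/ and dist/ directories of a project if they exist."""
    for name in ("build", "dist"):
        # ignore_errors=True also covers directories that do not exist
        shutil.rmtree(Path(project_dir) / name, ignore_errors=True)
```

Calling clean("/PATH/TO/YOUR/DISTRIBUTION_PACKAGE") before python -m build would then have the same effect as the rm -rf line.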
dataprep-0.1.0-py3-none-any.whl
is a binary distribution format with the suffix .whl, where the filename is composed as follows:
dataprep
is the normalised package name
0.1.0
is the version of the distribution package
py3
is the Python tag, which specifies the Python implementation and version that the wheel supports
none
is the ABI tag; none means that the wheel does not depend on a specific C-ABI
any
is the platform tag; any is suitable for any operating system and processor architecture, a tag such as macosx_10_9_x86_64 on the other hand only for macOS on chips with the x86 instruction set and a 64-bit architecture
dataprep-0.1.0.tar.gz
is a source distribution.
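The components of a wheel filename can also be split programmatically. The following sketch assumes the naming scheme distribution-version(-build tag)-python tag-abi tag-platform tag.whl; the function parse_wheel_filename is a hypothetical helper and does no further validation of the individual tags.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components in {filename!r}")
    keys = ["distribution", "version", "python", "abi", "platform"]
    if len(parts) == 6:  # optional build tag directly after the version
        keys.insert(2, "build")
    return dict(zip(keys, parts))


print(parse_wheel_filename("dataprep-0.1.0-py3-none-any.whl"))
```

For the pandas wheel that appears in the installation output further down, pandas-1.3.4-cp39-cp39-macosx_10_9_x86_64.whl, the same function yields cp39 as Python and ABI tag and macosx_10_9_x86_64 as platform tag.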
See also
The reference for the file names can be found in File name convention.
For more information on sdist, see Creating a Source Distribution and PEP 376.
Testing¶
$ mkdir test_env
$ cd test_env
$ python3 -m venv .
$ source bin/activate
$ python -m pip install dist/dataprep-0.1.0-py3-none-any.whl
Processing ./dist/dataprep-0.1.0-py3-none-any.whl
Collecting pandas
Using cached pandas-1.3.4-cp39-cp39-macosx_10_9_x86_64.whl (11.6 MB)
…
Successfully installed dataprep-0.1.0 numpy-1.21.4 pandas-1.3.4 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0
> mkdir test_env
> cd test_env
> python -m venv .
> Scripts\activate.bat
> python -m pip install dist/dataprep-0.1.0-py3-none-any.whl
Processing ./dist/dataprep-0.1.0-py3-none-any.whl
Collecting pandas
Using cached pandas-1.3.4-cp39-cp39-macosx_10_9_x86_64.whl (11.6 MB)
…
Successfully installed dataprep-0.1.0 numpy-1.21.4 pandas-1.3.4 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0
Then you can check the wheel with:
$ python -m pip install check-wheel-contents
$ check-wheel-contents dist/*.whl
dist/dataprep-0.1.0-py3-none-any.whl: OK
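Since a wheel is an ordinary ZIP archive, you can also look at its contents yourself with the standard library. This is a minimal sketch that only lists the files and performs none of the checks of check-wheel-contents; the function name wheel_members is an assumption.

```python
import zipfile


def wheel_members(path):
    """Return the sorted names of all files stored in a wheel."""
    with zipfile.ZipFile(path) as wheel:
        return sorted(wheel.namelist())
```

For example, wheel_members("dist/dataprep-0.1.0-py3-none-any.whl") should list the modules of your package together with the files of the .dist-info directory.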
Alternatively, you can also install the package:
$ python -m pip install dist/dataprep-0.1.0-py3-none-any.whl
Processing ./dist/dataprep-0.1.0-py3-none-any.whl
Collecting pandas
…
Installing collected packages: numpy, pytz, six, python-dateutil, pandas, dataprep
Successfully installed dataprep-0.1.0 numpy-1.21.4 pandas-1.3.4 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0
You can then call Python and import your loaders module:
from dataprep import loaders
Note
There are still many instructions that include a step to call setup.py, for example python setup.py sdist. However, this is now considered an anti-pattern by parts of the Python Packaging Authority (PyPA).