Retooling Python builds

The landscape of Python build infrastructure is a mess. I made over 100 commits to get a sensible, elegant, and secure build. It took too long, but here’s the result: template repository

Compared to other languages

In 2014 and 2015, I was mostly writing in Java and Scala. In Java, there are two widely used build tools: Maven and Gradle. They're compatible and use the same repositories. To build your code, run mvn package or gradle build. To add a dependency, add it to pom.xml or build.gradle. To test, run mvn test or gradle test. In Scala, there's just sbt. To compile your code, run sbt compile. To add a dependency, add it to build.sbt. To test, run sbt test.

The state of Python build

But in Python, there's setuptools, pip, virtualenv, pipenv, Poetry, and Conda. There are wheels and eggs. There are setup.py, setup.cfg, pyproject.toml, requirements.txt, Pipfile.lock, meta.yaml, environment.yml, and poetry.lock, any of which can list dependencies. Even tox.ini can contain its own dependencies.

More importantly, the traditional tools have serious problems. To start, a setup.py can contain arbitrary code. This is a security risk, and it makes it impossible to get some metadata from a package without running that code.

Suppose you're writing a project to show a dependency tree. Well, you can't, at least not without executing every package's setup.py.

Pip

This is not how any reasonable modern build tool should operate. Moreover, for most of its history, pip did not resolve dependencies or identify conflicts. A requirements.txt is especially bad because it's order-dependent and might be processed in parallel, making the result nondeterministic.

Conda

Conda came after pip. In contrast to pip/setuptools, Anaconda/Conda performed dependency resolution, specified package metadata, easily linked C/C++, and was tailored to scientific computing, drawing a specialized audience. But Conda has multiple channels, lacks many packages, fights with pip, throws false positives about dependency conflicts, and can take literally hours to balk on a large dependency graph. (In fairness, Poetry's resolver can be slow due to a flaw in PyPI.)

Conda can resolve dependencies correctly. Unfortunately, many packages are not on Anaconda or conda-forge, and some recipes are unmaintained and out of date. Anaconda can also corrupt itself. I just wrote a Stack Overflow answer regarding this.

Wheels, pipx, Poetry, and Hatch

Fast-forward to 2023: There are wheels. And Poetry, Hatch, pipx, and even pip perform dependency resolution. Poetry, in particular, is way faster than Conda, is clearer about dependencies, doesn't seem to throw false positives, and is friendlier to use.

What are wheels?

Python wheels are prebuilt packages distributed through PyPI. Wheels that contain compiled code are platform-specific, and they can bundle compiled C/C++ libraries. RDKit was a hold-out, distributing prebuilt packages only for Conda. They now provide wheels under rdkit-pypi.
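A wheel's filename says which interpreters and platforms it targets, using the pattern name-version[-build]-python-abi-platform.whl. The following is a rough sketch of reading those tags with the standard library only. The filenames are hypothetical, and parse_wheel_filename and is_pure_python are illustrative helpers rather than an official API; real tools should use a maintained parser.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelTags(*parts)


def is_pure_python(tags: WheelTags) -> bool:
    """A wheel tagged none/any contains no platform-specific compiled code."""
    return tags.abi_tag == "none" and tags.platform_tag == "any"


if __name__ == "__main__":
    # Hypothetical filenames for illustration.
    for wheel in (
        "example_pkg-1.0.0-py3-none-any.whl",
        "example_pkg-1.0.0-cp312-cp312-manylinux_2_17_x86_64.whl",
    ):
        tags = parse_wheel_filename(wheel)
        print(wheel, "pure Python:", is_pure_python(tags))
```

The second filename is the kind of wheel that matters for packages with C/C++ extensions: it is built for one interpreter version and one platform, so no compiler is needed at install time.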

TL;DR: Stop using Conda. If you need it, install Mambaforge instead. See this Mambaforge setup guide.

A nice build

It took me an unacceptable number of hours and commits. I’d change, push, wait for CI failures, and repeat. I made the repo public to be used as a template, so that others can avoid that special level of hell.

It uses Hatch, uv, and GitHub Actions. It doesn't contain a setup.py or setup.cfg.

When you push, it builds wheels, sdists, and a Docker image, and runs tests. It lints on commit using pre-commit. When you tag on GitHub, it publishes to PyPI and Docker Hub.