Make-like libraries and tools for Python

I’ve been searching, without much success, for a Python library that would let me structure build tasks the way GNU Make does. My objective is not to write a replacement for Make, but to integrate this process with the rest of my Python-based tools.

Furthermore, I was looking for an object-oriented, extensible and customizable API that would let me specify targets, requirements and “up to date” definitions arbitrarily.

Here is a list of the tools I found, each with a short description.

  • Dask: “Dask is a flexible library for parallel computing in Python.”
  • Pyflow: “…tool to manage tasks in the context of a task dependency graph”
  • Celery: “Distributed Task Queue”
  • Luigi: “…helps you build complex pipelines of batch jobs”
  • Taskgraph: “A task graph execution framework for Python”
  • DoIt: “…bringing the power of build-tools to execute any kind of task”

The bottom line is that none of these comes close to what I want. The project closest to my wish list is DoIt. DoIt is written in Python and lets you specify targets, requirements and “up to date” conditions rather freely, in Python. However, it was designed as a Make-replacement command-line tool. It is intimately tied to command-line arguments, expects to find certain files in the current folder, and so on. My attempts to dig into the source and use it as a library were futile.
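For context, this is roughly what a DoIt task definition looks like. DoIt looks for a file named dodo.py in the current folder and collects the functions whose names start with task_; each returns a dictionary describing the task. The file names and the command below are placeholders chosen for illustration.

```python
# dodo.py -- a minimal sketch of a DoIt task definition.
# The file names and the shell command are illustrative placeholders.

def task_count_lines():
    """Count the lines of b.txt and store the result in a.txt."""
    return {
        "actions": ["wc -l b.txt > a.txt"],  # shell command(s) to execute
        "file_dep": ["b.txt"],               # requirements
        "targets": ["a.txt"],                # generated output
    }
```

Running the doit command in that folder would then execute the task. The definition itself is plain Python, but it is discovered and driven by the command-line tool, which is the coupling described above.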

This led me to that painful moment when you realize that you need to write the tool yourself, and that it is not going to be easy. An open source endeavor begins!

Let us begin with the wish list in the form of an API use case:

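# A Builder holds every task of the project.
builder = Builder(
    # taskA: count the lines of b{m}.txt and store the result in a{m}.txt.
    TSTask(
        "taskA",
        CmdAction('wc -l b{m}.txt > a{m}.txt'),
        FileTarget('a{m}.txt'),
        FileDependency('b{m}.txt'),
        TaskParam('m')
    ),

    # taskB: write the last {m} lines of the system log to b{m}.txt.
    TSTask(
        "taskB",
        CmdAction('tail -n {m} /var/log/syslog > b{m}.txt'),
        FileTarget('b{m}.txt'),
        TaskParam('m')
    )
)

# Look up a task by name and run it.
builder['taskA'].run()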

An instance of the Builder class encapsulates a whole project. Its constructor takes instances of the Task class as parameters. In this example, Task is subclassed into TSTask (a Task that uses timestamps to determine what is “up to date”, like Make). Tasks are defined as a collection of instances of Action (what the task does), Target (generated output) and Dependency (requirements).
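To make the timestamp rule concrete, here is a minimal sketch of how a TSTask-like class might decide whether it is up to date. This is not pybuild's actual code: the FilePattern base class, the method names, the keyword-argument parameters and the use of a plain Python callable as the action are all assumptions made for illustration.

```python
import os


class FilePattern:
    """A file name template such as 'a{m}.txt', filled in from task parameters."""

    def __init__(self, pattern):
        self.pattern = pattern

    def path(self, **params):
        return self.pattern.format(**params)


class FileTarget(FilePattern):
    """A file the task produces."""


class FileDependency(FilePattern):
    """A file the task needs as input."""


class TSTask:
    """Hypothetical task that compares file timestamps, as Make does."""

    def __init__(self, name, action, targets, dependencies=()):
        self.name = name
        self.action = action  # assumed: a callable taking the parameters as keywords
        self.targets = list(targets)
        self.dependencies = list(dependencies)

    def is_up_to_date(self, **params):
        target_paths = [t.path(**params) for t in self.targets]
        dep_paths = [d.path(**params) for d in self.dependencies]
        # No targets, or any target missing: the task has to run.
        if not target_paths or not all(os.path.exists(p) for p in target_paths):
            return False
        # Existing targets without dependencies never go stale.
        if not dep_paths:
            return True
        # A missing dependency raises FileNotFoundError here; this sketch
        # does not try to build it.
        oldest_target = min(os.path.getmtime(p) for p in target_paths)
        newest_dep = max(os.path.getmtime(p) for p in dep_paths)
        # Stale only if some dependency is strictly newer than some target.
        return oldest_target >= newest_dep

    def run(self, **params):
        """Run the action if needed; return True when it actually ran."""
        if self.is_up_to_date(**params):
            return False
        self.action(**params)
        return True
```

Under these assumptions, a second call to run() with the same parameters does nothing until a dependency is modified again. Swapping in a different “up to date” definition, such as content hashes, would only mean overriding is_up_to_date() in another Task subclass.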

Unfortunately, making this work while keeping the API clean, extensible and elegant may not be so simple. At least I have made a start: it’s called pybuild, and I’m hoping for feedback!