A Generic Build System
I finally gave in and "rewrote" this in bash. OH, YES, I know what you think, but I think the same. But still, the usability is actually better: bashbuilder. Though I must say that it actually helped to design the concepts in Java first. Repeating them in bash was easy. Without the Java version, the bash implementation would not have been so clean.
No Magic: you are actually required and will be able to understand what is going on: an empty build file does nothing 😀 and as you add targets, it starts to do things.
No new specification syntax: plain Java.
Quick Links
- example: Buma.java used for my blog or Buma.java used for this project.
- theory: see below
- api documentation (javadoc)
- see codeberg for available packages to download.
How to use?
To use it in a project, you may write any Java file, for example
src/buma/Buma.java using the classes
Build or Machine (see api documentation above) and run it like
java -cp buma-<version>.jar src/buma/Buma.java
The build script, Buma.java, should be a classic Java main program,
which makes use of functionality provided by this software to create
targets with dependencies and tasks to create them. When using
Build, a minimal skeleton looks like:
    Build build = new Build(...);
    Build.Cmdline cmdline = build.parseCmdline(argv);
    // create targets and add those to build which shall be callable from
    // the command line
    build.updateAll(cmdline.requestedTargets());
If you don't like Build and would rather parse the command line
differently, set up targets with their dependencies and ultimately
just run
    new Machine().update(someTarget)
How to build?
Since this is supposed to be a build system, it would be weird to use
yet another build system to build it. There is currently only a bash
script. Use
./mach
to create the jar build/build/buma-*.jar. The script compiles the
initial classes and then uses src/buma/Buma.java to do the rest. It
may serve as a first example build script. To see all targets with
their dependencies, use
./mach dependencytree | dot -T pdf >dependencies.pdf
given you have graphviz installed (for the dot command).
Open Ends
- Best way to add functionality for downloading libraries with their transitive dependencies. Maybe the best way is not to add this at all, because given the simplicity of BuildMachine, just add some resolver library or tool and use it. For example: jresolve-cli
Theory
There are not enough build systems in the world yet: make, cmake, ant, maven, npm, gradle are only those that I used at some time and there are even more. But I thought, hey, let's implement one more :-)
I always wondered what the core functionality of a build system is after having removed all the language specifics that crept in because the inventors wanted to support their specific ecosystem, in particular the programming language of choice.
Many build systems these days have built-in functionality to resolve and download the directed acyclic graph of library dependencies. But this is not what I consider should be part of a build system core. Because, read on!
What is the requirement?
- Given some artifacts, typically files containing source code (think C, Java, Python, Typescript) plus additional files containing more source (think properties files, HTML files),
- the build system shall perform tasks like getting required additional files (think libraries), compilation, wrapping, mixing, munching, zipping, dissecting, uploading and whatnot to create a final product.
On the most abstract level this boils down to performing several tasks in some order, which is what a linear script could do. But this tends to repeat tasks unnecessarily.
Some theory
What exactly does it mean for a task to be unnecessary? Let's have some nomenclature and definitions.
- Call a target something that needs to be created by a build. Most of the time this means that a file or a set of files shall appear or be updated in some directory, local or remote. It could be a row in a database too, which typically also means that a file is updated, the database storage file.
- Call the computation to create or update the target a task. Typically, it is one or more commands run on the operating system, but may of course include computations built into the build system.
- The task to create a target often needs other targets as input. We call these the dependencies of the target.
- Call a target without a task to build it an initial target (think source code edited by the developer).
For brevity, if A is a target, let T_A be the task that creates it
and let d(A) be the set of dependencies. The corner case of an empty
set d(A) shall be included. Then we can write A = T_A(d(A)).
Now we can formulate the following definition:
- Updating a target A involves
  - checking whether it is out-of-date and
  - if yes, running T_A(d(A)).
- A target is out-of-date if, after updating all targets in d(A), running T_A(d(A)) would change it.
Note how the definition recurses into the dependencies. What does it mean, though, for the ground case of an initial target? No task, nothing to do.
Now consider target A with a single dependency B, an initial
target. T_A must be run if it would change A. There is
no way to check whether running T_A would change A other than
actually running it. But this is exactly what we want to avoid doing
unnecessarily. So we approximate as follows:
- We assume that the output T_A(d(A)) = T_A({B}) depends on B alone and is deterministic, meaning the result can only change if B changed since we last ran T_A.
- We accept that we might run T_A on a changed B even if it produces the exact same output as it did from the original B.
So for an initial target B, we must know whether it was changed
since we last looked, because that would be our trigger to run T_A.
Interestingly, the same is true for all targets. If we ran T_A to
create A and then A was changed independently of the build
system, either by manually tampering with or deleting it, then it is
not enough to check changes of dependencies. We must be able to detect
that A changed "spuriously" since we last ran the build.
BuildMachine works with a
StateProvider
and asks it about some bytes which describe the state of a target at
some point in time. This could be, for example
- the whole target content (think file or database query result),
- a hash of the whole target content,
- the last modification time of the target content.
For the latter two, FileContentState is provided.
Under normal circumstances (non-degenerate hash, no time stamp tampering), all of these change if the content of a target is changed.
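To make the three kinds of state a bit more tangible, here is a minimal sketch for a file target using only the JDK. The class FileStates and its method names are made up for illustration; they are not the API of StateProvider or FileContentState.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the library: three ways to describe
// the state of a file target as bytes.
final class FileStates {
  // The whole content: exact, but potentially large.
  static byte[] content(Path file) throws IOException {
    return Files.readAllBytes(file);
  }

  // A SHA-256 hash of the content: small, and it changes whenever the
  // content changes (barring hash collisions).
  static byte[] hash(Path file) throws IOException {
    try {
      MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
      return sha256.digest(Files.readAllBytes(file));
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256.
      throw new IllegalStateException(e);
    }
  }

  // The last modification time as 8 bytes: cheapest to obtain, but it
  // relies on nobody tampering with time stamps.
  static byte[] modificationTime(Path file) throws IOException {
    long millis = Files.getLastModifiedTime(file).toMillis();
    return ByteBuffer.allocate(Long.BYTES).putLong(millis).array();
  }
}
```

Whichever variant is used, the build only ever compares the bytes with what was saved earlier, so the choice is a trade-off between storage, speed and robustness.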
Implementation overview
That said, BuildMachine performs a rather simple algorithm to update
a target A with dependencies d(A) which can be re-created with
task T_A. (See source code).
1. Recursively do the following for all dependencies in d(A).
2. Compute the combined state of the dependencies and the current state of A and check if it is the same as was saved at the last visit.
3. If it is not the same:
   - Run T_A.
   - Compute the combined state like in 2. again and save it. Note that this does not require recomputing the states of the dependencies, as these are kept, of course. Only the state of A individually is recomputed.
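The steps above can be sketched in a few lines of Java. This is not the actual source code, just an in-memory illustration under some assumptions: the names UpdateSketch, Target and Task are hypothetical, a target's "content" is a plain string, the state is simply that string, and saved states live in a map instead of a directory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the update algorithm, not the library's classes.
final class UpdateSketch {
  // T_A: computes the new content of a target from its dependencies.
  interface Task {
    String run(List<Target> deps);
  }

  static final class Target {
    final String name;
    final Task task;          // null for an initial target
    final List<Target> deps;  // d(A), may be empty
    String value;             // stands in for the target's content

    Target(String name, String value, Task task, Target... deps) {
      this.name = name;
      this.value = value;
      this.task = task;
      this.deps = List.of(deps);
    }
  }

  // Combined states as saved at the last visit, keyed by target name.
  private final Map<String, List<String>> saved = new HashMap<>();

  // In this sketch the state of a target is just its content.
  static String state(Target t) {
    return String.valueOf(t.value);
  }

  // Updates t and returns its state afterwards.
  String update(Target t) {
    // 1. Recurse into the dependencies and collect their states.
    List<String> combined = new ArrayList<>();
    for (Target d : t.deps) {
      combined.add(update(d));
    }
    if (t.task == null) {
      return state(t); // initial target: no task, nothing to do
    }
    // 2. Add the current state of t and compare with the saved one.
    int own = combined.size();
    combined.add(state(t));
    if (!combined.equals(saved.get(t.name))) {
      // 3. Run the task, recompute only t's own state, and save.
      t.value = t.task.run(t.deps);
      combined.set(own, state(t));
      saved.put(t.name, combined);
    }
    return state(t);
  }
}
```

Because the state of the target itself is part of the comparison, the task runs again not only when a dependency changed but also when the target was tampered with outside the build.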
Target states are saved in a directory provided to either the
Machine constructor
or to the
Build constructor.
How is this a generic build system?
Nothing of the above relates to any specific programming language. In fact the algorithm can be implemented in every programming language to build artifacts of any other programming language or perform tasks not even related to compilation.
Where is the dependency management that maven, gradle or npm provide? Well, right there: a library needed to create a program is a target and determining the correct version and fetching it whichever way is a task. Implementations are left as an exercise :-)
Questions Never Asked
I could call this FAQ, but BuildMachine is not famous enough to talk about "frequently" :-)
How can I have a task which is always run, unconditionally?
Check out FixedStateTarget.runAlways().
How can I parametrize a task to run differently for a development or production build?
We said above that a task must be deterministic and, given identical dependencies, its result, the target, must come out identical. This means that if you use, say, a system property as a parameter, it must, at the same time, be declared as a dependency of the target being generated. Use StringTarget, which lets you create dependencies from properties and environment variables.
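The idea behind treating a parameter as a dependency can be sketched as follows: the parameter's value becomes the state of a pseudo target, so changing the value looks like any other changed dependency. The class PropertyState is hypothetical and only illustrates the principle; it is not how StringTarget is implemented or used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the state of a "parameter target" is the current
// value of a system property, encoded as bytes.
final class PropertyState {
  static byte[] of(String key) {
    // A missing property is treated like an empty one in this sketch.
    String value = System.getProperty(key, "");
    return value.getBytes(StandardCharsets.UTF_8);
  }
}
```

Switching the property from, say, a development to a production value changes these bytes, which is what would trigger the dependent task to run again.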
Can I have a task for a dependency run differently depending on the ultimate target?
It is difficult to put the question into one sentence. Consider two
targets, package and build both depending on junit. The latter
contains some long running tests which shall only be run on demand or
when package is called, but not when build is called during
development. So when target package is considered, it should set a
flag such that its dependency junit is built differently from when
it is called via target build. Can this be done?
The answer is no, and intentionally. The number one reason is, to be honest, that when I tried to implement this, the simple and straightforward recursive build algorithm started to look weird and bloated.
Yet, there is an easy way out: look at your code which creates the
junit target and parametrize it such that you can create, say, a
junit_package and a junit_build target and use these as the
dependencies as needed. Alternatively, use methods like
FileTarget.buildWith()
to "clone" one target from another with different parametrization to
get the junit targets you need.
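A rough sketch of the first approach, with a factory method that is called twice with different parameters. Everything here is hypothetical (the class JunitVariants, its Target stand-in and the runSlowTests flag); it only shows the wiring, not the library's target classes or FileTarget.buildWith().

```java
import java.util.List;

// Hypothetical sketch: none of these names come from the library.
final class JunitVariants {
  // Minimal stand-in for a target: a name, a setting its task would
  // consult, and its dependencies.
  static final class Target {
    final String name;
    final boolean runSlowTests;
    final List<Target> deps;

    Target(String name, boolean runSlowTests, Target... deps) {
      this.name = name;
      this.runSlowTests = runSlowTests;
      this.deps = List.of(deps);
    }
  }

  // One factory, used for both parametrizations of the junit target.
  static Target junit(String variant, boolean runSlowTests) {
    return new Target("junit_" + variant, runSlowTests);
  }

  // "package" gets the slow tests, "build" does not.
  static List<Target> topLevelTargets() {
    Target pkg = new Target("package", false, junit("package", true));
    Target build = new Target("build", false, junit("build", false));
    return List.of(pkg, build);
  }
}
```

The two junit targets are independent as far as the build is concerned, so each has its own saved state and neither needs to know which top-level target asked for it.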