Writing a Bazel rule set

This post will cover two things:

How to run an arbitrary tool with Bazel by writing a rule set (in this case, PlantUML, a tool to generate diagrams).
How to test this rule set.

It should be mentioned that while I was working on this rule set, it became more and more apparent that PlantUML is not a great candidate for this kind of integration, as its output is platform-dependent (font rendering differs between platforms). Despite that, it’s still a simple tool, and as such its integration is simple, albeit not perfect: the rendering tests I wrote need to run on the same platform every time.

PlantUML usage

PlantUML is a tool that takes a text input looking like this:

@startuml
Alice -> Bob: SYN
@enduml

And outputs an image of the corresponding sequence diagram: an arrow labelled SYN going from Alice to Bob.

PlantUML has multiple ways of being invoked (CLI, GUI, as well as a lot of integrations with different tools), but we’ll go with the easiest: a one-shot CLI invocation. It takes as inputs:

A text file, representing a diagram
An optional configuration file, giving control over the output

It then outputs a single image file, which can be of different formats (we’ll just cover SVG and PNG in this article, but adding support for other formats is trivial).

PlantUML ships as a JAR file, which needs to be run with Java. An invocation generating the sample image above would look like this:

java -jar plantuml.jar -tpng -p < 'mysource.puml' > 'dir/myoutput.png'

Pretty straightforward: run the JAR with a single option for the image type, pipe in the content of the input file, and get the output file back. The -p flag is the short form of -pipe; we use it because piping is the only way to properly control the output path (without it, PlantUML tries to be smart and places the output next to the input).

With a configuration file:

java -jar plantuml.jar -tpng -config config.puml -p < 'mysource.puml' > 'dir/myoutput.png'

Simple enough, right? Well, not really. PlantUML actually embeds some metadata in the files it generates. For example, when generating an SVG:

<!-- The actual SVG image has been omitted, as this part is deterministic and
pretty long. -->

<svg><g>
<!--MD5=[8d4298e8c40046c92682b92efe1f786e]
@startuml
Alice -> Bob: SYN
@enduml

PlantUML version 1.2020.07(Sun Apr 19 21:42:40 AEST 2020)
(GPL source distribution)
Java Runtime: OpenJDK Runtime Environment
JVM: OpenJDK 64-Bit Server VM
Java Version: 11.0.6+10
Operating System: Linux
Default Encoding: UTF-8
Language: en
Country: AU
--></g></svg>

This makes PlantUML non-hermetic by default (in addition to the font issue mentioned earlier): the output depends on the Java version, operating system, encoding and locale of the machine running it. While PlantUML has a simple way of working around that (the -nometadata flag), this is something to keep in mind when integrating any tool with Bazel: is this tool usable in a hermetic way? If not, how can the impact of this non-hermeticity be minimised?
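One way to catch this kind of leak early is to check generated files for environment-dependent data. Below is a minimal Python sketch. It assumes, based only on the sample above and not on any documented PlantUML format, that embedded metadata always contains an `MD5=[` comment or a line starting with `PlantUML version`. The `embeds_metadata` helper is hypothetical.

```python
import re

# Markers taken from the sample SVG above; an assumption, not a documented
# PlantUML format guarantee.
_METADATA_MARKERS = (
    re.compile(r"<!--MD5=\["),
    re.compile(r"^PlantUML version ", re.MULTILINE),
)


def embeds_metadata(svg_text: str) -> bool:
    """Returns True if the SVG appears to embed PlantUML's build metadata."""
    return any(marker.search(svg_text) for marker in _METADATA_MARKERS)


sample = """<svg><g>
<!--MD5=[8d4298e8c40046c92682b92efe1f786e]
@startuml
Alice -> Bob: SYN
@enduml

PlantUML version 1.2020.07(Sun Apr 19 21:42:40 AEST 2020)
--></g></svg>"""

print(embeds_metadata(sample))
print(embeds_metadata("<svg><g></g></svg>"))
```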

From there, here is the invocation we’ll work with:

java -jar plantuml.jar -tpng -nometadata -config config.puml \
  -p < 'mysource.puml' > 'dir/myoutput.png'
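Outside of Bazel, the same invocation can be issued without a shell by passing an argument list and wiring the pipes explicitly. Below is a minimal Python sketch. `plantuml_argv` and `render` are hypothetical helpers, and it assumes `java` is on the `PATH` and `plantuml.jar` sits in the working directory.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def plantuml_argv(jar: str, output_format: str,
                  config: Optional[str] = None) -> List[str]:
    """Builds the argument list for a one-shot, metadata-free PlantUML call."""
    argv = ["java", "-jar", jar, "-t" + output_format, "-nometadata"]
    if config is not None:
        argv += ["-config", config]
    # -p (-pipe): read the diagram from stdin and write the image to stdout.
    argv.append("-p")
    return argv


def render(src: Path, out: Path, output_format: str = "png",
           config: Optional[str] = None, jar: str = "plantuml.jar") -> None:
    """Renders src into out; needs java and the PlantUML JAR to be available."""
    out.parent.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as stdin, out.open("wb") as stdout:
        subprocess.run(plantuml_argv(jar, output_format, config),
                       stdin=stdin, stdout=stdout, check=True)
```

For example, `render(Path("mysource.puml"), Path("dir/myoutput.png"), config="config.puml")` mirrors the command above. Passing a list sidesteps shell quoting entirely. The rule set below builds a string instead because it relies on the shell (through ctx.actions.run_shell) for the redirections.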

Getting PlantUML

PlantUML is a Java application, available as a JAR on Maven. As such, it can be fetched with the help of rules_jvm_external, as explained in a previous article. The Maven rules expose the JAR as a library, but we need a binary to be able to run it. For example, in //third_party/plantuml/BUILD:

load("@rules_java//java:defs.bzl", "java_binary")

java_binary(
    name = "plantuml",
    main_class = "net.sourceforge.plantuml.Run",
    visibility = ["//visibility:public"],
    runtime_deps = [
        "@maven//:net_sourceforge_plantuml_plantuml",
    ],
)

From there, we can use //third_party/plantuml like any Bazel binary target: we can run it with bazel run, and we can pass it as a tool to rule actions.

This is a pattern that works well for any JVM-based tool. Other kinds of tools will need a different preparation step to make them available through Bazel, but as long as you can get a binary, you should be good.

Rule set structure

This rule set will follow the same structure we previously used for Ktlint:

Based in //tools/plantuml
A public interface exposed in //tools/plantuml/defs.bzl
Internal actions definition in //tools/plantuml/internal/actions.bzl
Internal rule definition in //tools/plantuml/internal/rules.bzl

But in addition:

Tests for the actions in //tools/plantuml/internal/actions_test.bzl
Integration tests in //tools/plantuml/tests

Let’s start by defining our actions.

Actions

Implementation

We need only one action for our rule: one that takes a source file, an optional configuration file and the PlantUML binary, and emits the output file by calling PlantUML. Let’s assume for a moment that we have a helper function, plantuml_command_line, which, given the proper inputs, returns the PlantUML command line to call. We can then write the action from there:

def plantuml_generate(ctx, src, format, config, out):
    """Generates a single PlantUML graph from a puml file.

    Args:
        ctx: analysis context.
        src: source file to be read.
        format: the output image format.
        config: the configuration file. Optional.
        out: output image file.
    """
    command = plantuml_command_line(
        executable = ctx.executable._plantuml_tool.path,
        config = config.path if config else None,
        src = src.path,
        output = out.path,
        output_format = format,
    )

    inputs = [src]

    if config:
        inputs.append(config)

    ctx.actions.run_shell(
        outputs = [out],
        inputs = inputs,
        tools = [ctx.executable._plantuml_tool],
        command = command,
        mnemonic = "PlantUML",
        progress_message = "Generating %s" % out.basename,
    )

This is pretty straightforward. We generate the command line from the files’ respective paths and the requested image format, passing None for the configuration file when it’s not provided, since it’s optional. We then declare the source file and, if present, the configuration file as inputs, and the PlantUML binary as a tool.

Now let’s implement our helper function. Once again, it’s really straightforward: it gets a bunch of paths as input and needs to generate a command line (in the form of a simple string) from them:

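# Near the top of actions.bzl: shell.quote comes from Skylib.
load("@bazel_skylib//lib:shell.bzl", "shell")

def plantuml_command_line(executable, config, src, output, output_format):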
    """Formats the command line to call PlantUML with the given arguments.

    Args:
        executable: path to the PlantUML binary.
        config: path to the configuration file. Optional.
        src: path to the source file.
        output: path to the output file.
        output_format: image format of the output file.

    Returns:
        A command to invoke PlantUML
    """

    command = "%s -nometadata -p -t%s " % (
        shell.quote(executable),
        output_format,
    )

    if config:
        command += " -config %s " % shell.quote(config)

    command += " < %s > %s" % (
        shell.quote(src),
        shell.quote(output),
    )

    return command

An interesting note is that because PlantUML is already integrated as an executable Bazel target, we don’t care that it’s a JAR, a C++ binary or a shell script: Bazel knows exactly what this executable is made of, how to prepare (e.g. compile) it if necessary, its runtime dependencies (in this case, a JRE) and, more importantly in this context, how to run it. We can treat our tool target as a single executable file, and run it as such just from its path. Bazel will automatically make sure to provide us with everything we need. (For more details: the target actually points to a shell script generated by Bazel, through the Java rules, which in the case of a java_binary target is responsible for defining the classpath, among other things. The JAR file is merely a dependency of this shell script, and as such is provided as a runtime dependency.)
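The following is only a rough mental model of that generated launcher. Bazel’s real script also handles runfiles discovery, JVM flags and much more. The idea boils down to assembling a classpath from the runtime JARs and invoking the main class with the caller’s arguments. `launcher_argv` is a hypothetical Python illustration, not Bazel code.

```python
import os
from typing import List, Sequence


def launcher_argv(java: str, jars: Sequence[str], main_class: str,
                  args: Sequence[str]) -> List[str]:
    """Hypothetical, heavily simplified equivalent of a java_binary launcher.

    Joins the runtime JARs into a classpath and invokes the main class,
    forwarding the caller's arguments untouched.
    """
    classpath = os.pathsep.join(jars)
    return [java, "-classpath", classpath, main_class, *args]
```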

Writing this as a helper function rather than directly in the action definition serves two purposes. It makes the whole thing slightly easier to read. More importantly, this function, which contains the logic (simple as it is here), is easily testable: it takes only strings as arguments and returns a string. It’s also a pure function: it has no side effects, and as such it always returns the same output given the same inputs.
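Because Starlark is a dialect of Python, the same logic can also be checked as plain Python. This is a quick way to iterate on the expected strings before writing the Starlark tests below. The sketch assumes, as those tests’ expected strings imply, that Skylib’s shell.quote always wraps its argument in single quotes. The `quote` helper is a reimplementation of that behaviour, not Skylib itself.

```python
def quote(s: str) -> str:
    # Assumed to match Skylib's shell.quote: wrap the string in single quotes
    # and turn each embedded ' into '\'' (close, escaped quote, reopen).
    return "'" + s.replace("'", "'\\''") + "'"


def plantuml_command_line(executable, config, src, output, output_format):
    """Plain-Python port of the Starlark helper, spacing included."""
    command = "%s -nometadata -p -t%s " % (quote(executable), output_format)
    if config:
        command += " -config %s " % quote(config)
    command += " < %s > %s" % (quote(src), quote(output))
    return command


print(plantuml_command_line(
    executable="/bin/plantuml",
    config="myskin.skin",
    src="mysource.puml",
    output="dir/myoutput.png",
    output_format="png",
))
```

Running it prints the exact string the with_config test expects, double spaces included.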

Tests

To test Starlark functions like this one, Bazel’s Skylib provides a unit test framework which, while requiring a bit of boilerplate, is pretty simple to use. In this specific case, we only have two cases to test: with and without a configuration file. Error cases should be unreachable due to the way the rule will be defined. Bazel will be responsible for enforcing the presence of an executable target for PlantUML’s binary, a valid image format, and so on. Let’s see how that works. In //tools/plantuml/internal/actions_test.bzl:

"""Unit tests for PlantUML action"""

load("@bazel_skylib//lib:unittest.bzl", "asserts", "unittest")
load(":actions.bzl", "plantuml_command_line")

def _no_config_impl(ctx):
    env = unittest.begin(ctx)
    asserts.equals(
        env,
        "'/bin/plantuml' -nometadata -p -tpng  < 'mysource.puml' > 'dir/myoutput.png'",
        plantuml_command_line(
            executable = "/bin/plantuml",
            config = None,
            src = "mysource.puml",
            output = "dir/myoutput.png",
            output_format = "png",
        ),
    )
    return unittest.end(env)

no_config_test = unittest.make(_no_config_impl)

def _with_config_impl(ctx):
    env = unittest.begin(ctx)
    asserts.equals(
        env,
        "'/bin/plantuml' -nometadata -p -tpng  -config 'myskin.skin'  < 'mysource.puml' > 'dir/myoutput.png'",
        plantuml_command_line(
            executable = "/bin/plantuml",
            config = "myskin.skin",
            src = "mysource.puml",
            output = "dir/myoutput.png",
            output_format = "png",
        ),
    )
    return unittest.end(env)

with_config_test = unittest.make(_with_config_impl)

def actions_test_suite():
    unittest.suite(
        "actions_tests",
        no_config_test,
        with_config_test,
    )

First, we define two functions containing the actual test logic: _no_config_impl and _with_config_impl. Their content is pretty simple: we start a unit test environment, invoke the function under test, assert that the result is what we expected, and close the unit test environment. The return value is needed by the test framework, as it carries the results of the assertions (which ones passed and which ones failed).
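The begin/assert/end shape lets a single test report every failing assertion rather than stopping at the first one. Assertions record failures in the environment, and ending the test turns them into a pass or fail result. Below is a toy Python model of that pattern. It is not Skylib’s implementation, and `UnitEnv`, `begin`, `equals` and `end` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class UnitEnv:
    """Collects failure messages instead of raising on the first one."""
    name: str
    failures: List[str] = field(default_factory=list)


def begin(name: str) -> UnitEnv:
    return UnitEnv(name)


def equals(env: UnitEnv, expected: Any, actual: Any) -> None:
    # Record the mismatch and keep going, like an assertion that does not throw.
    if expected != actual:
        env.failures.append("Expected %r, but got %r" % (expected, actual))


def end(env: UnitEnv) -> bool:
    """Prints every recorded failure; returns True if the test passed."""
    for failure in env.failures:
        print("%s: %s" % (env.name, failure))
    return not env.failures


env = begin("no_config_test")
equals(env, "png", "png")
equals(env, "png", "svg")
passed = end(env)
```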

Next, we declare those two functions as actual unit test rules by wrapping them with a call to unittest.make. Finally, actions_test_suite passes both rules to unittest.suite, a macro that instantiates one test target per rule, plus a test_suite grouping them. Being a macro, it needs to be invoked from the BUILD file:

load(":actions_test.bzl", "actions_test_suite")

actions_test_suite()

We can now run our tests, and hopefully everything passes:

$ bazel test //tools/plantuml/internal:actions_tests
INFO: Invocation ID: 112bd049-7398-4b23-b62b-1398e9731eb7
INFO: Analyzed 2 targets (5 packages loaded, 927 targets configured).
INFO: Found 2 test targets...
INFO: Elapsed time: 0.238s, Critical Path: 0.00s
INFO: 0 processes.
//tools/plantuml/internal:actions_tests_test_0                           PASSED in 0.4s
//tools/plantuml/internal:actions_tests_test_1                           PASSED in 0.3s

Executed 0 out of 2 tests: 2 tests pass.
INFO: Build completed successfully, 1 total action
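The target names in this output follow from how unittest.suite expands. It creates one `<suite>_test_<index>` target per test rule, numbered in declaration order, as the log above shows. Below is a small Python sketch of that naming scheme, which helps when running a single test. `suite_target_names` is a hypothetical helper.

```python
from typing import List


def suite_target_names(package: str, suite: str, test_count: int) -> List[str]:
    """Labels of the targets generated by a unittest.suite, in order."""
    return ["//%s:%s_test_%d" % (package, suite, i) for i in range(test_count)]


for label in suite_target_names("tools/plantuml/internal", "actions_tests", 2):
    print(label)
```

Since with_config_test is listed second in actions_test_suite, `bazel test //tools/plantuml/internal:actions_tests_test_1` runs only that test.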

Rules definition

As with the actions, we only have one rule to define here. Let’s call it plantuml_graph(). It takes our usual set of inputs and outputs a single file, whose name will be ${target_name}.${image_format}. The rule also defines:

- the set of acceptable image formats;
- that the input file is mandatory but the configuration file optional;
- the actual executable target to use for PlantUML.

As expected, the only thing the implementation actually does is call the plantuml_generate action defined above.

load(
    ":actions.bzl",
    "plantuml_generate",
)

def _plantuml_graph_impl(ctx):
    # The output file is already predeclared by the `outputs` template below
    # ("%{name}.%{format}"), so reuse it instead of declaring it again.
    output = ctx.outputs.graph
    plantuml_generate(
        ctx,
        src = ctx.file.src,
        format = ctx.attr.format,
        config = ctx.file.config,
        out = output,
    )

    return [DefaultInfo(
        files = depset([output]),
    )]

plantuml_graph = rule(
    _plantuml_graph_impl,
    attrs = {
        "config": attr.label(
            doc = "Configuration file to pass to PlantUML. Useful to tweak the skin",
            allow_single_file = True,
        ),
        "format": attr.string(
            doc = "Output image format",
            default = "png",
            values = ["png", "svg"],
        ),
        "src": attr.label(
            allow_single_file = [".puml"],
            doc = "Source file to generate the graph from",
            mandatory = True,
        ),
        "_plantuml_tool": attr.label(
            default = "//third_party/plantuml",
            executable = True,
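            # Build the tool for the execution platform; the older "host"
            # value is deprecated in favour of "exec".
            cfg = "exec",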
        ),
    },
    outputs = {
        "graph": "%{name}.%{format}",
    },
    doc = "Generates a PlantUML graph from a puml file",
)
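Between `values` and the `outputs` template, Bazel itself rejects unknown formats and derives the output file name, so the implementation never has to. For illustration only, here are the equivalent checks written out as a hypothetical Python function.

```python
ALLOWED_FORMATS = ("png", "svg")  # mirrors `values` on the format attribute


def graph_output_name(name: str, fmt: str = "png") -> str:
    """Expands the "%{name}.%{format}" template, rejecting unknown formats."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("format must be one of %s, got %r" % (ALLOWED_FORMATS, fmt))
    return "%s.%s" % (name, fmt)
```

So a target named sequence with format = "svg" produces sequence.svg, which other targets can reference as :sequence.svg.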

Public interface

As we only have a single rule, and nothing else specific to do, the public interface is dead simple:

load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")

plantuml_graph = _plantuml_graph

You might then be wondering: why is this useful, and why shouldn’t I just load the rule definition from //tools/plantuml/internal:rules.bzl directly? Having this kind of public interface allows you to tweak the actual rule definition without breaking any call site, as long as you respect the public interface. You can also add features to every call site in a really simple way. For example, let’s imagine you have a view_image rule which, given an image file, generates a script to view it. You could then transform your public interface like this:

load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")
load("//tools/utils:defs.bzl", _view_image = "view_image")

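def plantuml_graph(name, src, config = None, format = "png", **kwargs):
    # Defaults match the underlying rule, so config and format stay optional.
    _plantuml_graph(
        name = name,
        src = src,
        config = config,
        format = format,
        # Forward common attributes (visibility, tags, ...) untouched.
        **kwargs
    )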

    _view_image(
        name = "%s.view" % name,
        src = ":%s.%s" % (name, format),
    )

And suddenly, all your PlantUML graphs have an implicit .view target defined automatically, allowing you to see the output directly without having to dig into Bazel’s output directories.
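Conceptually, the macro is just a function from one call to several targets. Below is a hypothetical Python model of what the wrapper above expands to. Targets are represented as plain dicts, and view_image is still the imaginary rule from the example.

```python
from typing import Dict, List, Optional


def expand_plantuml_graph(name: str, src: str, config: Optional[str] = None,
                          fmt: str = "png") -> List[Dict[str, Optional[str]]]:
    """Returns the targets the wrapper macro would declare, in order."""
    return [
        {"rule": "plantuml_graph", "name": name, "src": src,
         "config": config, "format": fmt},
        # The companion viewer points at the graph's predeclared output file.
        {"rule": "view_image", "name": "%s.view" % name,
         "src": ":%s.%s" % (name, fmt)},
    ]
```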

A set of Bazel rules for LaTeX actually provides such a feature to view the PDF output: they have a view_pdf.sh script, used by their main latex_document macro.

Further testing

For a rule this simple, I only went one step further. I kept a few reference PlantUML graphs along with their expected rendered output, and compared the two using Phosphorus, a really simple image comparison tool I wrote and covered in the previous article (I told you it would be useful!). For more complex cases, Skylib offers more utilities, such as analysis tests (analysistest) and build tests (build_test).
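The simplest form of such a rendering test is a golden-file comparison. Below is a minimal Python sketch comparing a generated image against its reference by content hash. This is a byte-for-byte check, not the image comparison Phosphorus performs. Given the font-rendering issue mentioned earlier, it only holds when both files were produced on the same platform.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def matches_golden(generated: Path, golden: Path) -> bool:
    """True if the generated file is byte-for-byte identical to the reference."""
    return file_digest(generated) == file_digest(golden)
```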

Closing thoughts

While writing this kind of rule set might look like a lot of work, it’s actually pretty mechanical in many cases. I worked on a few others, like markdownlint, which now runs on all my Markdown files as regular Bazel test targets, or pngcrush, which runs on the PNG files hosted on this blog. In a monorepo, writing such a rule is the kind of task that you do once, and it just keeps on giving. You can easily compose different rules around a main use case, getting a bunch of test targets virtually for free.

On another note, I’m aware that having all this in a public repository would make things much simpler to follow. Sadly, it’s part of a larger mono-repository which makes open-sourcing only the relevant parts tricky. Dumping a snapshot somewhere would be an option, but I’d rather have an actual living repository.

Now that we have all the tools we need (that was kind of convoluted, I’ll give you that), there are only two steps left to cover:

Generating the actual blog (ironically enough, this will be a really quick step, despite being the only really important one)
Managing the deployment.

We’re getting there!

This is a post in the Creating a blog with Bazel series.

enoent.fr