Writing a Bazel rule set
This post will cover two things:
- How to run an arbitrary tool with Bazel (in this case, PlantUML, a tool to generate diagrams), by writing a rule set
- How to test this rule set.
It should be mentioned that while I was working on this rule set, it became more and more apparent PlantUML is not a great candidate for this kind of integration, as its output is platform-dependent (the font rendering). Despite that, it’s still a simple tool and as such its integration is simple, albeit not perfect (the rendering tests I wrote need to run on the same platform every time).
PlantUML usage
PlantUML is a tool that takes a text input looking like this:
@startuml
Alice -> Bob: SYN
@enduml
And outputs an image looking like this:
PlantUML can be invoked in multiple ways (CLI, GUI, as well as a lot of integrations with different tools), but we’ll go with the easiest: a one-shot CLI invocation. It takes as inputs:
- A text file, representing a diagram
- An optional configuration file, giving control over the output
It then outputs a single image file, which can be of different formats (we’ll just cover SVG and PNG in this article, but adding support for other formats is trivial).
PlantUML ships as a JAR file, which needs to be run with Java. An invocation generating the sample image above would look like this:
java -jar plantuml.jar -tpng -p < 'mysource.puml' > 'dir/myoutput.png'
Pretty straightforward: run the JAR with a single option for the image type, pipe in the content of the input file, and get the output file back. The -p flag is the short form of -pipe, which we’re using because pipes are the only way of properly controlling the output path (without that, PlantUML tries to be smart and places the output next to the input).
With a configuration file:
java -jar plantuml.jar -tpng -config config.puml -p < 'mysource.puml' > 'dir/myoutput.png'
Simple enough, right? Well, not really. PlantUML actually integrates some metadata in the files it generates. For example, when generating an SVG:
<!-- The actual SVG image has been omitted, as this part is deterministic and
pretty long. -->
<svg><g>
<!--MD5=[8d4298e8c40046c92682b92efe1f786e]
@startuml
Alice -> Bob: SYN
@enduml
PlantUML version 1.2020.07(Sun Apr 19 21:42:40 AEST 2020)
(GPL source distribution)
Java Runtime: OpenJDK Runtime Environment
JVM: OpenJDK 64-Bit Server VM
Java Version: 11.0.6+10
Operating System: Linux
Default Encoding: UTF-8
Language: en
Country: AU
--></g></svg>
This makes PlantUML non-hermetic by default (in addition to the font issue mentioned earlier). While PlantUML has a simple way of working around that (in the form of a -nometadata flag), this is something to keep in mind when integrating a tool with Bazel: is this tool usable in a hermetic way? If not, how can the impact of this non-hermeticity be minimised?
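One rough way to answer that question for a given tool is to run it twice on the same input and compare digests of the output. The sketch below does that in Python. Note that outputs_match is a hypothetical helper written for this post, and its run argument stands in for whatever actually invokes the tool. Identical digests on one machine don’t prove hermeticity (the font issue only shows up across platforms), but differing digests do prove the lack of it.

```python
import hashlib


def outputs_match(run, attempts=2):
    """Returns True if every call to run() yields byte-identical output.

    run is any zero-argument callable returning the tool's output as bytes.
    """
    digests = {hashlib.sha256(run()).hexdigest() for _ in range(attempts)}
    return len(digests) == 1
```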
From there, here is the invocation we’ll work with:
java -jar plantuml.jar -tpng -nometadata -config config.puml \
    -p < 'mysource.puml' > 'dir/myoutput.png'
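For readers who would rather script this outside of a shell, the same invocation can be sketched in Python. This is only an illustration under assumptions: the helper names plantuml_argv and render are made up for this example, and it presumes that java and a local plantuml.jar are available.

```python
import subprocess


def plantuml_argv(jar, image_format, config=None):
    """Builds the argument list for a one-shot, piped PlantUML run."""
    argv = ["java", "-jar", jar, "-t" + image_format, "-nometadata"]
    if config:
        argv += ["-config", config]
    # -p (short for -pipe) reads the diagram from stdin and writes to stdout.
    argv.append("-p")
    return argv


def render(jar, src, out, image_format="png", config=None):
    """Runs PlantUML, piping src in and capturing the image into out."""
    with open(src, "rb") as stdin, open(out, "wb") as stdout:
        subprocess.run(
            plantuml_argv(jar, image_format, config),
            stdin=stdin,
            stdout=stdout,
            check=True,  # Raise if PlantUML exits with a non-zero status.
        )
```

Passing the arguments as a list sidesteps shell quoting entirely, which is a luxury the Bazel action later in this post doesn’t have, since it needs shell redirections.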
Getting PlantUML
PlantUML is a Java application, available as a JAR on Maven. As such, it can be fetched with the help of rules_jvm_external, as was explained in a previous article.
The Maven rules will expose the JAR as a library, but we need a binary to be able to run it. In e.g. //third_party/plantuml/BUILD:
load("@rules_java//java:defs.bzl", "java_binary")

java_binary(
    name = "plantuml",
    main_class = "net.sourceforge.plantuml.Run",
    visibility = ["//visibility:public"],
    runtime_deps = [
        "@maven//:net_sourceforge_plantuml_plantuml",
    ],
)
From there, we can use //third_party/plantuml as any Bazel binary target - we can run it with bazel run, and we can pass it as a tool for rule actions.
This is a pattern that works well for any JVM-based tool. Other kinds of tools will need a different preparation step to make them available through Bazel - but as long as you can get a binary, you should be good.
Rule set structure
This rule set will follow the same structure we previously used for Ktlint:
- Based in //tools/plantuml
- A public interface exposed in //tools/plantuml/defs.bzl
- Internal actions definition in //tools/plantuml/internal/actions.bzl
- Internal rule definition in //tools/plantuml/internal/rules.bzl
But in addition:
- Tests for the actions in //tools/plantuml/internal/actions_test.bzl
- Integration tests in //tools/plantuml/tests
Let’s start by defining our actions.
Actions
Implementation
We need only one action for our rule: one that takes a source file, an optional configuration file, and the PlantUML binary, and emits the output file by calling PlantUML. Let’s assume for a moment that we have a helper function called plantuml_command_line which, given the proper inputs, returns the PlantUML command line to call, and write the action from there:
def plantuml_generate(ctx, src, format, config, out):
    """Generates a single PlantUML graph from a puml file.

    Args:
        ctx: analysis context.
        src: source file to be read.
        format: the output image format.
        config: the configuration file. Optional.
        out: output image file.
    """
    command = plantuml_command_line(
        executable = ctx.executable._plantuml_tool.path,
        config = config.path if config else None,
        src = src.path,
        output = out.path,
        output_format = format,
    )

    inputs = [src]
    if config:
        inputs.append(config)

    ctx.actions.run_shell(
        outputs = [out],
        inputs = inputs,
        tools = [ctx.executable._plantuml_tool],
        command = command,
        mnemonic = "PlantUML",
        progress_message = "Generating %s" % out.basename,
    )
This is pretty straightforward: we generate the command line, passing the attributes’ respective paths (or None for the configuration file if it’s not provided, since it’s optional), as well as the requested image format. We declare that both our source file and configuration file are inputs, and that PlantUML is a required tool.
Now let’s implement our helper function. Again, it’s really straightforward: it gets a bunch of paths as input, and needs to generate a command line call (in the form of a simple string) from them:
# At the top of actions.bzl:
load("@bazel_skylib//lib:shell.bzl", "shell")

def plantuml_command_line(executable, config, src, output, output_format):
    """Formats the command line to call PlantUML with the given arguments.

    Args:
        executable: path to the PlantUML binary.
        config: path to the configuration file. Optional.
        src: path to the source file.
        output: path to the output file.
        output_format: image format of the output file.

    Returns:
        A command to invoke PlantUML
    """

    # Each fragment carries only a leading space, so the final command has
    # exactly one space between arguments.
    command = "%s -nometadata -p -t%s" % (
        shell.quote(executable),
        output_format,
    )
    if config:
        command += " -config %s" % shell.quote(config)
    command += " < %s > %s" % (
        shell.quote(src),
        shell.quote(output),
    )
    return command
An interesting note is that because PlantUML is already integrated as an executable Bazel target, we don’t care that it’s a JAR, a C++ binary or a shell script: Bazel knows exactly what this executable is made of, how to prepare (e.g. compile) it if necessary, its runtime dependencies (in this case, a JRE) and, more importantly in this context, how to run it. We can treat our tool target as a single executable file, and run it as such just from its path. Bazel will automatically make sure to provide us with everything we need. (For more details: the target actually points to a shell script generated by Bazel, through the Java rules, which in the case of a java_binary target is responsible for defining the classpath, among other things. The JAR file is merely a dependency of this shell script, and as such is provided as a runtime dependency.)
Writing this as a helper function rather than directly in the action definition serves two purposes: not only does it make the whole thing slightly easier to read, but this function, which contains the logic (even though in this case it’s really simple), is easily testable: it takes only strings as arguments, and returns a string. It’s also a pure function: it doesn’t have any side effect, and as such it will always return the same output given the same set of inputs.
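To make that last point concrete, here is a rough Python transliteration of the helper, handy for poking at the logic in a REPL. It assumes that Skylib’s shell.quote always wraps its argument in single quotes and escapes embedded single quotes; the quote function below is a stand-in written for this example, not Skylib’s actual code.

```python
def quote(s):
    """Stand-in for shell.quote: always single-quote, escape inner quotes."""
    return "'" + s.replace("'", "'\\''") + "'"


def plantuml_command_line(executable, config, src, output, output_format):
    """Python mirror of the Starlark helper: strings in, one string out."""
    parts = [quote(executable), "-nometadata", "-p", "-t" + output_format]
    if config:
        parts += ["-config", quote(config)]
    # Redirections stay unquoted so the shell interprets them.
    parts += ["<", quote(src), ">", quote(output)]
    return " ".join(parts)
```

Calling it with a path containing a space, such as src = "my source.puml", shows the quoting keeping that path together as a single shell word.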
Tests
To test Starlark functions like this one, Bazel’s Skylib provides a test framework which, while requiring a bit of boilerplate, is pretty simple to use. In this specific case, we only have two different cases to test: with and without a configuration file provided. Error cases should be unreachable due to the way the rule will be defined: Bazel will be responsible for enforcing the presence of an executable target for PlantUML’s binary, a valid image format… Let’s see how that works. In //tools/plantuml/internal/actions_test.bzl:
"""Unit tests for PlantUML action"""

load("@bazel_skylib//lib:unittest.bzl", "asserts", "unittest")
load(":actions.bzl", "plantuml_command_line")

def _no_config_impl(ctx):
    env = unittest.begin(ctx)
    asserts.equals(
        env,
        "'/bin/plantuml' -nometadata -p -tpng < 'mysource.puml' > 'dir/myoutput.png'",
        plantuml_command_line(
            executable = "/bin/plantuml",
            config = None,
            src = "mysource.puml",
            output = "dir/myoutput.png",
            output_format = "png",
        ),
    )
    return unittest.end(env)

no_config_test = unittest.make(_no_config_impl)

def _with_config_impl(ctx):
    env = unittest.begin(ctx)
    asserts.equals(
        env,
        "'/bin/plantuml' -nometadata -p -tpng -config 'myskin.skin' < 'mysource.puml' > 'dir/myoutput.png'",
        plantuml_command_line(
            executable = "/bin/plantuml",
            config = "myskin.skin",
            src = "mysource.puml",
            output = "dir/myoutput.png",
            output_format = "png",
        ),
    )
    return unittest.end(env)

with_config_test = unittest.make(_with_config_impl)

def actions_test_suite():
    unittest.suite(
        "actions_tests",
        no_config_test,
        with_config_test,
    )
First, we define two functions, which are the actual test logic: _no_config_impl and _with_config_impl. Their content is pretty simple: we start a unit test environment, we invoke the function under test and assert that the result is indeed what we expected, and we close the unit test environment. The return value is needed by the test framework, as it carries which assertions passed or failed.
Next, we declare those two functions as actual unit tests, wrapping them with a call to unittest.make. We can then add those two tests to a test suite, which is what actually generates the test targets when invoked. This means that the actions_test_suite macro needs to be invoked in the BUILD file:
load(":actions_test.bzl", "actions_test_suite")
actions_test_suite()
We can run our tests, and hopefully everything should pass:
$ bazel test //tools/plantuml/internal:actions_tests
INFO: Invocation ID: 112bd049-7398-4b23-b62b-1398e9731eb7
INFO: Analyzed 2 targets (5 packages loaded, 927 targets configured).
INFO: Found 2 test targets...
INFO: Elapsed time: 0.238s, Critical Path: 0.00s
INFO: 0 processes.
//tools/plantuml/internal:actions_tests_test_0 PASSED in 0.4s
//tools/plantuml/internal:actions_tests_test_1 PASSED in 0.3s
Executed 0 out of 2 tests: 2 tests pass.
INFO: Build completed successfully, 1 total action
Rules definition
As with the actions, we only have one rule to define here. Let’s call it plantuml_graph(). It needs our usual set of inputs, and outputs a single file, whose name will be ${target_name}.${image_format}. It’s also where we define the set of acceptable image formats, the fact that the input file is mandatory but the configuration file optional, and the actual executable target to use for PlantUML. The only thing the implementation actually does is, as expected, call our plantuml_generate action defined above.
load(
    ":actions.bzl",
    "plantuml_generate",
)

def _plantuml_graph_impl(ctx):
    output = ctx.actions.declare_file("{name}.{format}".format(
        name = ctx.label.name,
        format = ctx.attr.format,
    ))
    plantuml_generate(
        ctx,
        src = ctx.file.src,
        format = ctx.attr.format,
        config = ctx.file.config,
        out = output,
    )
    return [DefaultInfo(
        files = depset([output]),
    )]

plantuml_graph = rule(
    _plantuml_graph_impl,
    attrs = {
        "config": attr.label(
            doc = "Configuration file to pass to PlantUML. Useful to tweak the skin",
            allow_single_file = True,
        ),
        "format": attr.string(
            doc = "Output image format",
            default = "png",
            values = ["png", "svg"],
        ),
        "src": attr.label(
            allow_single_file = [".puml"],
            doc = "Source file to generate the graph from",
            mandatory = True,
        ),
        "_plantuml_tool": attr.label(
            default = "//third_party/plantuml",
            executable = True,
            # Newer Bazel releases spell this configuration "exec".
            cfg = "host",
        ),
    },
    outputs = {
        "graph": "%{name}.%{format}",
    },
    doc = "Generates a PlantUML graph from a puml file",
)
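As a plain-Python illustration of what the rule’s attributes enforce, the sketch below mimics the values check on format and the naming of the output file. The function name graph_output_name is hypothetical; in the real rule, Bazel performs this validation itself when it analyses the target.

```python
ALLOWED_FORMATS = ("png", "svg")


def graph_output_name(target_name, image_format="png"):
    """Returns the output file name, rejecting unsupported formats."""
    if image_format not in ALLOWED_FORMATS:
        raise ValueError(
            "format must be one of %s, got %r"
            % (list(ALLOWED_FORMATS), image_format)
        )
    return "{name}.{format}".format(name=target_name, format=image_format)
```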
Public interface
As we only have a single rule, and nothing else specific to do, the public interface is dead simple:
load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")
plantuml_graph = _plantuml_graph
You might then be wondering: why is this useful, and why shouldn’t I just import the rule definition from //tools/plantuml/internal:rules.bzl directly? Having this kind of public interface allows you to tweak the actual rule definition without breaking any consumer site, as long as you respect the public interface. You can also add features to every consumer site in a really simple way. Let’s imagine, for example, that you have a view_image rule which, given an image file, generates a script to view it. You could then transform your public interface like this:
load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")
load("//tools/utils:defs.bzl", _view_image = "view_image")

def plantuml_graph(name, src, config = None, format = "png"):
    # Defaults mirror the underlying rule, so config and format stay optional.
    _plantuml_graph(
        name = name,
        src = src,
        config = config,
        format = format,
    )
    _view_image(
        name = "%s.view" % name,
        src = ":%s.%s" % (name, format),
    )
And suddenly, all your PlantUML graphs have an implicit .view target defined automatically, allowing you to see the output directly without having to dig in Bazel’s output directories.
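To visualise what such a macro expands to, here is a small Python model that lists the targets a single call would create. It is purely illustrative: expand_plantuml_graph and the dictionaries it returns are made up for this example, and are not how Bazel represents targets internally.

```python
def expand_plantuml_graph(name, src, config=None, format="png"):
    """Lists the targets one plantuml_graph() macro call would declare."""
    graph = {
        "rule": "_plantuml_graph",
        "name": name,
        "src": src,
        "config": config,
        "format": format,
    }
    viewer = {
        "rule": "_view_image",
        "name": "%s.view" % name,
        # Label of the image file produced by the graph target.
        "src": ":%s.%s" % (name, format),
    }
    return [graph, viewer]
```

For a graph named handshake, this model yields two targets, handshake and handshake.view, the second one pointing at :handshake.png.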
A set of Bazel rules for LaTeX actually provides such a feature to view the PDF output: they have a view_pdf.sh script, used by their main latex_document macro.
Further testing
For a rule this simple, I took just one further step: keeping a few reference PlantUML graphs, along with their expected rendered output, which I compare through Phosphorus, a really simple tool I wrote to help compare two images, covered in the previous article (I told you it would be useful!). For more complex cases, Skylib offers more utilities, like an analysis test and a build test.
Closing thoughts
While writing this kind of tool might look like a lot of work, it’s actually pretty mechanical in a lot of cases. I worked on a few others like markdownlint, which now runs on all my Markdown files as regular Bazel test targets, or pngcrush, which is run on the PNG files hosted on this blog. In a monorepo, writing such a rule is the kind of task that you do once, and it just keeps on giving - you can easily compose different rules with a main use-case, with a bunch of test targets generated virtually for free.
On another note, I’m aware that having all this in a public repository would make things much simpler to follow. Sadly, it’s part of a larger mono-repository which makes open-sourcing only the relevant parts tricky. Dumping a snapshot somewhere would be an option, but I’d rather have an actual living repository.
Now that we have all the tools we need (that was kind of convoluted, I’ll give you that), there are only two steps left to cover:
- Generating the actual blog (ironically enough, this will be a really quick step, despite being the only really important one)
- Managing the deployment.
We’re getting there!
This is a post in the Creating a blog with Bazel series.
Other posts in this series:
- 16 May 2020 - Writing a Bazel rule set (this article)
- 8 December 2019 - Compiling a Kotlin application with Bazel
- 2 November 2019 - Why Bazel?
- 31 October 2019 - A new beginning