Writing build rules
Please is designed to support interleaving custom build commands as first-class citizens in the system. This page describes some of the concepts to bear in mind when writing rules of your own.
The documentation for genrule() and gentest() may be of use as well, as that explains the available arguments and what they do.
Word count example
Here is an example of a build rule. It takes in example/file.txt from the source tree and counts the number of words within:
# //example:word_count
genrule(
    name = "word_count",
    srcs = ["file.txt"],
    outs = ["file.wordcount"],
    cmd = "wc $SRCS > $OUT",
)
Please sets up $SRCS and $OUT based on the parameters we passed to genrule(), then executes our command. When this rule builds, it will output plz-out/gen/example/file.wordcount. For more information on how this works, see the sources and outputs sections below.
Build location
A core concept of Please is to try to isolate execution so we have a good
concept of correctness for each action. To this end each rule builds in its
own temporary directory (under plz-out/tmp) containing only files defined as inputs to the rule.
This means that you can't access any other files in your source tree from
within the rule. If you want to have a look at what it looks like in there,
plz build --shell
will prepare a target for
building and open up a shell into the directory, where you can look around
and try running commands by hand.
Some aspects of this environment differ from a normal shell; for example, rules are given a small, deterministic set of environment variables. Rules shouldn't need to read anything outside this directory, or anything not described by those variables. See the Passing environment variables section of this page for more information on passing variables from the host machine to the rule.
Sources
Sources are the inputs to your rule - typically source code, but like any other inputs, they can also be build labels. They are defined in the srcs argument.
All sources are linked into the build directory - for performance reasons they aren't copied so you shouldn't modify them in-place.
Sources can be both files and directories, however it can be useful to use globs for sources e.g. to include all files with a given extension. See the built ins for more information on this.
Sources can be referred to through the $SRCS environment variable. If there's only one, $SRC will be defined too.
Sources can also be named, e.g.
srcs = {
    "srcs": ["my_source.py"],
    "resources": ["something.txt"],
},
In that case they can be accessed separately through $SRCS_SRCS and $SRCS_RESOURCES.
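As a rough illustration of the naming scheme described above, the following Python sketch models which source variables a command would see. It is not Please's implementation: the function name source_env is hypothetical, joining paths with spaces is an assumption, and for named sources it only models the per-group variables.

```python
def source_env(srcs):
    """Model the source-related variables for a rule (illustrative only).

    `srcs` is either a list of paths or a dict mapping a group name to a
    list of paths, mirroring the two forms of the srcs argument.
    """
    env = {}
    if isinstance(srcs, dict):
        # Named sources: one SRCS_<NAME> variable per group.
        for name, paths in srcs.items():
            env["SRCS_" + name.upper()] = " ".join(paths)  # space-joining is an assumption
        return env
    env["SRCS"] = " ".join(srcs)
    if len(srcs) == 1:
        # $SRC is only defined when there is exactly one source.
        env["SRC"] = srcs[0]
    return env


named = source_env({"srcs": ["my_source.py"], "resources": ["something.txt"]})
print(named["SRCS_SRCS"])       # my_source.py
print(named["SRCS_RESOURCES"])  # something.txt
```

The group name is upper-cased to form the suffix, which matches the srcs and resources example above.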
Outputs
Rules must define their outputs explicitly. Only these files will be saved to plz-out and made available to other rules.
The output location within plz-out depends on whether the rule is marked as binary. Binary rules output under plz-out/bin and non-binary rules output under plz-out/gen.
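The mapping from a rule to its output location can be pictured with a small Python sketch. The helper name output_path and its parameters are hypothetical; it only restates the bin/gen split described here and the package-relative layout seen in the word count example.

```python
import posixpath


def output_path(package, out, binary=False):
    """Return where an output would be saved under plz-out (illustrative model)."""
    root = "plz-out/bin" if binary else "plz-out/gen"
    # An empty package (a rule in the repo root) contributes no path component.
    return posixpath.join(root, package, out)


print(output_path("example", "file.wordcount"))  # plz-out/gen/example/file.wordcount
```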
Like sources, outputs can be both files and directories. The simplest way to define outputs on a rule is as a list:
outs = [
    "path/to/a/directory",
    "foo.a",
],
The $OUT and $OUTS variables are set in much the same way as $SRC and $SRCS are for sources.
Named outputs
Much like sources, outs
can also be a
dictionary. In this case, the outputs are considered named outputs:
genrule(
    name = "my_rule",
    ...
    outs = {
        "src": ["src"],
        "lib": ["lib"],
    },
)
The variables $OUTS_SRC and $OUTS_LIB will be set. These outputs can then be depended on individually from other rules, as sources only:
genrule(
    ...
    srcs = [":my_rule|src"],
    ...
)
Output directories
Sometimes it can be hard to reason about the exact outputs of a rule before actually building it. Output directories allow you to specify a folder; anything that ends up in that folder will become an output of the rule. The folder's path will not constitute part of the output path of the file.
Imagine we're generating code. The code generation tool parses some sources and based on the symbols it finds, produces a set of outputs e.g. mock generation:
genrule(
    name = "mocks",
    srcs = ["interfaces.go"],
    tools = {
        "mockgen": ["mockgen"],
    },
    cmd = "$TOOLS_MOCKGEN ... $SRCS -o $OUT",
    outs = ???,
)
It's difficult to know ahead of time which files mockgen is going to generate. It's dependent on the source code within interfaces.go. We can use output directories for this:
genrule(
    name = "mocks",
    srcs = ["interfaces.go"],
    tools = {
        "mockgen": ["mockgen"],
    },
    cmd = "mkdir _out && $TOOLS_MOCKGEN ... $SRCS -o _out",
    output_dirs = ["_out"],
)
If mockgen generates foo_mocks.go and bar/bar_mocks.go, these will become outputs of the rule as if they were defined as part of the outs list:
Build finished; total time 230ms, incrementality 100.0%. Outputs:
//:mocks:
plz-out/gen/foo_mocks.go
plz-out/gen/bar
If, instead of the folder plz-out/gen/bar, it's preferable to have the output as plz-out/gen/bar/bar_mocks.go, you may add ** to the output directory:
genrule(
    ...
    output_dirs = ["_out/**"],
)
This can be useful when these files are going to end up as $SRCS in another rule. Some tools don't deal well with directories, e.g. javac in java_library().
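The difference between the two forms of output directory can be modelled with a short Python sketch. This is a toy model built only from the mockgen example in this section, not Please's actual logic; outputs_from_dir is a hypothetical helper and the file list stands in for whatever the command wrote.

```python
def outputs_from_dir(files, output_dir):
    """Model which outputs a rule gains from an output directory.

    `files` are paths relative to the build directory. A trailing "/**" on
    `output_dir` lists every file individually; otherwise only the top-level
    entries (files or folders) are reported.
    """
    recursive = output_dir.endswith("/**")
    prefix = output_dir[:-3] if recursive else output_dir
    prefix = prefix.rstrip("/") + "/"
    outs = []
    for path in files:
        if not path.startswith(prefix):
            continue  # not inside the output directory
        rel = path[len(prefix):]  # the directory itself is not part of the output path
        entry = rel if recursive else rel.split("/", 1)[0]
        if entry not in outs:
            outs.append(entry)
    return outs


generated = ["_out/foo_mocks.go", "_out/bar/bar_mocks.go"]
print(outputs_from_dir(generated, "_out"))     # ['foo_mocks.go', 'bar']
print(outputs_from_dir(generated, "_out/**"))  # ['foo_mocks.go', 'bar/bar_mocks.go']
```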
Dependencies
Dependencies are other things you need in order to build - e.g. other code that your rule depends on.
These are also linked into the build directory, so the difference between sources and dependencies can seem arbitrary, but for some internal functions it's an important distinction to know how rules see the things they'll consume.
For example, the Go compiler expects you to explicitly pass your sources (e.g. go tool compile foo.go bar.go), however it discovers its dependencies through configuration (e.g. by looking at the GOPATH).
As such, dependencies don't have any corresponding environment variables associated with them.
Tools
Tools are the things that a rule uses to build with. They can refer either to other rules within the repo, or system-level binaries, for example:
tools = [
    "curl",
    "//path/to:tool",
],
In this example, curl is resolved on the system using the PATH variable defined in your .plzconfig file. //path/to:tool will be built first and used from its output location.
NB: For gentest(), there's also test_tools, which behaves exactly the same except that it is made available to test_cmd instead.
Tools are not copied into the build directory; you can access them using the $TOOL or $TOOLS environment variables. They can also be named in a similar manner to sources.
Note that since tools aren't copied, they can be a source of nondeterminism if you make use of other outputs that happen to be located near them. In order to ensure correctness you should make sure that you only run the tool itself and don't access neighbouring outputs.
The distinction between tools and other sources or dependencies is very important when using Please to cross-compile to a different target architecture. Tools will always be used from the host architecture (since they must be executed locally) whereas other dependencies will be for the target.
Commands
The command that you run is of course the core part of the rule. It can be passed to genrule in three formats:
- As a string; the simplest format, since there's only one command that's run.
- As a list; a sequence of commands run one after another if the preceding ones are successful (i.e. it's effectively shorthand for ' && '.join(cmd)).
- As a dict; the keys correspond to the build config to run and the values are the command to run in that config. The typical use case here is opt vs. dbg, but arbitrary names can be given and specified with plz build -c name.
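The three formats can be summarised in a small Python sketch. It is only a model of the description above, not Please's own code: resolve_cmd and its config parameter are hypothetical names, and the commands in the usage lines are made up for illustration.

```python
def resolve_cmd(cmd, config):
    """Reduce a genrule-style cmd argument to a single shell command string."""
    if isinstance(cmd, dict):
        # Dict form: pick the command for the requested build config.
        return cmd[config]
    if isinstance(cmd, list):
        # List form: each command runs only if the previous one succeeded.
        return " && ".join(cmd)
    # String form: already a single command.
    return cmd


print(resolve_cmd("wc $SRCS > $OUT", "opt"))
print(resolve_cmd(["mkdir _out", "touch _out/a"], "opt"))
print(resolve_cmd({"opt": "build --fast", "dbg": "build --debug"}, "dbg"))
```

In this model a dict lookup for a config with no entry raises KeyError; how Please itself handles that case is not covered here.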
There are various special sequence replacements that the rule is subject to:
- $(location //path/to:target) expands to the location of the given build rule, which must have a single output only.
- $(locations //path/to:target) expands to the locations of the outputs of the given build rule, which can have any number of outputs.
- $(dir //path/to:target) expands to the directory containing the outputs of the given label.
- $(out_dir //path/to:target) expands to the directory containing the outputs of the given label, with the preceding plz-out/{gen|bin}.
- $(exe //path/to:target) expands to a command to run the output of the given target. The rule must be marked as binary.
- $(out_exe //path/to:target) expands to a command to run the output of the given target, with the preceding plz-out/{gen|bin}. The rule must be marked as binary.
- $(out_location //path/to:target) expands to the output of the given build rule, with the preceding plz-out/{gen|bin}. Consider carefully when to use this though; it is not normally useful.
- $(out_locations //path/to:target) expands to the locations of the outputs of the given build rule, with the preceding plz-out/{gen|bin}. The rule can have any number of outputs.
- $(hash //path/to:target) expands to a hash of the outputs of that target. This can be useful to uniquely fingerprint the given rule.
Build environment
Please executes the build command in isolation. Typically, this means that rules cannot access files and environment variables that have not explicitly been made available to that rule.
The following environment variables are set by Please before executing your command:
- ARCH: architecture of the system, e.g. amd64.
- OS: current operating system (linux, darwin, etc).
- PATH: usual PATH environment variable as defined in your .plzconfig.
- TMP_DIR: the temporary directory you're compiling within.
- HOME: also set to the temporary directory you're compiling within.
- NAME: the name of the rule.
- SRCS: the sources of your rule.
- OUTS: the outputs of your rule.
- PKG: the path to the package containing this rule.
- PKG_DIR: similar to PKG but always contains a path (specifically . if the rule is in the root of the repo).
- OUT: the output of this rule. Only present when there is only one output.
- SRC: the source of this rule. Only present when there is only one source.
- SRCS_<suffix>: present when you've defined named sources on a rule. Each group creates one of these variables with paths to those sources.
- TOOLS: any tools defined on the rule.
- TOOL: available on any rule that defines a single tool only.
- SECRETS: if any secrets are defined on the rule, these are the paths to them.
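When exploring a rule's environment by hand (for instance from plz build --shell), it can help to see which of the variables listed above are actually set. The following Python sketch is one way to do that; describe_env is a hypothetical helper, and the list of names is simply transcribed from this section.

```python
import os

# Names transcribed from the list above; SRCS_<suffix> groups are matched by prefix.
DOCUMENTED = [
    "ARCH", "OS", "PATH", "TMP_DIR", "HOME", "NAME", "SRCS", "OUTS",
    "PKG", "PKG_DIR", "OUT", "SRC", "TOOLS", "TOOL", "SECRETS",
]


def describe_env(environ):
    """Return the documented variables (and any SRCS_* groups) present in environ."""
    found = {name: environ[name] for name in DOCUMENTED if name in environ}
    for name, value in environ.items():
        if name.startswith("SRCS_"):
            found[name] = value
    return found


if __name__ == "__main__":
    for name, value in sorted(describe_env(os.environ).items()):
        print(f"{name}={value}")
```

Variables such as OUT, SRC and TOOL will simply be absent from the result when the rule has more than one output, source or tool.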
Passing environment variables
By default, no environment variables from the host machine are available to rules. To make a variable available to the rule, set the pass_env parameter on genrule() and gentest().
These variables will be made available to both cmd and test_cmd. They will also contribute to the rule hash, so if they change, the rule will be re-built and tested as appropriate.
NB: It's also possible to set environment variables to be passed globally to all rules in .plzconfig. See the build section in the config documentation for more information.
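The effect of pass_env can be pictured with a toy Python model: only the named host variables reach the rule, and their values feed into something that changes when they change. Both functions are hypothetical, and the SHA-256 fingerprint is a stand-in chosen for this sketch; it says nothing about how Please actually computes rule hashes.

```python
import hashlib


def rule_env(base_env, host_env, pass_env):
    """Combine the rule's fixed variables with the host variables named in pass_env."""
    env = dict(base_env)
    for name in pass_env:
        if name in host_env:
            env[name] = host_env[name]
    return env


def env_fingerprint(host_env, pass_env):
    """Stand-in for the contribution passed variables make to the rule hash."""
    digest = hashlib.sha256()
    for name in sorted(pass_env):
        digest.update(f"{name}={host_env.get(name, '')}\n".encode())
    return digest.hexdigest()


host = {"USER": "alice", "EDITOR": "vim"}
env = rule_env({"NAME": "my_rule"}, host, ["USER"])
print(sorted(env))  # ['NAME', 'USER']
```

Changing the value of a passed variable changes the fingerprint, which models why the rule is re-built; changing a host variable that is not listed has no effect.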
Secrets
Rules can define a list of secrets that they want access to. These are all absolute paths (beginning with / or ~) and aren't copied to the build directory; instead they can be located using the environment variable $SECRETS.
They're useful for things like signing or access keys that you don't want
to check into version control but still might be necessary for building
some rules.
These don't contribute to the key used to retrieve outputs from the cache; this means it's possible for one machine to build a target with the secret and then share the output with others.
Entry points
Some tools require resources to run. For example SDKs commonly have their compiler (and/or runtime) in a directory next to the SDK libraries. It can be hard to write a rule to execute this binary in this context. This is because binary rules have a single output whereas here we require the whole folder.
For example, the Java JDK has the following structure:
.../jdk
|-- bin
| |-- java
| |-- javac
| |-- ...
|-- lib
| |-- ...
...
Using entry points, javac and java can be defined as such:
genrule(
    name = "jdk",
    cmd = "unzip jdk.zip -d $OUT",
    outs = ["out"],
    entry_points = {
        "java": "jdk/bin/java",
        "javac": "jdk/bin/javac",
    },
)
These entry points can then be used as tools by other rules, following a similar syntax to named outputs:
genrule(
    name = "java_lib",
    cmd = "$TOOLS_JAVAC ...",
    tools = {
        "javac": ":jdk|javac",
    },
)
Tests
As well as genrule(), there's also gentest(), which defines tests. Test rules are very similar to other build rules in that they have a build step. This build step usually produces a test binary which is then executed in the test step as defined by test_cmd:
gentest(
    name = "some_test",
    ...
    cmd = "$TOOLS $SRCS -o $OUT", # compiles a binary that we can later run
    tools = [CONFIG.SOME_COMPILER], # Some sort of compiler e.g. gcc
    outs = ["test"],
    test_cmd = "$TEST > $RESULTS_FILE", # Execute the test. $TEST is set to the output of cmd
)
In this example, this rule will generate test, a binary containing our compiled tests. We then execute this in test_cmd. The test output is written to test.results as defined by the $RESULTS_FILE variable.
For more information on testing in general, see Testing with Please.