Writing build rules

Please is designed to treat custom build commands as first-class citizens that can be interleaved freely with other rules. This page describes some of the concepts to bear in mind when writing rules of your own.

The documentation for genrule() and gentest() may also be useful, since it explains the available arguments and what they do.

Word count example

Here is an example of a build rule. It takes in example/file.txt from the source tree and counts the number of words within:

    
    
    # //example:word_count
    genrule(
      name = "word_count",
      srcs = ["file.txt"],
      outs = ["file.wordcount"],
      cmd = "wc -w $SRCS > $OUT",
    )
    
  

Please sets up $SRCS and $OUT based on the parameters we passed to genrule(), then executes our command. When this rule builds, it will output plz-out/gen/example/file.wordcount. For more information on how this works, see the sources and outputs sections below.

Build location

A core principle of Please is to isolate execution as much as possible, so that we can reason about the correctness of each action. To this end, each rule builds in its own temporary directory (under plz-out/tmp) containing only the files defined as inputs to the rule.

This means that you can't access any other files in your source tree from within the rule. To see what the build directory looks like, run plz build --shell: it prepares a target for building and opens a shell in that directory, where you can look around and try running commands by hand.

Some aspects of this environment differ from a normal shell; for example, rules are given a small, deterministic set of environment variables. A rule shouldn't need to read anything outside this directory, or anything not described by those variables. See the Passing environment variables section of this page for more information on passing variables from the host machine to the rule.

Sources

Sources are the inputs to your rule, defined in the srcs argument. They are typically source code, but like any other inputs they can also be build labels, in which case the outputs of the labelled rule are used.

All sources are linked into the build directory rather than copied, for performance reasons, so you shouldn't modify them in place.

Sources can be both files and directories. It can also be useful to use globs for sources, e.g. to include all files with a given extension; see the built-ins documentation for more information.
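
As a sketch, a rule that concatenates every .txt file in its package might use the glob() builtin like this. The rule name, pattern and output file are illustrative only.

    genrule(
        name = "all_text",
        # glob() expands to the matching files in this package when the BUILD file is parsed
        srcs = glob(["*.txt"]),
        outs = ["all_text.txt"],
        # $SRCS holds the space-separated paths of every matched file
        cmd = "cat $SRCS > $OUT",
    )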

Sources can be referred to through the $SRCS environment variable. If there's only one, $SRC will be defined too.
Sources can also be named, e.g.

    
    
    srcs = {
        "srcs": ["my_source.py"],
        "resources": ["something.txt"],
    },
    
  

In that case they can be accessed separately through $SRCS_SRCS and $SRCS_RESOURCES.
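
The following sketch puts both ideas into complete rules. The first consumes the output of //example:word_count from the word count example above through a build label (assuming it lives in the same package), and the second uses named sources. The rule names and output files are hypothetical.

    # A build label as a source: with a single source, $SRC is set.
    genrule(
        name = "word_count_copy",
        srcs = [":word_count"],
        outs = ["file.wordcount.txt"],
        cmd = "cp $SRC $OUT",
    )

    # Named sources: each group gets its own $SRCS_<NAME> variable.
    genrule(
        name = "bundle",
        srcs = {
            "srcs": ["my_source.py"],
            "resources": ["something.txt"],
        },
        outs = ["bundle.txt"],
        cmd = "cat $SRCS_SRCS $SRCS_RESOURCES > $OUT",
    )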

Outputs

Rules must define their outputs explicitly. Only these files will be saved to plz-out and made available to other rules.

The output location within plz-out depends on whether the rule is marked as binary. Binary rules output to plz-out/bin/... and non-binary rules output to plz-out/gen/....

Like sources, outputs can be both files and directories. The simplest way to define outputs on a rule is as a list:

    
    
    outs = [
      "path/to/a/directory",
      "foo.a",
    ],
    
  

The $OUT and $OUTS variables are set much like their source equivalents: $OUTS lists all outputs, and $OUT is only defined when there is exactly one.
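
For instance, a binary rule with a single output might look like the following sketch. Assuming it lives in the example package, its output should end up at plz-out/bin/example/hello.sh rather than under plz-out/gen. The rule and file names are hypothetical.

    genrule(
        name = "hello",
        outs = ["hello.sh"],
        # With a single output, $OUT holds its path.
        cmd = "echo 'echo hello' > $OUT && chmod +x $OUT",
        # Marks the rule as binary, so it outputs to plz-out/bin.
        binary = True,
    )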

Named outputs

Much like sources, outs can also be a dictionary. In this case, the outputs are considered named outputs:

      
      
    genrule(
        name = "my_rule",
        ...
        outs = {
            "src": ["src"],
            "lib": ["lib"],
        },
    )
      
    

The variables $OUTS_SRC and $OUTS_LIB will be set. These outputs can then be depended on individually from other rules:

      
      
    genrule(
        ...
        srcs = [":my_rule|src"],
        deps = [":my_rule|lib"],
        ...
    )
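
A fuller sketch of a rule with named outputs writes each group through its own variable, and a consumer then picks out a single group. The rule names and generated files here are hypothetical.

    genrule(
        name = "codegen",
        outs = {
            "header": ["api.h"],
            "impl": ["api.c"],
        },
        # Each named output group gets its own $OUTS_<NAME> variable.
        cmd = "echo '// generated' > $OUTS_HEADER && echo '// generated' > $OUTS_IMPL",
    )

    genrule(
        name = "impl_copy",
        # Depends only on the "impl" group of :codegen, i.e. api.c.
        srcs = [":codegen|impl"],
        outs = ["api_copy.c"],
        cmd = "cp $SRC $OUT",
    )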
      
    

Output directories

Sometimes it can be hard to reason about the exact outputs of a rule before actually building it. Output directories allow you to specify a folder; anything that ends up in that folder will become an output of the rule. The folder's path will not constitute part of the output path of the file.

Imagine we're generating code, e.g. mocks. The code generation tool parses some sources and, based on the symbols it finds, produces a set of outputs:

      
      
    genrule(
        name = "mocks",
        srcs = ["interfaces.go"],
        tools = {
            "mockgen": ["mockgen"],
        },
        cmd = "$TOOLS_MOCKGEN ... $SRCS -o $OUT",
        outs = ???,
    )
      
    

It's difficult to know ahead of time which files mockgen is going to generate. It's dependent on the source code within interfaces.go. We can use output directories for this:

      
      
    genrule(
        name = "mocks",
        srcs = ["interfaces.go"],
        tools = {
            "mockgen": ["mockgen"],
        },
        cmd = "mkdir _out && $TOOLS_MOCKGEN ... $SRCS -o _out",
        output_dirs = ["_out"],
    )
     
    

If mockgen generates foo_mocks.go and bar/bar_mocks.go, these will become outputs of the rule as if they had been defined as part of the outs list:

      
      
    Build finished; total time 230ms, incrementality 100.0%. Outputs:
    //:mocks:
      plz-out/gen/foo_mocks.go
      plz-out/gen/bar
      
    

If you'd rather have the individual file plz-out/gen/bar/bar_mocks.go as the output instead of the folder plz-out/gen/bar, add ** to the output directory:

      
      
    genrule(
        ...
        output_dirs = ["_out/**"],
    )
      
    

This can be useful when these files are going to end up as $SRCS in another rule, since some tools don't deal well with directories, e.g. javac in java_library().
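
As a sketch of that case, a downstream rule could take the mocks as sources. Assuming //:mocks uses output_dirs = ["_out/**"], $SRCS should then list the individual .go files rather than the bar directory. The rule name and output file are hypothetical.

    genrule(
        name = "mock_list",
        srcs = ["//:mocks"],
        outs = ["mock_files.txt"],
        # Records which paths this rule received as sources.
        cmd = "echo $SRCS > $OUT",
    )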

Dependencies

Dependencies are other things you need in order to build - e.g. other code that your rule depends on.

These are also linked into the build directory, so the difference between sources and dependencies can seem arbitrary. However, for some of Please's built-in rules it's an important distinction, because it determines how a rule sees the things it consumes.

For example, the Go compiler expects you to pass your sources explicitly (e.g. go tool compile foo.go bar.go), but it discovers its dependencies through configuration (e.g. by looking at the GOPATH).

As such, dependencies don't have any corresponding environment variables associated with them.
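
As a rough sketch, a rule can depend on another rule without listing it as a source and then refer to it with one of the sequence replacements described under Commands, since no variable is set for it. The target :config is hypothetical and assumed to have a single output.

    genrule(
        name = "with_config",
        srcs = ["template.txt"],
        # Linked into the build directory, but not exposed through any variable.
        deps = [":config"],
        outs = ["rendered.txt"],
        # $(location :config) expands to the path of :config's single output.
        cmd = "cat $SRC $(location :config) > $OUT",
    )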

Tools

Tools are the things that a rule uses to build with. They can refer either to other rules within the repo, or system-level binaries, for example:

    
    
    tools = [
        "curl",
        "//path/to:tool",
    ],
    
  

In this example, curl is resolved on the system using the PATH variable defined in your .plzconfig file. //path/to:tool will be built first and used from its output location.

NB: For gentest(), there's also test_tools, which behaves exactly the same except that it is made available to test_cmd instead.

Tools are not copied into the build directory; you can access them using the $TOOL or $TOOLS environment variables. They can also be named in a similar manner to sources.
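
A sketch of named tools, mirroring named sources: each group is exposed as $TOOLS_<NAME>. //path/to:tool is the placeholder from the example above; the gen and fmt names, the rule name, and the assumption that the tool writes Go source to stdout are all hypothetical.

    genrule(
        name = "formatted",
        srcs = ["input.txt"],
        outs = ["formatted.go"],
        tools = {
            "gen": ["//path/to:tool"],  # built first, run from its output location
            "fmt": ["gofmt"],           # resolved via the PATH from .plzconfig
        },
        # gofmt prints the formatted file to stdout, which is redirected to $OUT.
        cmd = "$TOOLS_GEN $SRC > unformatted.go && $TOOLS_FMT unformatted.go > $OUT",
    )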

Note that since tools aren't copied, they can be a source of nondeterminism if you make use of other outputs that happen to be located near them. In order to ensure correctness you should make sure that you only run the tool itself and don't access neighbouring outputs.

The distinction between tools and other sources or dependencies is very important when using Please to cross-compile to a different target architecture. Tools will always be used from the host architecture (since they must be executed locally) whereas other dependencies will be for the target.

Commands

The command that you run is of course the core part of the rule. It can be passed to genrule in three formats:

  • As a string; the simplest format, since there's only one command that's run.
  • As a list, it's a sequence of commands run one after another if the preceding ones are successful (i.e. it's effectively shorthand for ' && '.join(cmd)).
  • As a dict, the keys correspond to the build config to run and the values are the command to run in that config. The typical use case here is opt vs. dbg but arbitrary names can be given and specified with plz build -c name.
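
A sketch of the list and dict forms; the rule names and files are hypothetical, and opt and dbg are the typical config names mentioned above.

    # List form: equivalent to "sort $SRC > sorted.txt && uniq sorted.txt > $OUT"
    genrule(
        name = "unique_lines",
        srcs = ["file.txt"],
        outs = ["file.unique"],
        cmd = [
            "sort $SRC > sorted.txt",
            "uniq sorted.txt > $OUT",
        ],
    )

    # Dict form: the command is chosen by the build config, e.g. plz build -c dbg
    genrule(
        name = "build_mode",
        outs = ["mode.txt"],
        cmd = {
            "opt": "echo optimised > $OUT",
            "dbg": "echo debug > $OUT",
        },
    )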

The command is also subject to various special sequence replacements:

  • $(location //path/to:target) expands to the location of the given build rule, which must have a single output only.
  • $(locations //path/to:target) expands to the locations of the outputs of the given build rule, which can have any number of outputs.
  • $(dir //path/to:target) expands to the directory containing the outputs of the given label.
  • $(out_dir //path/to:target) expands to the directory containing the outputs of the given label, prefixed with plz-out/{gen|bin}.
  • $(exe //path/to:target) expands to a command to run the output of the given target. The rule must be marked as binary.
  • $(out_exe //path/to:target) expands to a command to run the output of the given target, prefixed with plz-out/{gen|bin}. The rule must be marked as binary.
  • $(out_location //path/to:target) expands to the output of the given build rule, prefixed with plz-out/{gen|bin}.
    Consider carefully when to use this though; it is not normally useful.
  • $(out_locations //path/to:target) expands to the locations of the outputs of the given build rule, prefixed with plz-out/{gen|bin}. The rule can have any number of outputs.
  • $(hash //path/to:target) expands to a hash of the outputs of that target.
    This can be useful to uniquely fingerprint the given rule.
  • $(worker //path/to:target) invokes the given target as a persistent worker. See the documentation on persistent workers for more details on how to use them.
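
For example, the following sketch combines $(location) and $(hash). It assumes :config is a hypothetical rule with a single output, listed in deps so that its output is available to the command.

    genrule(
        name = "stamped_config",
        deps = [":config"],
        outs = ["config.stamped"],
        # $(location ...) needs a single output; $(hash ...) fingerprints it.
        cmd = "cp $(location :config) $OUT && echo $(hash :config) >> $OUT",
    )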

Build environment

Please executes the build command in isolation. Typically, this means that rules cannot access files and environment variables that have not explicitly been made available to that rule.

The following environment variables are set by Please before executing your command:

  • ARCH: architecture of the system, e.g. amd64.
  • OS: current operating system (linux, darwin, etc).
  • PATH: usual PATH environment variable as defined in your .plzconfig
  • TMP_DIR: the temporary directory you're compiling within.
  • HOME: also set to the temporary directory you're compiling within.
  • NAME: the name of the rule.
  • SRCS: the sources of your rule
  • OUTS: the outputs of your rule
  • PKG: the path to the package containing this rule
  • PKG_DIR: Similar to PKG but always contains a path (specifically . if the rule is in the root of the repo).
  • OUT: the output of this rule. Only present when there is only one output.
  • SRC: the source of this rule. Only present when there is only one source.
  • SRCS_<suffix>: Present when you've defined named sources on a rule. Each group creates one of these variables with paths to those sources.
  • TOOLS: Any tools defined on the rule.
  • TOOL: Available on any rule that defines a single tool only.
  • SECRETS: If any secrets are defined on the rule, these are the paths to them.
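
As a quick sketch for inspecting this environment, a rule could record a few of these variables in its output. The rule name is hypothetical.

    genrule(
        name = "env_info",
        outs = ["env.txt"],
        # PKG, NAME, OS and ARCH are all set by Please before cmd runs.
        cmd = "echo pkg=$PKG name=$NAME os=$OS arch=$ARCH > $OUT",
    )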

Passing environment variables

By default, no environment variables from the host machine are available to rules. To make a variable available to the rule, set the pass_env parameter on genrule() and gentest().

These variables will be made available to both cmd and test_cmd. They will also contribute to the rule hash, so if they change, the rule will be re-built and tested as appropriate.

NB: It's also possible to set environment variables to be passed globally to all rules in .plzconfig. See the build section in the config documentation for more information.
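
A minimal sketch, assuming a hypothetical RELEASE_VERSION variable is set on the host machine:

    genrule(
        name = "version",
        outs = ["version.txt"],
        # RELEASE_VERSION is taken from the host environment; because it
        # contributes to the rule hash, changing it triggers a rebuild.
        pass_env = ["RELEASE_VERSION"],
        cmd = "echo $RELEASE_VERSION > $OUT",
    )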

Secrets

Rules can define a list of secrets that they want access to. These are all absolute paths (beginning with / or ~) and aren't copied to the build directory; instead they can be located using the environment variable $SECRETS.
They're useful for things like signing or access keys that you don't want to check into version control but still might be necessary for building some rules.

These don't contribute to the key used to retrieve outputs from the cache; this means it's possible for one machine to build a target with the secret and then share the output with others.
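
As a sketch, a signing step might read a private key that lives outside the repository. The key path and file names are hypothetical, and the command assumes openssl is available on the PATH configured in .plzconfig.

    genrule(
        name = "signature",
        srcs = ["release.tar.gz"],
        outs = ["release.tar.gz.sig"],
        tools = ["openssl"],
        # Absolute path; not copied into the build directory and not part of the cache key.
        secrets = ["~/.keys/release_signing.pem"],
        # With a single secret, $SECRETS is the path to that file.
        cmd = "$TOOL dgst -sha256 -sign $SECRETS -out $OUT $SRC",
    )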

Entry points

Some tools require resources to run. For example, SDKs commonly have their compiler (and/or runtime) in a directory next to the SDK libraries. This makes it hard to write a rule that executes the binary in this context, because binary rules have a single output whereas here we require the whole folder.

For example, the Java JDK has the following structure:

    
    
    .../jdk
    |-- bin
    |   |-- java
    |   |-- javac
    |   |-- ...
    |-- lib
    |   |-- ...
    ...
    
  

Using entry points, javac and java can be defined as such:

    
    
    genrule(
        name = "jdk",
        # Assumes jdk.zip contains a top-level jdk/ directory.
        srcs = ["jdk.zip"],
        cmd = "unzip $SRC",
        outs = ["jdk"],
        entry_points = {
          "java": "jdk/bin/java",
          "javac": "jdk/bin/javac",
        },
    )
    
  

These entry points can then be used as tools by other rules, following a similar syntax to named outputs:

    
    
    genrule(
        name = "java_lib",
        cmd = "$TOOLS_JAVAC ...",
        tools = {
          "javac": ":jdk|javac",
        },
    )
    
  

Tests

As well as genrule(), there's also gentest(), which defines tests. Test rules are very similar to other build rules in that they have a build step. This build step usually produces a test binary which is then executed in the test step as defined by test_cmd:

    
    
    gentest(
        name = "some_test",
        ...
        cmd = "$TOOLS $SRCS -o $OUT", # compiles a binary that we can later run
        tools = [CONFIG.SOME_COMPILER], # Some sort of compiler e.g. gcc
        outs = ["test"],
        test_cmd = "$TEST > $RESULTS_FILE", # Execute the test. $TEST is set to the output of cmd
    )
    
  

In this example, the rule generates test, a binary containing our compiled tests, which we then execute in test_cmd. The test output is redirected to test.results, the file named by the $RESULTS_FILE variable.
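
As a hedged sketch, a test that is just a shell script might copy the script in the build step and run it in test_cmd, using test_tools as described in the Tools section. The file names are hypothetical, and no_test_output = True is assumed here to make Please judge the test by its exit code alone instead of parsing $RESULTS_FILE; check this against the gentest() documentation.

    gentest(
        name = "script_test",
        srcs = ["check.sh"],
        outs = ["check_test.sh"],
        cmd = "cp $SRC $OUT",
        # Made available to test_cmd rather than cmd.
        test_tools = ["bash"],
        # Assumes $TOOL refers to the single test tool inside test_cmd.
        test_cmd = "$TOOL $TEST",
        no_test_output = True,
    )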

For more information on testing in general, see Testing with Please.