Advanced Please

So after reading the rest of it, you're wondering what else it can do?

This section covers the more esoteric bits of Please functionality. If you're familiar with Blaze or Buck, these probably still won't be familiar :)

Pre- and post-build actions

It's possible in Please to define callbacks in Python that are invoked either immediately before or immediately after the target is built. These allow modifying aspects of the target or the wider build graph, which can be a powerful tool for handling things that are otherwise awkward or impossible.

Note that these functions are only evaluated at build time, so their results will not be visible to plz query and they can be a little hard to debug if you get things wrong. They should hence be used judiciously.

Pre-build function

The pre-build function is defined on a build_rule as simply pre_build = lambda name: do_stuff(name).

As you can see, it's invoked with one argument, the name of the rule about to be built. It's up to you what you do in the function, although in practice the only really useful thing at present is to inspect the rule's transitive labels and adjust its build command accordingly. This is done by calling get_labels(name, prefix), where name is the name of the current rule and prefix is a prefix shared by the labels you're interested in (for convenience, it's stripped from the returned values). Having got these, you can then call set_command(name, cmd) to alter the command for your rule.
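
For illustration, here's a minimal sketch of a pre-build function; the 'link:' label prefix and the rule itself are hypothetical, not part of the built-in rules:

        def _add_link_flags(name):
            # Collect transitive labels with the (hypothetical) 'link:' prefix;
            # the prefix is stripped from the returned values.
            flags = get_labels(name, 'link:')
            if flags:
                # Rewrite this rule's command to include the extra flags.
                set_command(name, 'gcc -o $OUT $SRCS ' + ' '.join(flags))

        build_rule(
            name = 'my_binary',
            srcs = ['main.c'],
            outs = ['my_binary'],
            cmd = 'gcc -o $OUT $SRCS',
            pre_build = _add_link_flags,
        )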

The built-in C++ rules for Please (src/parse/rules/cc_rules.build_defs) are a reasonably good example of how to use this.

Post-build function

The post-build function is somewhat more powerful and useful than the pre-build function, and hence it appeared in the API several months sooner. It is defined similarly (post_build = lambda name, output: do_stuff(output)) but takes an extra argument, output, which is the standard output of the build rule after its successful invocation.

The power of this is that you can run a build rule to do arbitrary operations and then alter the build graph based on that; currently you can add outputs to the rule, define new rules and add dependencies to a rule. These lead to two motivating examples which both occur in built-in rules:

First, collecting additional output files. This happens with Java protobuf outputs, where the output files aren't obvious until the .proto file is parsed, because their location is defined by its java_package option. The first iteration of proto_library required these to be defined explicitly, but that rapidly became tedious; the nicer solution is detecting them this way. The build rule simply invokes find to locate all the .java files it produces, and the post-build function receives that output and runs add_out(name, java_file) for each of them.
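
A simplified sketch of that pattern; this is not the real proto_library implementation, and it assumes the rule's output arrives as a single string:

        def _collect_java_sources(name, output):
            # output is the rule's stdout: here, one generated .java path per line.
            for line in output.strip().split('\n'):
                if line.endswith('.java'):
                    add_out(name, line)

        build_rule(
            name = 'service_proto_java',  # illustrative rule name
            srcs = ['service.proto'],
            cmd = 'protoc --java_out=. $SRCS && find . -name "*.java"',
            post_build = _collect_java_sources,
        )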

Second, we wanted to be able to fetch third-party dependencies without having to expand a transitive dependency tree into separate maven_jar rules by hand. maven_jars does this by hitting up Maven to get the transitive dependencies and then generating a new build rule for each of them.
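
In miniature that looks something like the following; the output format, rule name and exact calls are simplified for illustration:

        def _add_maven_deps(name, output):
            # Assume the rule prints one 'group:artifact:version' coordinate per line.
            for coord in output.strip().split('\n'):
                dep_name = coord.replace(':', '_').replace('.', '_')
                # Define a new rule for this dependency...
                maven_jar(
                    name = dep_name,
                    id = coord,
                )
                # ...and add it as a dependency of the original rule.
                add_dep(name, ':' + dep_name)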

Require / Provide

This is a generalised mechanism for rules to provide specialised dependencies to other rules. The most obvious example is again proto_library, also illustrating how it's a harder rule to write well than you'd think.

The problem with proto_library is that we ideally want to maintain the illusion that there's one rule with the name given in the BUILD file and have all other rules depend only on that. But of course it generates outputs for multiple languages, and it's pretty suboptimal for your python_library to have to wait for the C++ and Java generated code to compile.

This is solved by the proto_library rule declaring the following property:


        provides = {
            'py': ':python_dependency',
            'java': ':java_dependency',
            'cc': ':cc_dependency',
        }
      
And then the python_library rule is annotated with:

        requires = ['py']
      
Individually neither has any effect, but when a rule with a particular require depends on another with a matching provide, its dependency is resolved directly to the rule specified by the provide. This means it can skip the dependencies it doesn't care about and pick up only the ones it does.

The consequence of this (skipping the actual declared dependency) can be a bit non-obvious, so again it should be used sparingly. Typically the rule declaring provides should be a no-op collecting rule such as a filegroup, and the various provided rules shouldn't normally be needed together. Different languages are the most common example here (and indeed this feature was nearly named language at one point).
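
Putting the two halves together, a hypothetical setup might look like this (the rule names are made up, and python_library declares requires = ['py'] internally):

        filegroup(
            name = 'my_proto',
            deps = [
                ':my_proto_py',
                ':my_proto_java',
                ':my_proto_cc',
            ],
            provides = {
                'py': ':my_proto_py',
                'java': ':my_proto_java',
                'cc': ':my_proto_cc',
            },
        )

        python_library(
            name = 'my_lib',
            srcs = ['my_lib.py'],
            # Resolved straight to :my_proto_py, so the Java and C++ parts aren't built.
            deps = [':my_proto'],
        )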

The requires stanza is also responsible for colouring the interactive build output. Currently the available colours are hardcoded but one of these days they might become configurable.

Hash verification

Please can natively verify hashes of packages. Some of the built-in rules for fetching things from third-party repos have this option, and you can add it to your own genrules. For example, one of the Python libraries we use:


        pip_library(
            name = 'six',
            version = '1.9.0',
            outs = ['six.py'],
            hashes = ['sha1: 0c31ab7cf1a2761efa32d9a7e891ddeadc0d8673'],
        )
      
This declares that the calculated sha1 hash of the package must match one of the given set, and it's a failure if not.

You can find the output hash of a particular target by running plz hash //third_party/python:six, which will calculate it for you so you can enter it in the BUILD file. At some point in the future we'd like to provide better tooling around this (e.g. automatically updating the BUILD file with the calculated hash).

The reason for allowing multiple hashes is for rules that generate different outputs on different architectures; this is common for Python libraries which have a compiled component, for example.
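
In that case you simply list one hash per expected output, something like this (the values below are placeholders, not real hashes):

            hashes = [
                'sha1: 1111111111111111111111111111111111111111',  # placeholder: linux build
                'sha1: 2222222222222222222222222222222222222222',  # placeholder: osx build
            ],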

For testing purposes you can run Please with the --nohash_verification flag which will reduce hash verification failures to a warning message only.

Note that when using this you must be careful that the outputs of your rule really are deterministic. This is generally true for remote_file calls, but obviously only if the server returns the same thing every time for that URL. Some care should be taken with pip_library, since the outputs of a 'pip install' for a package containing binary (not pure Python) modules are not bit-for-bit identical if they're compiled locally; they only match if you downloaded a precompiled wheel. Different Python and OS versions can affect this too.
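
To make the remote_file case concrete, a pinned download might look like this; the URL and hash here are placeholders:

        remote_file(
            name = 'protoc_zip',
            url = 'https://example.com/protoc-3.6.1-linux-x86_64.zip',  # placeholder URL
            hashes = ['sha1: 0000000000000000000000000000000000000000'],  # placeholder hash
        )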

Licence validation

Please can attempt to autodetect licences from third-party packages and inform you if they're not ones you'd accept. You mark licences in the .plzconfig file like so:


        [licences]
        accept = MIT
        accept = BSD
        reject = MS-EULA
      
By default, with no [licences] section, Please won't perform any licence checking. Once you've added some, any package with a licence must have a matching accepted licence in the config.

Currently we can autodetect licences from pip_library and maven_jars rules; you can also set them manually via the licences attribute on a rule.
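
For example, declaring it manually on a rule; this is a hypothetical genrule, and assumes the rule passes through the standard licences attribute:

        genrule(
            name = 'bundled_lib',  # hypothetical rule
            srcs = ['bundled_lib-1.0.tar.gz'],
            outs = ['bundled_lib'],
            cmd = 'tar -xzf $SRCS',
            licences = ['BSD'],  # declared manually since it can't be autodetected here
        )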

It bears mentioning that this is done on a best-effort basis; since licences and their locations are not standardised in pip (and many other places), we can't always be fully confident about the detection, so it isn't a full substitute for validating licences yourself.