Testing with Please

Tests in Please are run using plz test (or plz cover if you want to extract coverage information).

The documentation for gentest() may also be of use, as it explains the available arguments and what they do.

Specifying what to test

The most obvious way of running tests is to pass a build label to plz test; for example, plz test //src/core/... runs all tests under //src/core and its subpackages.

Labels are also particularly useful for tests; Please accepts --include and --exclude flags which filter what it was asked to run down to a subset. Labels are attached to build rules like so:

    go_test(
        name = "my_test",
        srcs = ["my_test.go"],
        labels = ["slow", "integration"],
    )

You can then invoke Please like plz test --exclude slow to run all tests except those labelled as "slow".

There is one special label, manual, which always excludes tests from automatic selection, so they will only be run if identified explicitly. This is often useful for taking a test out of general use when needed (for example, if a test starts failing, you can disable it while investigating).
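
For example, a test that has been temporarily taken out of general runs might look like this (the target name here is purely illustrative):

    go_test(
        name = "broken_test",
        srcs = ["broken_test.go"],
        labels = ["manual"],  # skipped by wildcard runs such as plz test //...
    )

It would then only run when addressed directly, e.g. plz test //src/core:broken_test.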

Success and failure

Please expects two things from tests:

  • That the test exits with code 0 if successful, or nonzero if unsuccessful
  • That it writes its results into a file or directory defined by the env var RESULTS_FILE (unless the test has no_test_output = True, in which case it is evaluated only on its exit code).

All tests are considered to end in a binary state: they are either successful or unsuccessful. Tests with variable output, such as load tests, are therefore not a good fit for this system.

Result files are expected to be either Go-style test output:

    === RUN   Hello
    --- PASS: Hello (0.00s)
    === RUN   Salutations
    --- PASS: Salutations (0.00s)

Or JUnit/Ant-style XML test output:

    <testsuites time="0">
      <testsuite name="hello" tests="2" package="hello.test" time="0">
          <testcase name="Hello" classname="test" time="0">
            <system-err>Hello!</system-err>
          </testcase>
          <testcase name="Salutations" classname="test" time="0">
            <system-err>Salutations!</system-err>
          </testcase>
      </testsuite>
    </testsuites>

There's no official specification for this XML structure; however, we try to be compatible with Maven's Surefire reports.
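
Putting the two requirements together, a gentest() rule can satisfy this contract with an arbitrary command. The sketch below (with a purely illustrative target name) exits 0 and writes Go-style results to the location Please supplies in the RESULTS_FILE environment variable; see the gentest() documentation for the full argument list:

    gentest(
        name = "greeting_test",
        # Exit code 0 signals success; the results themselves are written to
        # the file named by the RESULTS_FILE environment variable.
        test_cmd = """
        echo "=== RUN   Hello" > "$RESULTS_FILE"
        echo "--- PASS: Hello (0.00s)" >> "$RESULTS_FILE"
        """,
    )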

Tests can also be marked as flaky, which causes them to be automatically re-run several times; they are considered to pass if any one run passes.
This is set as follows:

    go_test(
        name = "my_test",
        srcs = ["my_test.go"],
        flaky = True,
    )

By default they are run up to three times; you can alter this on a per-test basis by passing an integer instead of a boolean.
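
For example, to allow up to five attempts:

    go_test(
        name = "my_test",
        srcs = ["my_test.go"],
        flaky = 5,  # re-run up to five times instead of the default three
    )
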
You should try to avoid marking tests as flaky, though; in most cases flakiness indicates poor design within the test which can usually be mitigated (for example by adding locking or synchronisation instead of sleeping). Flaky tests are a burden on other team members since they take extra time & resources to run, and there is always a risk that they will still fail: if your test fails 50% of the time, then even with three runs there is a one-in-eight chance that every run fails, which still results in an overall failure.

Hermeticity and reproducibility

An important concept of testing that Please tries to help with is that tests should be hermetic; that is, they should be isolated from local state that might cause them to fail unexpectedly. This is important to avoid nasty surprises later (for example, finding out that your test only passes if some other resources happen to have been built first, or that it fails arbitrarily when run in parallel with other tests).

To help ensure this, Please runs each test in isolation from the others: each one runs within its own temporary directory which contains only the test itself and the files it has declared that it needs.
Most of the usual attributes of a build rule that refer to files (e.g. srcs, tools, deps) are only available at build time; files that are needed at test runtime should instead be specified as data. These will appear within the test directory under their full repo path.
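
For example (with illustrative paths, assuming the package is //src/parser), a test that reads a fixture at runtime would declare it as data:

    go_test(
        name = "parser_test",
        srcs = ["parser_test.go"],
        data = ["testdata/example.json"],  # needed at test runtime, not build time
    )

At test time the fixture appears inside the temporary test directory as src/parser/testdata/example.json, i.e. under its full repo path.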

On Linux, tests can also be sandboxed, either by setting

    sandbox = True,

on the test rule in question, or by setting it globally in the config:

    [test]
    sandbox = on

This uses kernel namespaces to segregate the test from other processes:

  • Networking: only a loopback interface will be available. Tests can use this to start servers, and any ports they open will not be visible to the rest of the system, so clashes are impossible. Tests therefore can't access external network resources.
  • Filesystem: each test gets its own in-memory filesystem mounted on /tmp, so tests can't accidentally overwrite each other's files. The temporary test directory is also mounted under here to prevent tests from walking up to parent directories to find files they haven't declared.
  • IPC: each test also runs in a new IPC namespace, so it can't use SysV IPC to communicate with other processes.

In general, the networking separation is the most useful, since it means tests do not have to worry about finding unused ports or handling clashes.
These features require unprivileged user namespace support in the kernel, which is normally available from version 3.10 onwards (i.e. on the vast majority of systems today).

On other platforms, we distribute the same sandboxing tool alongside Please to simplify configuration, but it currently has no effect there.

If you want to explore what the sandbox is doing, you can use plz tool sandbox bash to get a shell within it; you'll observe that commands like ping and curl no longer work.