Please cache

These docs refer to caches in various places. Please can use several kinds of cache to speed up builds; they are described here.

In all cases artifacts are only stored in the cache after a successful build or test run.
Please takes a --nocache flag which disables all caches for an individual run.

The directory cache

This is the simplest kind of cache. It's on by default and is simply a directory tree (by default .plz-cache at the root of your repo) containing various versions of built artifacts. Its main advantage is that it allows extremely fast rebuilds when swapping between different versions of code (notably git branches).
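As a rough illustration of the idea (not Please's actual implementation), a directory cache can be sketched as a tree keyed by package, target and a hash of the version being built. The layout and the names entry_dir, store and retrieve are assumptions made for this example.

```python
import shutil
from pathlib import Path
from typing import Iterable


def entry_dir(cache_root: Path, package: str, target: str, key: str) -> Path:
    # Hypothetical layout: <cache root>/<package path>/<target name>/<hash key>/
    return cache_root / package / target / key


def store(cache_root: Path, package: str, target: str, key: str,
          outputs: Iterable[Path]) -> None:
    """Copy built output files into the cache entry for this version."""
    dest = entry_dir(cache_root, package, target, key)
    dest.mkdir(parents=True, exist_ok=True)
    for out in outputs:
        shutil.copy2(out, dest / out.name)


def retrieve(cache_root: Path, package: str, target: str, key: str,
             out_dir: Path) -> bool:
    """Copy a cached version back out; returns False on a cache miss."""
    src = entry_dir(cache_root, package, target, key)
    if not src.is_dir():
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    for cached in src.iterdir():
        shutil.copy2(cached, out_dir / cached.name)
    return True
```

Because each version lives under its own key, switching back to a branch that was built earlier becomes a copy out of the cache rather than a rebuild.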

Please comes with a binary creatively named cache_cleaner which it fires off at startup to keep the size of the cache down. Because there is no full-time daemon monitoring the cache, its size can slightly exceed the configured bounds, but generally it does as good a job as it can.
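To show why a clean-at-startup approach can overshoot its bounds, here is a minimal sketch of a one-shot cleaner. It is not cache_cleaner's real algorithm; the function name clean_cache, the max_bytes parameter and the oldest-first eviction policy are all assumptions for illustration.

```python
from pathlib import Path


def clean_cache(cache_root: Path, max_bytes: int) -> int:
    """Delete the least recently modified files until the cache fits max_bytes.

    Returns the number of bytes freed. Runs once and exits, so anything
    written after it finishes is not counted until the next run.
    """
    stats = [(p, p.stat()) for p in cache_root.rglob("*") if p.is_file()]
    total = sum(st.st_size for _, st in stats)
    freed = 0
    # Visit files oldest first and stop as soon as we are within bounds.
    for path, st in sorted(stats, key=lambda item: item[1].st_mtime):
        if total - freed <= max_bytes:
            break
        path.unlink()
        freed += st.st_size
    return freed
```

Since nothing watches the directory between runs, a long build can push the cache past max_bytes until the next startup triggers another pass.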

Note that the dir cache is not threadsafe or locked in any way beyond plz's normal repo lock, so sharing the same directory between multiple projects is probably a Bad Idea.

The HTTP cache

This is a more advanced cache which, as one would expect, can run on a centralised machine to share artifacts between multiple clients. It has a reasonably simple API, for reference:

  • GET /artifact/{os_name}/{artifact}: Retrieves a particular artifact.
  • POST /artifact/{os_name}/{artifact}: Stores a particular artifact.
  • DELETE /artifact/{artifact}: Deletes all versions of a given artifact.
  • DELETE /: Deletes all artifacts.

We should document this in more detail, especially since the formats can be subtle (an awkward corner case requires multipart encoding), but as described below it is probably preferable to implement the RPC cache instead.
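The four endpoints listed above can be sketched as request builders. This only constructs the requests and does not send them; the base URL, the os_name value and the artifact path in the usage note are hypothetical, and the body of a POST is left as opaque bytes because the real encoding (including the multipart corner case) is not documented here.

```python
from urllib.parse import quote
from urllib.request import Request


def _url(base_url: str, *segments: str) -> str:
    # Join path segments onto the base URL, escaping anything unsafe.
    path = "/".join(quote(s, safe="/") for s in segments)
    return base_url.rstrip("/") + "/" + path


def get_artifact(base_url: str, os_name: str, artifact: str) -> Request:
    return Request(_url(base_url, "artifact", os_name, artifact), method="GET")


def store_artifact(base_url: str, os_name: str, artifact: str,
                   payload: bytes) -> Request:
    # payload is a placeholder; the real wire format is not specified here.
    return Request(_url(base_url, "artifact", os_name, artifact),
                   data=payload, method="POST")


def delete_artifact(base_url: str, artifact: str) -> Request:
    # No os_name: this removes every version of the artifact.
    return Request(_url(base_url, "artifact", artifact), method="DELETE")


def delete_all(base_url: str) -> Request:
    return Request(base_url.rstrip("/") + "/", method="DELETE")
```

For example, get_artifact("http://cache.example:8080", "linux_amd64", "src/core/core") builds a GET for /artifact/linux_amd64/src/core/core; passing the result to urllib.request.urlopen would send it.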

The cache runs as a daemon and is fully threadsafe, so it is safe for multiple clients to store and retrieve artifacts simultaneously. Because it's a daemon it maintains its own stats about which artifacts it has, so it can be a little more intelligent than the dir cache about what to delete and when.

Please comes with an implementation of this cache as a standalone binary.

Thanks to Diana Costea who implemented the original version of this as part of her internship with us, and prodded us into getting on and actually deploying it for our CI servers.

The RPC cache

This is very similar conceptually to the HTTP cache, but uses gRPC for communication (which of course uses HTTP itself underneath). The motivation was partly that we realised fairly late that our cache semantics sometimes require multiple files to be communicated in a single request, which implied multipart encodings, which are of course not great fun. We also felt that we could get better performance this way. An awful lot of the internal code is shared with the HTTP server, so only the transport layer really differs.

The API can be found in src/cache/proto/rpc_cache.proto. It's not very complex so would not be hard to implement, although again Please comes with an implementation of this cache as a standalone binary.

Notes

Our current CI setup leans very heavily on these caches; every checkin to master triggers a build of our repo in a clean environment, so initially nothing is present in plz-out. The build machines maintain a local directory cache, though, which keeps things pretty fast (a cache hit is obviously a bit slower than having the artifact in plz-out already, but it's still pretty quick).

The build machines all push artifacts into a single central RPC cache and pull them back again as needed, so generally after the initial test run of a PR, subsequent builds are fast. Developer machines use the RPC cache in read-only mode so can benefit from these too; it would be nice if everyone were read-write, but the shared cache is vulnerable to developers having incompatible machine setups (for example, different Go versions, even minor ones, cannot share compiled artifacts, so many bad things will happen if developers aren't all on exactly the same one). A very long-term goal is for Please to have better insight into the machine-level deps of these things, which would potentially allow everyone to have read-write access.
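The read-only arrangement can be pictured as a thin wrapper around a shared cache that forwards lookups but drops writes. This is a conceptual sketch only: SharedCache is an in-memory stand-in for the central cache, and both class names are made up for the example.

```python
from typing import Dict, Optional


class SharedCache:
    """In-memory stand-in for the central cache (hypothetical)."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self._artifacts[key] = data

    def retrieve(self, key: str) -> Optional[bytes]:
        return self._artifacts.get(key)


class ReadOnlyCache:
    """View of a cache for developer machines: reads pass through, writes do not."""

    def __init__(self, backend: SharedCache) -> None:
        self._backend = backend

    def store(self, key: str, data: bytes) -> None:
        # Deliberately a no-op, so a mismatched local toolchain
        # cannot put incompatible artifacts into the shared cache.
        return None

    def retrieve(self, key: str) -> Optional[bytes]:
        return self._backend.retrieve(key)
```

In this picture a CI machine holds the SharedCache directly, while a developer machine holds a ReadOnlyCache around it and still gets hits on whatever CI has stored.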

Theoretically Please ensures hashes are as expected before storing or retrieving artifacts, but as just noted there are ways to cause problems nonetheless. Hence the various caches store artifacts in a pretty obvious filesystem structure analogous to the repo structure itself, so if any single artifact is wrong it's not hard to excise it.
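The check-before-store and check-before-retrieve idea might look something like the sketch below. What Please actually hashes and which algorithm it uses are not stated here, so the SHA-256 content digest and the CheckedCache class are assumptions chosen purely for illustration.

```python
import hashlib
from typing import Dict, Optional


def digest(data: bytes) -> str:
    # SHA-256 is an assumption for this example, not necessarily Please's choice.
    return hashlib.sha256(data).hexdigest()


class CheckedCache:
    """Toy cache that refuses artifacts whose hash is not the expected one."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, bytes] = {}

    def store(self, expected_hash: str, data: bytes) -> bool:
        if digest(data) != expected_hash:
            return False  # mismatch: do not pollute the cache
        self._artifacts[expected_hash] = data
        return True

    def retrieve(self, expected_hash: str) -> Optional[bytes]:
        data = self._artifacts.get(expected_hash)
        if data is None or digest(data) != expected_hash:
            return None  # treat a miss or a corrupted entry the same way
        return data
```

A check like this catches corrupted or mismatched bytes, but it cannot tell that an artifact was built on an incompatible machine, which is why being able to find and delete a single bad entry by hand still matters.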