The History of Nix at Bellroy

Jack Kelly 2024-01-24

Bellroy relies heavily on Nix as an important part of our developer tooling. It provides us with reproducible environments for developer shells and CI runs, as well as a build environment for our statically linked Haskell code. Our tech team works in a moderately conservative Haskell dialect, so this level of Nix dependence might seem surprising and incongruent. In this post, I will explain how and why our Nix usage evolved in the way that it did, and point out useful tricks and tools at each stage of adoption.

Phase 1: Developer Shells via shell.nix

Nix has an intimidating learning curve, but most of this comes from writing Nix expressions. Developers can easily be taught to use Nix-based infrastructure once it has been set up. Our first use of Nix was writing shell.nix files for use with nix-shell. Nix uses these files to create reproducible development environments containing the correct versions of tools like ruby, ghc, etc., depending on the project. This is a great way to get started because it doesn’t ask developers to radically change their workflows, and allows them to trial Nix at their own pace.
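As a minimal sketch (assuming an unpinned nixpkgs and a Ruby project; the package names are illustrative), such a shell.nix might look like:

{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  # Tools made available on PATH inside `nix-shell`.
  packages = [
    pkgs.ruby
    pkgs.bundler
  ];
}

There are some subtleties to be aware of when setting up these shell expressions.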

  • For true reproducibility, you need to store a reference to the version of nixpkgs used by each project in its source control. This is called “pinning nixpkgs”. We initially did this using the niv tool, and later by using Nix flakes (see the sketch at the end of this list).

  • As we have several developers using macOS, we pinned nixpkgs commits from nixpkgs-*-darwin branches. We found that this improved the cache hit rate for our macOS-using developers, and reduced the amount of software they had to build locally.

  • For ruby and npm projects, we found it too difficult to capture all of their dependencies as Nix expressions. Packages in private repositories and on private package registries were the biggest challenge here, as many foo2nix tools only support public package repositories. As a workaround, our shells provide the language runtime (e.g., ruby) and its packaging tool (e.g., bundler), but leave fetching language-level dependencies to that language’s tool. We have found this to be a reasonable trade-off between correctness and practicality. The access-tokens setting in modern versions of Nix might help us, if we revisit this.
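To make the pinning point concrete, a niv-based setup looks something like this (the branch name is illustrative):

$ niv init                                    # writes nix/sources.json and nix/sources.nix
$ niv update nixpkgs -b nixpkgs-23.11-darwin  # pin a commit from a darwin branch

A shell.nix can then import the pinned nixpkgs instead of the impure <nixpkgs>:

{ sources ? import ./nix/sources.nix
, pkgs ? import sources.nixpkgs { }
}:

pkgs.mkShell {
  packages = [ pkgs.ruby pkgs.bundler ];
}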

Phase 2: Building Haskell Deployment Packages using haskell.nix

We were comfortable just using Nix for developer shells for a fairly long time, until a confluence of several constraints forced us into a more elaborate Nix setup.

Most of our Haskell code is deployed to AWS Lambda. To build binaries for this environment, we originally used the lambci/lambda Docker container to build in an environment close to what AWS provides at runtime. This ceased to be viable once we started using Apache Kafka: the Haskell client we use (hw-kafka-client) binds to librdkafka, which is not provided by the AWS runtime environment. Instead of wrangling third-party RPM repositories or Lambda Layers, we used the excellent haskell.nix framework to build statically linked, UPX-compressed deployment packages. We published example Nix code which does this, as part of our wai-handler-hal project.
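The wai-handler-hal example is the authoritative reference; as a rough, hypothetical sketch of the overall shape (it assumes pkgs has the haskell.nix overlay applied and an executable named bootstrap, and it elides pinning):

let
  # Build the local Cabal project with haskell.nix.
  project = pkgs.haskell-nix.project' {
    src = ./.;
    compiler-nix-name = "ghc928"; # illustrative GHC version
  };

  # Cross-compiling against musl yields a statically linked executable.
  exe = project.projectCross.musl64.hsPkgs.my-lambda.components.exes.bootstrap;
in
# UPX-compress the binary and zip it into a Lambda deployment package.
pkgs.runCommand "deployment-package.zip"
  { nativeBuildInputs = [ pkgs.upx pkgs.zip ]; } ''
    cp ${exe}/bin/bootstrap bootstrap
    chmod u+w bootstrap
    upx bootstrap
    zip $out bootstrap
  ''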

Phase 3: Private Binary Cache using Amazon S3 and GitHub Actions

Nix + haskell.nix was a reliable way to generate deployment packages for our Haskell services, but even after adding IOG’s binary cache we would often have a lot of cache misses, leading to very long build times (particularly on macOS). It was time to bite the bullet and set up our own private cache. Nix links against the AWS SDK for C++ and can use S3-compatible object stores as binary caches, so an S3 Bucket was an obvious place to store our derivations. We needed a way to populate the cache, and weren’t ready to tackle Nix-native solutions like Hydra, so we built out a caching workflow using GitHub Actions’ hosted Linux and macOS runners. Behind this simple idea are a lot of details worth getting right, so we’ve tried to capture as many of them here as we possibly can.

Setting up the Bucket

  • The bucket is just a normal S3 bucket. Because Nix uses the S3 API, we can block all public access and leave website hosting turned off.

  • It might be worth creating the bucket in the region closest to most of your developers.

  • Many derivations stop being relevant shortly after they’ve been built.

    • It might be worth considering an S3 Lifecycle Configuration to migrate old derivations from the “Standard” storage class to “Standard-Infrequent Access” and possibly even “Glacier Instant Retrieval”. Be careful of increased retrieval charges when using these storage classes. (A sample Lifecycle Configuration appears at the end of this section.)

    • S3 Intelligent Tiering might also be worth considering. Be careful of its automation charges.

    • It is possible to use a Lifecycle Configuration to delete very old derivations, but this can confuse the cache of Nix clients. It might also confuse Hydra (which keeps records of which derivations it has built).

  • The Nix manual provides example AWS Identity and Access Management (AWS IAM) Policy Documents for read-only and read/write access to an S3 Bucket. Actually providing credentials to Nix that have these permissions can be tricky, due to constraints imposed by Nix:

    • We cannot use regular credentials to assume a more restricted role, because the C++ SDK that Nix uses does not support assume_role entries in ~/.aws/config.

    • The Nix daemon runs as the root user, so we need to configure credentials in root’s home directory, and cannot use interactive ways of providing credentials.

    • In the AWS cloud, Nix should be able to access credentials in the normal way (e.g., EC2 Instance Profiles).

    • Nix on non-cloud machines (e.g., developer laptops) is more difficult. We are basically forced into using long-lived aws_access_key_id and aws_secret_access_key pairs. This is not best practice, so we don’t want these keypairs to be able to do too much. We recommend creating entirely separate IAM Users that can only access the cache bucket, and creating a separate User for each developer or server that needs access. Automating key rotation or setting up the access-keys-rotated rule in AWS Config can help ensure that keys are rotated regularly.

    • The GitHub Actions Workflow that populates the cache will assume an AWS IAM Role with permissions to read and write the cache bucket. We don’t create an IAM User for the workflow, because GitHub Actions supports OpenID Connect and provides a guide for configuring OpenID Connect between GitHub and AWS.
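Returning to the Lifecycle Configuration idea above, here is a sketch of a rule that demotes old derivations to cheaper storage classes, applied with the AWS CLI (the day counts are arbitrary and worth tuning against your own access patterns):

$ aws s3api put-bucket-lifecycle-configuration \
    --bucket example-nix-cache \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "demote-old-derivations",
        "Status": "Enabled",
        "Filter": {},
        "Transitions": [
          { "Days": 30, "StorageClass": "STANDARD_IA" },
          { "Days": 90, "StorageClass": "GLACIER_IR" }
        ]
      }]
    }'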

Setting up Keys

Nix uses public/private key pairs to know which derivations to trust: our builder will sign derivations with the private key before uploading them to S3, and clients will know to trust the corresponding public key.

  • We generated a cache key pair following the recommendation in the Nix manual:

    $ nix-store --generate-binary-cache-key example-nix-cache-1 key.private key.public
  • The private key was stored as a GitHub Actions Secret.

  • The public key was set in the nixConfig setting of our flakes, which means that it applies only to our repositories. This speeds up cache checking for other builds, as Nix clients will only check our bucket when it makes sense:

    {
      description = "A flake";
      inputs = ...;
      outputs = ...;
    
      nixConfig = {
        extra-substituters = [
          "s3://example-nix-cache?profile=bellroy"
          "https://cache.iog.io"
        ];
        extra-trusted-public-keys = [
          "example-nix-cache-1:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
          "hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ="
        ];
      };
    }

Setting up Clients

For multi-user Nix installations (the default), the AWS access keys described above need to be available to the user running the Nix daemon (by default, this is root). They can be set by running commands like:

sudo -H aws configure --profile bellroy set aws_access_key_id AKYOURACCESSKEY
sudo -H aws configure --profile bellroy set aws_secret_access_key YOURSECRETKEY
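One way to check that the daemon’s credentials actually work is to ping the store as root (this assumes a Nix new enough to provide the nix-command experimental feature):

$ sudo -H nix --extra-experimental-features nix-command \
    store ping --store 's3://example-nix-cache?profile=bellroy'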

macOS updates tend to remove files in ~root, including AWS config files. One way to provide the Nix daemon with credentials that survive these updates (thanks @lrworth):

  1. Create AWS config and credential files in /etc/nix/aws/config and /etc/nix/aws/credentials.
  2. Edit /Library/LaunchDaemons/org.nixos.nix-daemon.plist, adding the following lines under <key>EnvironmentVariables</key>:

     <key>AWS_CONFIG_FILE</key>
     <string>/etc/nix/aws/config</string>
     <key>AWS_SHARED_CREDENTIALS_FILE</key>
     <string>/etc/nix/aws/credentials</string>

  3. Run sudo -i sh -c 'launchctl remove org.nixos.nix-daemon && launchctl load /Library/LaunchDaemons/org.nixos.nix-daemon.plist' to restart the Nix daemon.

Setting up the Workflow

Here is a YAML description of a sample workflow, derived from the workflow that we previously used to update our cache:

name: Populate nix shell cache
on:
  schedule:
    - cron: "0 0 * * 0"
  workflow_dispatch: {}
env:
  # Region of the cache bucket; referenced by configure-aws-credentials below.
  AWS_REGION: "wherever"
jobs:
  populate-cache:
    strategy:
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
          - macos-latest
    runs-on: "${{ matrix.os }}"
    steps:
      - uses: "actions/checkout@93ea575cb5d8a053eaa0ac8fa3b40d7e05a33cc8"
      - uses: "aws-actions/configure-aws-credentials"
        with:
          aws-region: "${{ env.AWS_REGION }}"
          role-to-assume: "${{ secrets.AWS_OIDC_ROLE_ARN }}"
      - uses: "cachix/install-nix-action@daddc62a2e67d1decb56e028c9fa68344b9b7c2a"
        with:
          extra_nix_config: |
            post-build-hook = /etc/nix/upload-to-cache.sh
            substituters = https://cache.nixos.org/ https://cache.iog.io s3://example-nix-cache
            trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ= example-nix-cache-1:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
          install_url: https://releases.nixos.org/nix/nix-2.7.0/install
          nix_path: nixpkgs=channel:nixpkgs-22.11-darwin
      - name: Set up nix signing key
        run: "echo ${{ secrets.NIX_CACHE_NIX_SIGNING_KEY }} | sudo tee /etc/nix/example-nix-cache.private > /dev/null"
      - name: Set up post-build hook
        run: |
          sudo tee /etc/nix/upload-to-cache.sh <<EOF > /dev/null
          #!/bin/sh

          set -eu
          set -f # disable globbing
          export IFS=' '

          echo "Uploading paths" \$OUT_PATHS
          exec $(which nix) copy --to 's3://example-nix-cache?region=wherever&secret-key=/etc/nix/example-nix-cache.private&compression=zstd&parallel-compression=true' \$OUT_PATHS
          EOF
          sudo chmod u+x /etc/nix/upload-to-cache.sh
      - name: Restart nix-daemon
        run: |
          case $RUNNER_OS in
            Linux) sudo systemctl restart nix-daemon.service ;;
            macOS) sudo launchctl kickstart -k system/org.nixos.nix-daemon ;;
          esac
      - name: Install nix-build-uncached
        run: |
          nix-env -iE '_: import (builtins.fetchTarball {
            url = "https://github.com/Mic92/nix-build-uncached/archive/77fe5c8c4c5c7a1fa3f9baa042474b98f2456652.tar.gz";
            sha256 = "sha256:04hqiw3rhz01qqyz2x1q14aml1ifk3m97pldf4v5vhd5hg73k1zn";
          }) {}'
      - name: Build shells
        run: |
          nix-build-uncached -build-flags '-L --keep-going' -E '(import ./.).devShells.${builtins.currentSystem}'

As with the rest of this process, the basic idea is simple (run the workflow to build all derivations required by our development shells), but the devil is in the details:

  • We used a Nix post-build hook to sign and upload every derivation we built.

  • We used nix-build-uncached (now deprecated) to build only the derivations that we could not find in S3, preventing lots of redundant downloads. nix-build-uncached does not support flakes, so we invoked the build through a default.nix which uses flake-compat.

    The deprecation notice in nix-build-uncached’s README.md suggests more modern alternatives:

    • nix-fast-build has a --skip-cached flag (see the sketch after this list). A comment on a Nix issue says that nix-eval-jobs (which powers nix-fast-build) can be problematic when lots of import-from-derivation (IFD) is required, as in haskell.nix;

    • The comments on Nix issue #3946 suggest that nix build --store $remote_store --builders auto might (eventually?) work.

  • If we were doing this again, we’d probably consider Determinate Systems’ Magic Nix Cache to evaluate Nix expressions more quickly, before the build begins.

  • We ran the workflow weekly as a trade-off between cache freshness and billable minutes, and enabled manual workflow dispatch for when we upgraded GHC versions or major packages.

  • We used Zstandard compression when uploading to S3, because it’s very light on CPU time and we found that XZ was very slow on large derivations.
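For reference, the nix-fast-build alternative mentioned above would be invoked something like this; we have not battle-tested it, and the flake attribute here is illustrative:

$ nix-fast-build --skip-cached --flake '.#devShells.x86_64-linux'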

Conclusion

You don’t have to adopt Nix all at once to get good value out of it. Simple development shells prevent a lot of headaches and tend to have good cache hit rates, which means that it’s fine to delay private caching until much later. Our initial shell.nix served us well for over a year before we started adding more sophisticated tooling, and we only did that because we were forced. Our moves to haskell.nix and GitHub-Actions-based caching were made in response to genuine needs, and we learned as we went. We did eventually move to a Hydra-based CI system, but that’s a story for another time.