Set up a Sufficiently Powerful Build Farm

The Problem

hydra.nixos.org compiles and provides binaries only for the haskellPackages package set. The build farm compiles none of our LTS Haskell package sets, which means that users of haskell.packages.lts-x_y cannot get any pre-compiled binaries. It also means that those builds aren’t verified, i.e. we won’t notice when changes to Nixpkgs break builds in those package sets.

Furthermore, we have no pre-compiled binaries with profiling support for any of our package sets.

The Situation Today

We have 66 active package sets that define the following number of active builds per platform:

        pkgset     builds
 1:    ghc6123       5173
 2:     ghc704       5182
 3:    ghc7102       5189
 4:     ghc722       5183
 5:     ghc742       5183
 6:     ghc763       5182
 7:     ghc783       5173
 8:     ghc784       5173
 9:    ghcHEAD       5188
10: ghcNokinds       5188
11:      ghcjs       5172
12:    lts-0_0        795
13:    lts-0_1        795
14:    lts-0_2        795
15:    lts-0_3        795
16:    lts-0_4        795
17:    lts-0_5        795
18:    lts-0_6        795
19:    lts-0_7        795
20:    lts-1_0        827
21:    lts-1_1        827
22:   lts-1_10        828
23:   lts-1_11        829
24:   lts-1_12        829
25:   lts-1_13        829
26:   lts-1_14        830
27:   lts-1_15        831
28:    lts-1_2        828
29:    lts-1_4        828
30:    lts-1_5        828
31:    lts-1_7        828
32:    lts-1_8        828
33:    lts-1_9        827
34:    lts-2_0       1019
35:    lts-2_1       1019
36:   lts-2_10       1023
37:   lts-2_11       1023
38:   lts-2_12       1023
39:   lts-2_13       1022
40:   lts-2_14       1022
41:   lts-2_15       1022
42:   lts-2_16       1022
43:   lts-2_17       1023
44:   lts-2_18       1022
45:   lts-2_19       1022
46:    lts-2_2       1018
47:   lts-2_20       1024
48:   lts-2_21       1023
49:   lts-2_22       1023
50:    lts-2_3       1018
51:    lts-2_4       1018
52:    lts-2_5       1018
53:    lts-2_6       1017
54:    lts-2_7       1017
55:    lts-2_8       1023
56:    lts-2_9       1023
57:    lts-3_0       1322
58:    lts-3_1       1322
59:    lts-3_2       1321
60:    lts-3_3       1321
61:    lts-3_4       1321
62:    lts-3_5       1322
63:    lts-3_6       1321
64:    lts-3_7       1323
65:    lts-3_8       1323
66:    lts-3_9       1324
        pkgset     builds

That gives a total of 111,647 active builds, many of which are identical. All package sets combined define 77,445 distinct store paths, i.e. some 34,202 builds are shared across package sets.

Now, hydra.nixos.org compiles only haskellPackages at the moment. Out of a total of 46,862 builds in trunk, 15,446 (33%) come from the Haskell package set. If we’d enable every Haskell package set on Linux/i686, Linux/x86_64, and Darwin/x86_64, then we’d have a total of 263,751 builds — 5.6 times as much as before —, and 88% of all builds would be related to Haskell.

A complete build of the active derivations in haskellPackages takes up approx. 27 GByte of disk space per platform. That gives about 80 GByte for all of our 3 active platforms. How would that number develop if we’d enable everything? The store path sizes in MByte are distributed as follows (based on 7,620 samples excluding ghc):

Minimum  1st Quart.  Median     Mean   3rd Quart.  Maximum
0.0169   0.3557      0.9497   4.6300   3.0640      678.9000

Multiplying the average store path size by the number of distinct store paths tells us that storing everything requires approx. 360 GByte per platform. With 3 active platforms, we’d need about 1 TByte of disk space for one complete set of Haskell packages.

Now, we might be able to reduce that number by disabling some particularly large builds. The store path size distribution is skewed to the left, i.e. towards smaller builds. Approximately 82% of all store paths are actually smaller than the numerical average. Our top-20 biggest Haskell builds are:

                    pkg  size
 1:                 ghc 895.5
 2:            metadata 678.9
 3:           uhc-light 253.5
 4:           OpenGLRaw 249.3
 5:             FpMLv53 229.5
 6:        amazonka-ec2 214.8
 7:                Agda 212.6
 8:                 xhb 189.0
 9:  unicode-properties 175.0
10:    scholdoc-texmath 165.0
11:               idris 137.7
12:                  gf 126.4
13:              pandoc 124.8
14:       unicode-names 118.4
15:              wxcore 113.8
16:      java-character 112.1
17:                 hat 111.1
18:             texmath 109.6
19:      open-symbology 107.6
20: turkish-deasciifier 104.0

If we’d make an effort to disable some of those expensive builds — or maybe reduce their output size —, then we’d make a noticeable dent into the space requirements. Even so, it’s clear that hydra.nixos.org cannot provide that much disk space today.

Curiously enough, the CPU power necessary to compile all those packages is the least of our problems. Our build farm can easily re-compile everything from scratch within 2-3 days, which is “good enough” for all practical purposes. Also, changes to stdenv occur rarely (and we typically know about them in advance). The normal update cycle triggers only a handful of builds – maybe 20-300 per day –, because the versions fundamental Haskell packages are fixed in the LTS package sets.

It’s unclear whether the Hydra software would cope with 66 package sets with some 111,000 derivations in them that need to be evaluated, say, once an hour. Hydra has undergone some architectural changes recently that might make such a load possible — i.e. hydra-evaluator is more efficient than it used to be —, but I don’t have any reliable data concerning the performance of the process, so I cannot say what is possible and what is not.

We know for sure that the currently available disk space doesn’t suffice. Disk space is notoriously low on hydra.nixos.org, and storing another terabyte Haskell data is certainly impossible at the moment.

Possible Improvements

We have basically two alternatives:

  1. Throw hardware (money) at hydra.nixos.org.
  2. Establish a separate build farm for Haskell packages.

Either solution requires money, which we could probably raise through crowd funding. At the moment, the NixOS Foundation collects donations for purposes of NixOS in general, but it should be possible to start a funding campaign that collects donations specifically for the purposes of establishing a Haskell build farm so that people who care about that particular topic have an incentive to participate.

Now, if we’d go for approach (1), then we could use those funds to buy bigger disks and more RAM for hydra.nixos.org, which would be beneficial for everyone – not just Haskell users. The downside is that hydra.nixos.org is a bit of a black box. Only very few people have access to those machines, and that situation is not going to change any time soon. Personally, I have no idea whether adding RAM or disks to the cluster is feasible at all, and whether those upgrades would enable the build farm to cope with the number of builds that we’re considering here.

Solution (2) seems more manageable, because we could set up an environment from scratch as we see fit. Experience from managing hydra.cryp.to suggests that one powerful KVM-based virtual server can serve as the Hydra master. In addition, we’d need 2-3 additional build slaves to compile packages. For massive re-builds, we could spawn another 10-15 builds slaves in EC2 to reduce the time it takes to re-build everything from scratch. Such a setup would probably work well in practice, and it should be available at a yearly cost of 1,000 dollars or less.

Anyhow, that’s just a rough estimate. I don’t know, really, what an ideal hardware / service platform for running such a virtual service would be. It would be great if a resident virtual server / NAS / system management guru could chime in with suggestions; I’m sure the NixOS crowd has people who know that kind of stuff and who can design the infrastructure for such a build farm.