Set up a Sufficiently Powerful Build Farm
=========================================

The Problem
-----------

`hydra.nixos.org` compiles and provides binaries only for the
`haskellPackages` package set. The build farm compiles none of our LTS
Haskell package sets, which means that users of `haskell.packages.lts-x_y`
cannot get any pre-compiled binaries. It also means that those builds aren't
verified, i.e. we won't notice when changes to Nixpkgs break builds in those
package sets.

Furthermore, we have [no pre-compiled binaries with profiling support][1]
for any of our package sets.

The Situation Today
-------------------

We have 66 active package sets that define the following number of active
builds per platform:

         pkgset      builds
     1:  ghc6123       5173
     2:  ghc704        5182
     3:  ghc7102       5189
     4:  ghc722        5183
     5:  ghc742        5183
     6:  ghc763        5182
     7:  ghc783        5173
     8:  ghc784        5173
     9:  ghcHEAD       5188
    10:  ghcNokinds    5188
    11:  ghcjs         5172
    12:  lts-0_0        795
    13:  lts-0_1        795
    14:  lts-0_2        795
    15:  lts-0_3        795
    16:  lts-0_4        795
    17:  lts-0_5        795
    18:  lts-0_6        795
    19:  lts-0_7        795
    20:  lts-1_0        827
    21:  lts-1_1        827
    22:  lts-1_10       828
    23:  lts-1_11       829
    24:  lts-1_12       829
    25:  lts-1_13       829
    26:  lts-1_14       830
    27:  lts-1_15       831
    28:  lts-1_2        828
    29:  lts-1_4        828
    30:  lts-1_5        828
    31:  lts-1_7        828
    32:  lts-1_8        828
    33:  lts-1_9        827
    34:  lts-2_0       1019
    35:  lts-2_1       1019
    36:  lts-2_10      1023
    37:  lts-2_11      1023
    38:  lts-2_12      1023
    39:  lts-2_13      1022
    40:  lts-2_14      1022
    41:  lts-2_15      1022
    42:  lts-2_16      1022
    43:  lts-2_17      1023
    44:  lts-2_18      1022
    45:  lts-2_19      1022
    46:  lts-2_2       1018
    47:  lts-2_20      1024
    48:  lts-2_21      1023
    49:  lts-2_22      1023
    50:  lts-2_3       1018
    51:  lts-2_4       1018
    52:  lts-2_5       1018
    53:  lts-2_6       1017
    54:  lts-2_7       1017
    55:  lts-2_8       1023
    56:  lts-2_9       1023
    57:  lts-3_0       1322
    58:  lts-3_1       1322
    59:  lts-3_2       1321
    60:  lts-3_3       1321
    61:  lts-3_4       1321
    62:  lts-3_5       1322
    63:  lts-3_6       1321
    64:  lts-3_7       1323
    65:  lts-3_8       1323
    66:  lts-3_9       1324

That gives a total of 111,647 active builds, many of which are identical.
All package sets combined define 77,445 distinct store paths, i.e. some
34,202 builds are shared across package sets.

Now, `hydra.nixos.org` compiles only `haskellPackages` at the moment. Out of
a total of [46,862 builds in trunk][2], 15,446 (33%) come from the Haskell
package set. If we enabled every Haskell package set on Linux/i686,
Linux/x86_64, and Darwin/x86_64, then we'd have a total of 263,751 builds,
5.6 times as many as today, and 88% of all builds would be related to
Haskell.

A complete build of the active derivations in `haskellPackages` takes up
approx. 27 GByte of disk space per platform. That gives about 80 GByte for
all of our 3 active platforms. How would that number develop if we enabled
everything? The store path sizes in MByte are distributed as follows (based
on 7,620 samples excluding `ghc`):

    Minimum  1st Quart.  Median    Mean  3rd Quart.   Maximum
     0.0169      0.3557  0.9497  4.6300      3.0640  678.9000

Multiplying the mean store path size by the number of distinct store paths
tells us that storing *everything* requires approx. 360 GByte per platform.
With 3 active platforms, we'd need about 1 TByte of disk space for one
complete set of Haskell packages.

Now, we might be able to reduce that number by disabling some particularly
large builds. The store path size distribution is strongly right-skewed,
i.e. most store paths are small and a few very large ones pull the mean up.
Approximately 82% of all store paths are actually smaller than the
arithmetic mean. Our top-20 biggest Haskell builds (sizes in MByte) are:

         pkg                   size
     1:  ghc                  895.5
     2:  metadata             678.9
     3:  uhc-light            253.5
     4:  OpenGLRaw            249.3
     5:  FpMLv53              229.5
     6:  amazonka-ec2         214.8
     7:  Agda                 212.6
     8:  xhb                  189.0
     9:  unicode-properties   175.0
    10:  scholdoc-texmath     165.0
    11:  idris                137.7
    12:  gf                   126.4
    13:  pandoc               124.8
    14:  unicode-names        118.4
    15:  wxcore               113.8
    16:  java-character       112.1
    17:  hat                  111.1
    18:  texmath              109.6
    19:  open-symbology       107.6
    20:  turkish-deasciifier  104.0

If we made an effort to disable some of those expensive builds, or maybe
reduce their output size, then we'd make a noticeable dent in the space
requirements.
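
The build counts and disk-space estimates quoted above can be re-derived with a few lines of arithmetic. The following is only a sketch: all names are made up for illustration, and two points are assumptions rather than anything stated in the text. First, it assumes the 263,751 total was obtained by replacing the Haskell builds currently in trunk with every distinct store path, built once on each of the three platforms. Second, it assumes 1 GByte = 1000 MByte.

```python
# Figures quoted in the text; all names are illustrative, not part of any tool.
TOTAL_HASKELL_BUILDS = 111_647   # builds summed over all 66 package sets
DISTINCT_STORE_PATHS = 77_445    # distinct store paths across those sets
TRUNK_BUILDS = 46_862            # all builds in nixpkgs trunk today
TRUNK_HASKELL_BUILDS = 15_446    # Haskell builds among them
PLATFORMS = 3                    # Linux/i686, Linux/x86_64, Darwin/x86_64
MEAN_STORE_PATH_MB = 4.63        # mean store path size from the sample


def estimate():
    """Recompute the build and disk-space figures from the quoted inputs."""
    shared = TOTAL_HASKELL_BUILDS - DISTINCT_STORE_PATHS
    # Assumption: the current Haskell builds are replaced by every distinct
    # store path, built once per platform.
    non_haskell = TRUNK_BUILDS - TRUNK_HASKELL_BUILDS
    haskell_everywhere = DISTINCT_STORE_PATHS * PLATFORMS
    total = non_haskell + haskell_everywhere
    # Assumption: 1 GByte = 1000 MByte.
    gb_per_platform = MEAN_STORE_PATH_MB * DISTINCT_STORE_PATHS / 1000
    return {
        "shared_builds": shared,
        "total_builds": total,
        "growth_factor": total / TRUNK_BUILDS,
        "haskell_share": haskell_everywhere / total,
        "gb_per_platform": gb_per_platform,
        "gb_all_platforms": gb_per_platform * PLATFORMS,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the sketch reproduces the 34,202 shared builds and the 263,751 total exactly, and it lands at roughly 359 GByte per platform, in line with the rounded figure of 360 GByte.
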
Even with such savings, it's clear that `hydra.nixos.org` cannot provide
that much disk space today.

Curiously enough, the CPU power necessary to compile all those packages is
the least of our problems. Our build farm can easily re-compile everything
from scratch within 2-3 days, which is "good enough" for all practical
purposes. Also, changes to `stdenv` occur rarely (and we typically know
about them in advance). The normal update cycle triggers only a handful of
builds, maybe 20-300 per day, because the versions of fundamental Haskell
packages are fixed in the LTS package sets.

It's unclear whether the Hydra software would cope with 66 package sets
containing some 111,000 derivations that need to be evaluated, say, once an
hour. Hydra has undergone some architectural changes recently that might
make such a load possible (for instance, `hydra-evaluator` is more efficient
than it used to be), but I don't have any reliable data concerning the
performance of the process, so I cannot say what is possible and what is
not.

We know for sure that the currently available disk space doesn't suffice.
Disk space is notoriously low on `hydra.nixos.org`, and storing another
terabyte of Haskell data is certainly impossible at the moment.

Possible Improvements
---------------------

We have basically two alternatives:

1. Throw hardware (money) at `hydra.nixos.org`.
2. Establish a separate build farm for Haskell packages.

Either solution requires money, which we could probably raise through crowd
funding. At the moment, the NixOS Foundation collects donations for the
purposes of NixOS in general, but it should be possible to start a funding
campaign that collects donations specifically for establishing a Haskell
build farm, so that people who care about that particular topic have an
incentive to participate.

Now, if we went for approach (1), then we could use those funds to buy
bigger disks and more RAM for `hydra.nixos.org`, which would be beneficial
for everyone, not just Haskell users. The downside is that
`hydra.nixos.org` is a bit of a black box. Only very few people have access
to those machines, and that situation is not going to change any time soon.
Personally, I have no idea whether adding RAM or disks to the cluster is
feasible at all, and whether those upgrades would enable the build farm to
cope with the number of builds that we're considering here.

Approach (2) seems more manageable, because we could set up an environment
from scratch as we see fit. Experience from managing `hydra.cryp.to`
suggests that one powerful KVM-based virtual server can serve as the Hydra
master. In addition, we'd need 2-3 build slaves to compile packages. For
massive re-builds, we could spawn another 10-15 build slaves in EC2 to
reduce the time it takes to re-build everything from scratch. Such a setup
would probably work well in practice, and it should be available at a
yearly cost of 1,000 dollars or less.

Anyhow, that's just a rough estimate. I don't really know what an ideal
hardware / service platform for running such a virtual service would be. It
would be great if a resident virtual server / NAS / system management guru
could chime in with suggestions; I'm sure the NixOS crowd has people who
know that kind of stuff and who can design the infrastructure for such a
build farm.

[1]: https://github.com/NixOS/nixpkgs/issues/10143
[2]: http://hydra.nixos.org/jobset/nixpkgs/trunk