If you don’t care about the technical background and want to use Nix-packaged CUDA applications on a non-NixOS system, scroll down to Solutions.

Dynamic linking outside Nix

Suppose that we have a CUDA application like llama.cpp outside Nix. How does it find its library dependencies, such as the required CUDA libraries, on Linux? ELF binaries contain a dynamic section with information for the dynamic linker. It encodes, among other things, the required dynamic libraries. For instance, we can use patchelf or readelf to list the CUDA libraries that are used:

$ patchelf --print-needed llama-cli  | grep cu
libcuda.so.1
libcublas.so.12
libcudart.so.12
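
readelf reports the same libraries as (NEEDED) entries of the dynamic section:

$ readelf -d llama-cli | grep NEEDED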

So the llama.cpp CLI uses the CUDA runtime (libcudart.so.12), cuBLAS (libcublas.so.12), and the CUDA driver library (libcuda.so.1). The CUDA driver library is different from the other libraries in that it is tightly coupled to the NVIDIA driver: it does not come with CUDA itself, but with the NVIDIA driver.

Dynamic library dependencies are resolved by the dynamic linker, which uses a cache of known libraries. The directories that are cached can be configured using /etc/ld.so.conf. In addition to that, an ELF binary can specify additional library search paths, the so-called runtime path or rpath. However, no rpath is set in our llama-cli binary:

$ patchelf --print-rpath llama-cli
# Nothingness

So every library is loaded from directories configured in ld.so.conf. If you want more detail on dynamic linking and how libraries are looked up, you can set the LD_DEBUG environment variable to make the dynamic linker display its library search paths:

$ LD_DEBUG=libs ./llama-cli
      4002:	find library=libcuda.so.1 [0]; searching
      4002:	 search cache=/etc/ld.so.cache
      4002:	  trying file=/lib/x86_64-linux-gnu/libcuda.so.1
      4002:
      4002:	find library=libcublas.so.12 [0]; searching
      4002:	 search cache=/etc/ld.so.cache
      4002:	  trying file=/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12
[...]
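
The dynamic linker cache (/etc/ld.so.cache) that is searched here can also be inspected directly with ldconfig; where a library resolves to depends on your CUDA installation:

$ ldconfig -p | grep libcublas.so.12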

Dynamic linking in Nix

The standard dynamic linking approach is not compatible with the objectives of Nix. Nix aims for full reproducibility, which is not possible with a global dynamic linker cache.

Suppose that we have two applications that are both built against OpenBLAS (same library, same version), but with different OpenBLAS build configurations. With a global dynamic linker cache, we cannot distinguish between the two builds and ensure that each application is dynamically linked against the correct one. So, we cannot fully reproduce the intended configurations.

To resolve this issue, Nix avoids using a global cache for dynamic linking. Instead, it embeds the paths of the library dependencies in the binary’s runtime path (rpath). We can observe this by e.g. building the llama.cpp derivation from the nixpkgs repository and inspecting the required libraries and the rpath:

$ export OUT=`nix-build -E '(import ./default.nix { config = { allowUnfree = true; cudaSupport = true; }; }).llama-cpp'`
$ patchelf --print-needed $OUT/lib/libggml.so | grep cu
libcudart.so.12
libcublas.so.12
libcublasLt.so.12
libcuda.so.1
$ patchelf --print-rpath $OUT/lib/libggml.so
/run/opengl-driver/lib:/nix/store/23j56hv7plgkgmhj8l2aj4mgjk32529h-cuda_cudart-12.2.140-lib/lib:/nix/store/9q0rrjr5y5ibqcxc9q1m34g1hb7z9yr8-cuda_cudart-12.2.140-stubs/lib:/nix/store/rnyc2acy5c45pi905ic9cb2iybn35crz-libcublas-12.2.5.6-lib/lib:/nix/store/0wydilnf1c9vznywsvxqnaing4wraaxp-glibc-2.39-52/lib:/nix/store/kgmfgzb90h658xg0i7mxh9wgyx0nrqac-gcc-13.3.0-lib/lib

As you can see, rather than just recording the names of the required dynamic libraries and letting the dynamic linker resolve them from its cache, a binary compiled with Nix embeds the full Nix store paths (/nix/store) of its library dependencies in its rpath.

This solves the reproducibility issue, since each binary/library can fully specify the version it uses, and e.g. different build configurations of a binary will lead to different hashes in the output paths (/nix/store/<hash>-<name>-<version>-<output>).

The glitch in the matrix: the CUDA driver library

There is one glitch/impurity that creeps in. Remember that the CUDA driver library (libcuda.so.1) is tightly coupled to the NVIDIA driver? For this particular library we cannot dynamically link against an arbitrary version: a binary needs to link against the CUDA driver library that corresponds to the system’s NVIDIA driver.

NixOS solves this by allowing an impurity in the form of global state for this particular case. As can be seen in the rpath above, there is an entry /run/opengl-driver/lib. If the NVIDIA driver is configured on a NixOS system, NixOS guarantees that libcuda.so.1 is symlinked into this location. In this way, a binary will always use a CUDA driver library that is consistent with the system’s NVIDIA driver version.
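
On a NixOS system with the NVIDIA driver enabled, you can check this yourself (what the file resolves to depends on the installed driver version):

$ ls -l /run/opengl-driver/lib/libcuda.so.1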

Sadly, this doesn’t work on non-NixOS systems, because they don’t have the /run/opengl-driver/lib directory. This brings us to some hacks to resolve this issue…
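
Before moving on to the workarounds, this is what the problem looks like on a non-NixOS machine: ldd resolves every Nix-provided library to a store path through the rpath, but the CUDA driver library is not found (illustrative output; store hashes and load addresses are elided):

$ ldd $OUT/lib/libggml.so | grep cu
        libcudart.so.12 => /nix/store/...-cuda_cudart-12.2.140-lib/lib/libcudart.so.12 (0x...)
        libcuda.so.1 => not found
[...]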

Solutions

Symlink the driver library into /run/opengl-driver/lib

The most direct hack is to recreate the directory that NixOS provides and to symlink the system’s CUDA driver library into it:

$ sudo mkdir -p /run/opengl-driver/lib
$ sudo find /usr/lib \
    -name 'libcuda.so*' \
    -exec ln -s {} /run/opengl-driver/lib/ \;

Since /run is normally a tmpfs, the symlinks have to be recreated after every reboot.

Preload the driver library

$ export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libcuda.so.1
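
The exact path of libcuda.so.1 differs per distribution; the dynamic linker cache tells you where it lives on your system:

$ ldconfig -p | grep libcuda.so.1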

Warning

LD_PRELOAD does not cover all cases. When a program or library uses runtime compilation (e.g. Triton), the Nix derivation will typically burn the path /run/opengl-driver/lib into the package as a linker search path (i.e. -L/run/opengl-driver/lib). LD_PRELOAD does not affect that link step, so such cases will still fail.

Warning

Avoid using LD_LIBRARY_PATH unless the CUDA driver library is in a directory by itself. Pointing LD_LIBRARY_PATH at a directory that contains other libraries overrides those libraries as well. In the best case this breaks reproducibility; in the worst case it breaks the application.
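
If you do use LD_LIBRARY_PATH, a safer variant is to put only the driver library in a dedicated directory (a sketch; the directory name is arbitrary and the libcuda.so.1 path depends on your distribution):

$ mkdir -p $HOME/cuda-driver
$ ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 $HOME/cuda-driver/
$ export LD_LIBRARY_PATH=$HOME/cuda-driver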

nixGL

nixGL can wrap a program so that it finds a CUDA driver library that matches the host’s NVIDIA driver.
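
For example, with a flakes-enabled Nix (a sketch; --impure is needed because nixGL detects the host driver version, and the wrapped binary path is only illustrative):

$ nix run --impure github:nix-community/nixGL -- $OUT/bin/llama-cli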