$500[ish] AI Build Challenge

I finally got this machine working again this morning, after starting over yesterday with NixOS instead of Debian 12. Thanks to @techman for his help last night in his AI build thread. It turns out it wasn’t a Linux kernel issue but a CUDA kernel issue. I’ll tidy up my configuration.nix and post it in the NixOS thread later today, but I’ll document the AI-build-relevant parts here.

I started from scratch this morning and, after getting the card going in NixOS without issue, ran into a problem that only appeared a few weeks ago: NixOS now compiles ollama’s CUDA support only for compute capability 7.5 and above.

While tracing things through step by step, the line `-- Using CUDA architectures: 75;80;86;89;90;100;120` stood out in the build logs, because the card I’m using is Pascal-based (compute capability 6.1) and is definitely supported by ollama.

I didn’t end up applying the fixes proposed in the GitHub issue, but they put me on the right track. Instead, I decided to dig a bit further into NixOS overrides and come up with my own fix that I could build into my configuration.nix and keep for the future. All the cards I’m experimenting with are older Pascal-based (6.1) cards, so I was fine hardcoding the fix, with a comment in configuration.nix reminding me to change it if needed.

Given that I only started using NixOS yesterday, my understanding is very limited, but it seems the cudaArches variable is pulled in or generated from somewhere and used when the ollama package is built. I’m not entirely sure where cudaArches comes from yet; that’s a curiosity for me to explore later! Either way, the following lines in configuration.nix override the cudaArches variable used by the CUDA compiler when ollama is built and set it to 61 (for a Pascal, compute capability 6.1 card).

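```nix
  # Build ollama's CUDA kernels only for compute capability 6.1 (Pascal, e.g. Tesla P4).
  # Change "61" here if a card with a different compute capability goes in.
  nixpkgs.config.packageOverrides = pkgs: {
    ollama = pkgs.ollama.override {
      cudaArches = [ "61" ];
    };
  };
```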
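The same override can also be written as a nixpkgs overlay, which, as far as I can tell, is the more common style in current NixOS configurations. I haven’t tested this variant on this machine; it is a sketch that assumes, exactly like the packageOverrides version, that the ollama package accepts a cudaArches argument. It goes in the same top-level attribute set of configuration.nix:

```nix
  # Alternative to the packageOverrides version: an overlay that rebuilds
  # ollama with CUDA kernels for compute capability 6.1 (Pascal) only.
  # Only one of the two is needed; using both is redundant.
  nixpkgs.overlays = [
    (final: prev: {
      ollama = prev.ollama.override {
        cudaArches = [ "61" ];
      };
    })
  ];
```

Either way, anything that takes `pkgs.ollama` from this nixpkgs instance gets the rebuilt package.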

One `sudo nixos-rebuild switch` later, the following pops up during the compile:

```text
-- CUDA Toolkit found
-- Using CUDA architectures: 61
-- The CUDA compiler identification is NVIDIA 12.8.93 with host compiler GNU 14.2.1
```

Woo!

After that, everything just worked.

Using the now-standard “Why is the sky blue?” test on the deepseek-r1:14b model I’ve been using throughout the thread, I get a prompt eval rate of 18.33 t/s and an eval rate of 5.93 t/s. That’s a 1.3% and 6.0% increase, respectively, over the numbers from the Debian 12 run back on June 8. It’s probably not statistically significant, and a lot of variables changed (e.g., this run used a newer ollama, 0.9.6, and a newer version of the deepseek-r1:14b model), so I’m not going to claim that NixOS is “faster”, but it certainly seems to be harnessing the card’s capabilities in roughly the same way.

This post was more about dipping my toe into the world of NixOS than about the AI build, but I thought I’d post an update with the test results and document the fix somewhere, in case anyone following along is looking at using a surplus datacenter card such as the P4.
