You got me fired up again about self hosted llms for coding @skalyan .
I have a GTS 970 in a machine with 32G of RAM. I installed Ubuntu 24.04 and ollama. I had to fiddle with ollama a bit to get it listening on 0.0.0.0 instead of 127.0.0.1.
I then tried opencode on another machine. I had a chat with copilot about which environment variables to set to tell opencode to connect to my ollama instance.
I asked it a question about some code of mine, and btop seemed to confirm that i was putting a load on the GPU in that machine, but it didn’t get into a back and forth that yielded a result.
Did i misunderstand your roadmap? Last night i installed qwen code, but when you said opencode i assumed that was the better way to go.
Curious about this workflow too! I found a MLX version of Qwen 3.6 and gave it a brief spin on the M4 MBA but also want to get one of the Linux workstations going at some stage to try this out properly.
Will point out here that nvtop is also fantastic for monitoring what an Nvidia GPU is doing.
Actually, i need to revise that statement. It turns our that DeepSeek V4 Flash Free" isn't locally hosted. OpenCode was still defaulting to a cloud hosted service. It was good, but i still wanted something locally hosted. I started watching btop` and realised my GPU wasn’t being used at all.
So from there i looked a little at qwen3-coder. Nowhere near as responsive, and not usable yet because i hit a token limit problem. I had to edit my systemctl configuration for ollama to set OLLAMA_CONTEXT_LENGTH=32768. I was observing that i’d ask opencode / qwen3-coder to tackle a problem, it would do a bunch of “thinking” and then just stop. I told ChatGPT what i was observing and it explained that the /v1 OpenAI API that OpenCode uses with Ollama does not allow opencode to adjust the context size. It was defaulting to 2048 tokens, which is barely enough for it to comprehend the question.
Anyway, my GPU is doing more than it’s done in the last ten years!