I've been running Ollama with DeepSeek in a container on TrueNAS's k8s for several months. It's hooked up to the Continue extension in VSCode. I also mix in cloud-hosted "dumb" models for lighter tasks like code completion; the local DeepSeek is reserved for heavier chat and code work.
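For anyone wanting to replicate the Continue hookup: it's just a matter of pointing Continue's config at the Ollama API on the server. A rough sketch of `~/.continue/config.json` (the hostname, port, and model tag here are illustrative, swap in your own):

```json
{
  "models": [
    {
      "title": "DeepSeek (local Ollama)",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b",
      "apiBase": "http://truenas.local:11434"
    }
  ]
}
```

With that in place, the local model shows up in Continue's model picker alongside any cloud providers you've configured, so you can keep completion on a cloud model and route chat to the local one.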
It's fast as hell. Though you'll need at least two GPUs if you want to dedicate one to Ollama and still have one free for anything else (display, gaming, or passthrough to another Proxmox VM).