Description
This build lets local language models run AI inference on Vulkan-capable graphics hardware. It is intended for users who want to try GPU acceleration on systems where Vulkan is the preferred backend.
It is a variant of the Ollama binary. Performance and compatibility depend on the GPU, drivers, model size, and available memory.
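
Because results depend on model size and available memory, it can help to check how much of a loaded model actually sits in GPU memory. The following is a minimal sketch, not part of this build. It assumes a server listening on Ollama's default address (`http://localhost:11434`) and assumes that `GET /api/ps` returns a `models` list whose entries carry `name`, `size`, and `size_vram` fields. The helper names `fetch_running_models` and `gpu_offload_fraction` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama's default local address; adjust if the server listens elsewhere.
DEFAULT_BASE_URL = "http://localhost:11434"


def fetch_running_models(base_url=DEFAULT_BASE_URL):
    """Return the parsed JSON from GET /api/ps (requires a running server)."""
    with urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


def gpu_offload_fraction(ps_payload):
    """Map each loaded model name to the share of its size held in GPU memory."""
    fractions = {}
    for model in ps_payload.get("models", []):
        size = model.get("size", 0)
        vram = model.get("size_vram", 0)
        # Guard against a zero or missing size to avoid division by zero.
        fractions[model.get("name", "unknown")] = vram / size if size else 0.0
    return fractions
```

With a server running and a model loaded, `gpu_offload_fraction(fetch_running_models())` returns one entry per loaded model. Under the assumptions above, a value near 1.0 suggests the model fits in GPU memory, while a lower value suggests part of it is held in system memory.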