Description
NVIDIA GPUs can accelerate local LLM inference through the CUDA build of llama.cpp.
This package is intended for users with compatible NVIDIA hardware who want faster model execution. It provides optimized inference tools only; model files are not included.
CUDA inference can consume a large amount of GPU memory and may fail if the installed NVIDIA driver does not match the CUDA version the package was built against. Check hardware and driver support before installing.
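One way to check hardware support is to ask the NVIDIA driver what it sees. The following is a minimal sketch, assuming the `nvidia-smi` utility that ships with the NVIDIA driver is on the PATH. The helper names `parse_gpu_rows` and `query_gpus` are hypothetical and not part of this package. It only reports the GPU name, driver version, and total memory; it does not decide whether a given model will fit.

```python
import shutil
import subprocess

# nvidia-smi query that prints one CSV row per GPU:
# name, driver version, total memory in MiB (no header, no units).
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Parse nvidia-smi CSV rows into a list of dicts (hypothetical helper)."""
    gpus = []
    for line in text.splitlines():
        # Split from the right so a comma inside the GPU name is tolerated.
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) != 3:
            continue
        name, driver, mem = parts
        if not mem.isdigit():
            # Skip rows where memory is reported as something like "[N/A]".
            continue
        gpus.append({"name": name, "driver": driver, "memory_mib": int(mem)})
    return gpus


def query_gpus():
    """Return detected GPUs, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU detected (or nvidia-smi is unavailable).")
    for gpu in gpus:
        print(f"{gpu['name']}: driver {gpu['driver']}, {gpu['memory_mib']} MiB")
```

An empty result usually means the driver is not installed or not loaded, which is worth resolving before installing the CUDA build.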