Description
Local language models can be served over a server process with GPU or NPU acceleration. It is useful for developers who want local AI inference exposed to tools or applications on the same machine or trusted network.
LLM servers can process private prompts and may expose an API. Restrict listening interfaces, review model sources, and avoid sending sensitive data to untrusted clients.