Description
This library runs inference for Transformer models efficiently from C++ applications and services. It is useful for translation, speech, and other machine-learning workloads where deployment speed and resource use matter.
This is a development library rather than a ready-made chat or translation app. Users normally need model files and application code that calls the library.