Description
Saves the state of a running program to a checkpoint and restores it later, including for distributed and multithreaded applications. It can help long-running jobs survive planned maintenance or migration, and can support experiments that need to resume from a saved state.
Use it in controlled computing environments. Not every program can be checkpointed safely, especially one that depends on network connections, hardware devices, or other external state.