vllm.entrypoints.serve.instrumentator.health
health async
Readiness probe. Returns 503 during shutdown/drain so that load balancers stop sending new traffic.
Source code in vllm/entrypoints/serve/instrumentator/health.py
live async
Liveness probe. Returns 200 as long as the process is alive, including during graceful shutdown/drain. Returns 503 only when the engine has encountered a fatal error.
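The difference between the two probes is easiest to see as a mapping from server state to status code. The following is a minimal sketch of that mapping, not the code in `health.py`. `ServerState`, `readiness_status`, and `liveness_status` are hypothetical names introduced for illustration. The sketch also assumes that the readiness probe returns 200 in normal operation and 503 after a fatal engine error, which the descriptions above do not state.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class ServerState:
    """Hypothetical snapshot of server state (not a vLLM type)."""

    draining: bool = False     # graceful shutdown/drain in progress
    engine_dead: bool = False  # engine has hit a fatal error


def readiness_status(state: ServerState) -> int:
    """Status code a readiness probe would return under this sketch."""
    # 503 while draining, so load balancers stop routing new traffic.
    # Treating a dead engine as "not ready" is an assumption of this sketch.
    if state.draining or state.engine_dead:
        return HTTPStatus.SERVICE_UNAVAILABLE.value
    return HTTPStatus.OK.value


def liveness_status(state: ServerState) -> int:
    """Status code a liveness probe would return under this sketch."""
    # Draining is deliberately ignored: the process is still alive and
    # should not be restarted while it finishes in-flight requests.
    if state.engine_dead:
        return HTTPStatus.SERVICE_UNAVAILABLE.value
    return HTTPStatus.OK.value
```

Under this model, a draining server is "alive but not ready": an orchestrator stops routing new requests to it without restarting it, which gives in-flight requests a chance to complete.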