vLLM

@vllm_project

NVIDIA published a tutorial for deploying Cosmos Reason 2B on Jetson using vLLM — covering AGX Thor, AGX Orin, and Orin Super Nano. FP8 quantized VLM with chain-of-thought reasoning, served via `vllm serve` and connected to a real-time webcam UI for interactive vision analysis. Great to see vLLM powering edge inference on Jetson. 🙏 Thanks to the @NVIDIARobotics Jetson team! 🔗 https://huggingface.co/blog/nvidia/cosmos-on-jetson

huggingface.co