VLM agent in Isaac Sim

Vision-language reasoning over a simulated robotics scene.

About this demo

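A vision-language model (VLM) observes the Isaac Sim scene through screenshots and reasons about what to do next for a task such as 'clear the spilled bottle from aisle 3'. The demo bridges high-level reasoning with low-level skill execution: the model chooses the next action, and a pre-built skill carries it out in the simulator.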
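The observe, reason, act cycle described above can be sketched as a small loop. This is a minimal illustration, not the demo's implementation: `Observation`, `AgentLoop`, and the `capture`, `plan_next`, and `execute` callables are hypothetical names, and the screenshot capture and VLM call are replaced with stubs so the example runs without Isaac Sim or a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Observation:
    """One screenshot of the scene and the loop step it was taken at."""
    step: int
    image_png: bytes


@dataclass
class AgentLoop:
    # All three callables are assumptions made for this sketch.
    capture: Callable[[int], Observation]         # grabs a screenshot of the scene
    plan_next: Callable[[str, Observation], str]  # VLM call; returns a skill name
    execute: Callable[[str], None]                # runs one low-level skill
    max_steps: int = 10
    history: List[str] = field(default_factory=list)

    def run(self, task: str) -> List[str]:
        """Alternate observation and action until the planner says 'done'."""
        for step in range(self.max_steps):
            obs = self.capture(step)
            skill = self.plan_next(task, obs)
            if skill == "done":
                break
            self.execute(skill)
            self.history.append(skill)
        return self.history


# Stubs standing in for the simulator and the model.
def fake_capture(step: int) -> Observation:
    return Observation(step=step, image_png=b"")


SCRIPT = ["navigate_to_aisle", "pick_bottle", "place_in_bin", "done"]


def fake_planner(task: str, obs: Observation) -> str:
    # A real planner would send the task text and screenshot to a VLM.
    return SCRIPT[min(obs.step, len(SCRIPT) - 1)]


executed: List[str] = []
loop = AgentLoop(fake_capture, fake_planner, executed.append)
result = loop.run("clear the spilled bottle from aisle 3")
print(result)
```

Capping the loop with `max_steps` is a design choice for the sketch: it keeps a planner that never returns 'done' from running forever.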

Highlights

- Isaac Sim integration
- VLM-as-planner pattern
- Skill library composition
- Sim-to-real transferable plans
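One way to picture the VLM-as-planner pattern combined with a skill library is a registry of named skills plus a validator for the plan the model emits. The sketch below is an assumption-laden illustration: the skill names, the JSON plan format, and the helpers `skill`, `parse_plan`, and `run_plan` are all hypothetical, and each skill only returns a log string where a real one would command the robot.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical skill registry: skill name -> callable.
SKILLS: Dict[str, Callable[..., str]] = {}


def skill(name: str):
    """Decorator that registers a function in the skill library."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        SKILLS[name] = fn
        return fn
    return register


@skill("navigate_to")
def navigate_to(location: str) -> str:
    return f"navigated to {location}"


@skill("pick")
def pick(obj: str) -> str:
    return f"picked {obj}"


@skill("place")
def place(obj: str, target: str) -> str:
    return f"placed {obj} in {target}"


def parse_plan(raw: str) -> List[Tuple[str, dict]]:
    """Parse the planner's JSON output and reject skills not in the library."""
    plan = []
    for step in json.loads(raw):
        name = step["skill"]
        if name not in SKILLS:
            raise ValueError(f"unknown skill: {name}")
        plan.append((name, step.get("args", {})))
    return plan


def run_plan(plan: List[Tuple[str, dict]]) -> List[str]:
    """Execute each validated step in order and collect the results."""
    return [SKILLS[name](**args) for name, args in plan]


# Assumed plan format; a real VLM response would need the same validation.
vlm_output = json.dumps([
    {"skill": "navigate_to", "args": {"location": "aisle 3"}},
    {"skill": "pick", "args": {"obj": "bottle"}},
    {"skill": "place", "args": {"obj": "bottle", "target": "bin"}},
])

log = run_plan(parse_plan(vlm_output))
print(log)
```

Validating the plan before running it means a hallucinated skill name fails early with a clear error, before any step has been executed.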

Supported robots

Simulated mobile manipulator

Related demos


VLA

GR00T VLA pick-and-place

Language-conditioned manipulation with NVIDIA GR00T N1.5.

VLA

OpenVLA grasping

Open-source VLA model for manipulation, with no proprietary checkpoints.

Teleop

CockPit

Predefined teleop dashboard with dual cameras, 3D model, map, and controls.
